Quick Start

XFaiss provides the `./xfaiss.sh` script for building the library, preparing data, and running benchmarks with predefined configurations:

Step	Command	Description
Build	`./xfaiss.sh build`	Build the XFaiss library, including tutorials and benchmarks
Prepare Data	`./xfaiss.sh prepare-data-tui`	Prepare dataset and index files (TUI)
	`./xfaiss.sh prepare-data [options]`	Prepare dataset and index files (CLI)
Benchmark	`./xfaiss.sh bench-tui`	Run search benchmark (TUI)
	`./xfaiss.sh bench <preset> [options]`	Run search benchmark (CLI, see Benchmark Reference)

Prepare Data

`prepare-data` downloads public benchmark datasets (or generates the synthetic ones), converts them to the fvecs format, and builds Faiss indexes from them. `prepare-data-tui` provides the same functionality through an interactive terminal UI (TUI).

Datasets

Dataset	Dimension	Vectors	Metric	Download Size
`sift1m`	128	1M	L2	~500MB
`gist1m`	960	1M	L2	~4GB
`sift100m`	128	100M	L2	~49GB (1B raw)
`deep100m`	96	100M	L2	~358GB (1B raw)
`7m-d1024-l2`	1024	7M	L2	generated
`7m-d1024-ip`	1024	7M	IP	generated
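For disk planning, the size of each base set once stored as fvecs can be estimated from the table: every fvecs record is a 4-byte int32 dimension header followed by `dim` float32 values. A minimal sketch, assuming the counts in the table are exact (1M, 7M, 100M) and that converted base vectors are held as fvecs; it excludes the raw downloads and the `.faiss` index files, and the names `DATASETS` and `fvecs_bytes` are illustrative only:

```python
# Rough on-disk size of each base set stored as .fvecs (sketch only;
# excludes raw downloads and the .faiss index files built from it).
DATASETS = {
    # name: (dimension, number of base vectors), taken from the table above
    "sift1m": (128, 1_000_000),
    "gist1m": (960, 1_000_000),
    "sift100m": (128, 100_000_000),
    "deep100m": (96, 100_000_000),
    "7m-d1024-l2": (1024, 7_000_000),
    "7m-d1024-ip": (1024, 7_000_000),
}


def fvecs_bytes(dim: int, count: int) -> int:
    """Each fvecs record = 4-byte int32 dimension + dim * 4-byte float32."""
    return count * (4 + 4 * dim)


if __name__ == "__main__":
    for name, (dim, count) in DATASETS.items():
        size = fvecs_bytes(dim, count)
        print(f"{name:14s} {size / 2**30:8.1f} GiB")
```

An IVF-Flat index also keeps every vector as float32 (plus a 64-bit ID each), so its file is of the same order of size as the fvecs base set.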

Pipeline

1. Download: fetch public datasets from their sources (IRISA FTP, Yandex Cloud); synthetic datasets are generated instead
2. Convert: convert bvecs/fbin files to fvecs
3. Build: create IVF-Flat and IVF-RaBitQ indexes (nlist=4096 by default, configurable with `--nlist`)
4. Cleanup: remove the raw base vectors to save space (skipped with `--keep-raw`)
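The Convert step rewrites vectors into the fvecs layout. A minimal in-memory sketch of that conversion, assuming the usual TEXMEX record layout for bvecs/fvecs (little-endian int32 dimension, then the values) and the big-ann-benchmarks layout for fbin (an 8-byte header holding the vector count and dimension as uint32, then row-major float32). This illustrates the formats only; it is not the XFaiss converter:

```python
import struct


def bvecs_to_fvecs(data: bytes) -> bytes:
    """.bvecs records (int32 dim + dim uint8) -> .fvecs records (int32 dim + dim float32)."""
    out = bytearray()
    pos = 0
    while pos < len(data):
        (dim,) = struct.unpack_from("<i", data, pos)
        pos += 4
        values = data[pos:pos + dim]  # dim unsigned bytes
        if len(values) != dim:
            raise ValueError("truncated .bvecs record")
        pos += dim
        out += struct.pack("<i", dim)
        out += struct.pack(f"<{dim}f", *(float(b) for b in values))  # widen uint8 -> float32
    return bytes(out)


def fbin_to_fvecs(data: bytes) -> bytes:
    """.fbin (uint32 n, uint32 dim, then n*dim float32) -> .fvecs records."""
    n, dim = struct.unpack_from("<II", data, 0)
    row_bytes = 4 * dim
    if len(data) < 8 + n * row_bytes:
        raise ValueError("truncated .fbin file")
    header = struct.pack("<i", dim)
    out = bytearray()
    for i in range(n):
        start = 8 + i * row_bytes
        out += header                          # per-record dimension header
        out += data[start:start + row_bytes]   # float32 payload copied unchanged
    return bytes(out)
```

Working on whole byte strings keeps the sketch short; files the size of the 1B raw downloads would have to be streamed in chunks instead.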

Output:

{data_dir}/benchmark_indexes/{dataset}/
  ├── {dataset}.ivf_flat_nlist4096.faiss
  ├── {dataset}.ivf_rabitq_nlist4096.faiss
  ├── query.fvecs
  └── groundtruth.ivecs
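Assuming `query.fvecs` and `groundtruth.ivecs` follow the standard fvecs/ivecs layout (each record is a little-endian int32 dimension followed by that many float32 or int32 values), they can be loaded and used to score search results as sketched below. The helper names are hypothetical, and `recall_at_k` implements one common definition (the fraction of the true top-k neighbours found in the returned top-k), which may differ from the metric `bench` reports:

```python
import struct


def read_vecs(path, fmt):
    """Read a .fvecs (fmt='f') or .ivecs (fmt='i') file into a list of rows."""
    with open(path, "rb") as f:
        data = f.read()
    rows = []
    pos = 0
    while pos < len(data):
        (dim,) = struct.unpack_from("<i", data, pos)  # per-record dimension
        pos += 4
        rows.append(list(struct.unpack_from(f"<{dim}{fmt}", data, pos)))
        pos += 4 * dim  # both float32 and int32 values are 4 bytes
    return rows


def recall_at_k(results, groundtruth, k):
    """Mean fraction of each query's true top-k IDs present in its returned top-k."""
    hits = 0
    for found, truth in zip(results, groundtruth):
        hits += len(set(found[:k]) & set(truth[:k]))
    return hits / (k * len(results))
```

For example, `read_vecs(f"{data_dir}/benchmark_indexes/sift1m/groundtruth.ivecs", "i")` loads the ground-truth neighbour IDs for `sift1m`.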

CLI Options

./xfaiss.sh prepare-data --data-dir <dir> [options...]

Option	Description	Default
`--data-dir <dir>`	Root data directory	(required)
`--datasets <list>`	Comma-separated dataset names	all
`--skip-download`	Skip download step (use existing raw files)	off
`--skip-build`	Skip index build (download/convert only)	off
`--keep-raw`	Keep raw downloaded files after building	off
`--nlist <n>`	Number of IVF clusters	`4096`
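When scripting data preparation (for example from a CI job), the command line can be assembled from the options in the table. A hypothetical Python helper (`prepare_data_cmd` is not part of XFaiss) that emits only the flags documented above and omits everything left unset, so the script's own defaults apply:

```python
import shlex


def prepare_data_cmd(data_dir, datasets=None, skip_download=False,
                     skip_build=False, keep_raw=False, nlist=None):
    """Build a `./xfaiss.sh prepare-data` argv from the documented options only."""
    cmd = ["./xfaiss.sh", "prepare-data", "--data-dir", str(data_dir)]
    if datasets:                      # default: all datasets
        cmd += ["--datasets", ",".join(datasets)]
    if skip_download:
        cmd.append("--skip-download")
    if skip_build:
        cmd.append("--skip-build")
    if keep_raw:
        cmd.append("--keep-raw")
    if nlist is not None:             # default: 4096
        cmd += ["--nlist", str(nlist)]
    return cmd


if __name__ == "__main__":
    cmd = prepare_data_cmd("/var/opt/xvector-data", ["sift1m", "7m-d1024-l2"])
    print(shlex.join(cmd))
    # To execute it from the XFaiss source root: subprocess.run(cmd, check=True)
```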

TUI

`prepare-data-tui` opens a TUI for choosing which datasets to prepare (download or generate, then index):

$ ./xfaiss.sh prepare-data-tui
Select datasets (space to toggle, enter to confirm):
  ✓ sift1m        SIFT 1M (128-dim, ~500MB download)
  • gist1m        GIST 1M (960-dim, ~4GB download)
  • sift100m      SIFT 100M / BigANN (128-dim, 1B raw ~49GB download)
  • deep100m      Deep 100M (96-dim, 1B raw ~358GB download)
  ✓ 7m-d1024-l2   Synthetic 7M (1024-dim, L2, generated)
> ✓ 7m-d1024-ip   Synthetic 7M (1024-dim, IP, generated)

Benchmark

`bench-tui` opens a TUI for selecting a benchmark preset and configuring its parameters:

1. Select a benchmark preset:

$ ./xfaiss.sh bench-tui
Select benchmark preset:
> flat-sift1m                       IVF-Flat on SIFT-1M
  flat-gist1m                       IVF-Flat on GIST-1M
  flat-sift100m                     IVF-Flat on SIFT-100M
  flat-deep100m                     IVF-Flat on Deep-100M
  flat-7m-d1024-l2-query-fixed      IVF-Flat on 7M-d1024 (L2, Query-Fixed)
  flat-7m-d1024-l2-cluster-fixed    IVF-Flat on 7M-d1024 (L2, Cluster-Fixed) [experimental]
  flat-7m-d1024-ip-query-fixed      IVF-Flat on 7M-d1024 (IP, Query-Fixed)
  flat-7m-d1024-ip-cluster-fixed    IVF-Flat on 7M-d1024 (IP, Cluster-Fixed) [experimental]
  rabitq-sift1m                     IVF-RaBitQ on SIFT-1M
  rabitq-gist1m                     IVF-RaBitQ on GIST-1M
  rabitq-sift100m                   IVF-RaBitQ on SIFT-100M
  rabitq-deep100m                   IVF-RaBitQ on Deep-100M

2. Review the configuration and run:

  Preset: flat-sift1m
> ── Data ────────────────────────
    DATA_DIR           /var/opt/xvector-data
  ── Search ──────────────────────
    NPROBE             128
    TOPK               100
    SKIP_CPU           no
    MAX_QUERIES        (all)
    RANDOM_QUERIES     (preset default)
    SEARCH_ITERATIONS_ON_MU (1)
    SEARCH_ITERATIONS_ON_CPU (1)
  ── Search (IVF-Flat) ───────────
    FLAT_PARALLEL_MODE 1 (QUERY_FIXED)
  ── Device ──────────────────────
    DEVICEID           0
    NUMSUB             24
    TASKCOUNT          192
    LOCALITY_MODE      1 (SPREAD)
  ── CPU / Host ──────────────────
    NUMA_CPUBIND           0
    NUMA_MEMBIND           0
    OMP_NUM_THREADS        64
    OPENBLAS_NUM_THREADS   1
    GDB                    no
  ────────────────────────────────
    >>> Run benchmark
    <<< Change preset
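To make NPROBE and TOPK concrete: an IVF index splits the base set into `nlist` inverted lists (4096 in the prepared indexes), and for each query it scans only the `nprobe` lists whose centroids are closest, returning the `k` (TOPK) best hits. The pure-Python toy below illustrates that trade-off. It is a simplified sketch, not how XFaiss or Faiss implement IVF-Flat: real indexes train centroids with k-means (here they are sampled from the data), and the MU offload is not modelled. `ToyIVFFlat`, `l2`, and `neg_ip` are illustrative names:

```python
import heapq
import random


def l2(a, b):
    """Squared L2 distance (smaller is closer)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def neg_ip(a, b):
    """Negated inner product, so that smaller is closer for both metrics."""
    return -sum(x * y for x, y in zip(a, b))


class ToyIVFFlat:
    """Each vector goes to the list of its nearest centroid (nlist = number of
    centroids); a query scans only the nprobe nearest lists, using full vectors."""

    def __init__(self, centroids, metric=l2):
        self.centroids = centroids
        self.metric = metric
        self.lists = [[] for _ in centroids]  # one list of (id, vector) per centroid
        self.ntotal = 0

    def _nearest_lists(self, v, n):
        return heapq.nsmallest(n, range(len(self.centroids)),
                               key=lambda i: self.metric(v, self.centroids[i]))

    def add(self, vectors):
        for v in vectors:
            (c,) = self._nearest_lists(v, 1)
            self.lists[c].append((self.ntotal, v))
            self.ntotal += 1

    def search(self, query, k, nprobe):
        # Coarse step: choose the nprobe closest lists (NPROBE).
        probes = self._nearest_lists(query, nprobe)
        # Fine step: exact distance to every vector stored in those lists.
        candidates = ((self.metric(query, v), vid)
                      for c in probes for vid, v in self.lists[c])
        # Keep the k best hits (TOPK), ordered from closest to farthest.
        return [vid for _, vid in heapq.nsmallest(k, candidates)]


if __name__ == "__main__":
    rng = random.Random(0)
    base = [[rng.random() for _ in range(8)] for _ in range(2000)]
    query = [rng.random() for _ in range(8)]
    index = ToyIVFFlat(centroids=rng.sample(base, 32))  # nlist = 32
    index.add(base)
    exact = heapq.nsmallest(10, range(len(base)), key=lambda i: l2(query, base[i]))
    for nprobe in (1, 4, 32):
        found = index.search(query, k=10, nprobe=nprobe)
        print(f"nprobe={nprobe:2d} recall@10={len(set(found) & set(exact)) / 10:.1f}")
```

Raising `nprobe` scans more lists, so recall rises toward 1.0 (reached when `nprobe` equals `nlist`) at the cost of more distance computations per query. Passing `metric=neg_ip` mirrors the inner-product (IP) datasets.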

Next Steps

Tutorial — MU-accelerated index usage with code examples
Benchmark Reference — Preset and option reference for bench