Quick Start
XFaiss provides a ./xfaiss.sh script for building, data preparation, and benchmarking with predefined configurations:
| Step | Command | Description |
|---|---|---|
| Build | ./xfaiss.sh build | Build XFaiss library including tutorials and benchmarks |
| Prepare Data | ./xfaiss.sh prepare-data-tui | Prepare dataset and index files (TUI) |
| | ./xfaiss.sh prepare-data [options] | Prepare dataset and index files (CLI) |
| Benchmark | ./xfaiss.sh bench-tui | Run search benchmark (TUI) |
| | ./xfaiss.sh bench <preset> [options] | Run search benchmark (CLI, see Benchmark Reference) |
Prepare Data
prepare-data downloads public benchmark datasets, converts formats, and builds Faiss indexes. prepare-data-tui provides the same functionality with an interactive TUI.
Datasets
| Dataset | Dimension | Vectors | Metric | Download Size |
|---|---|---|---|---|
| sift1m | 128 | 1M | L2 | ~500MB |
| gist1m | 960 | 1M | L2 | ~4GB |
| sift100m | 128 | 100M | L2 | ~49GB |
| deep100m | 96 | 100M | L2 | ~358GB |
| 7m-d1024-l2 | 1024 | 7M | L2 | generated |
| 7m-d1024-ip | 1024 | 7M | IP | generated |
Pipeline
- Download: fetch datasets from public sources (IRISA FTP, Yandex Cloud)
- Convert: format conversion (bvecs/fbin → fvecs)
- Build: create IVF-Flat and IVF-RaBitQ indexes (nlist=4096)
- Cleanup: remove base vectors to save space (unless --keep-raw is given)
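To make the Convert step more concrete, here is a minimal Python sketch of a bvecs → fvecs conversion. It assumes the widely used TEXMEX layout, in which every vector is stored as a little-endian int32 dimension followed by its components (uint8 for bvecs, float32 for fvecs). The function name bvecs_to_fvecs is hypothetical, and this is not the converter that xfaiss.sh actually runs.

```python
import struct


def bvecs_to_fvecs(src_path, dst_path):
    """Convert a .bvecs file (uint8 components) to .fvecs (float32 components).

    Assumed layout for both formats: per vector, a little-endian int32
    dimension followed by that many components.
    Returns the number of vectors converted.
    """
    count = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            header = src.read(4)
            if not header:
                break  # clean end of file
            if len(header) != 4:
                raise ValueError("truncated dimension header")
            (dim,) = struct.unpack("<i", header)
            if dim <= 0:
                raise ValueError(f"invalid dimension: {dim}")
            raw = src.read(dim)  # one byte per component
            if len(raw) != dim:
                raise ValueError("truncated vector payload")
            dst.write(header)  # the dimension header is identical in both formats
            dst.write(struct.pack(f"<{dim}f", *map(float, raw)))
            count += 1
    return count
```

The sketch streams one vector at a time, so memory use stays flat even for the larger datasets, at the cost of speed compared with a vectorized conversion.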
Output:
{data_dir}/benchmark_indexes/{dataset}/
├── {dataset}.ivf_flat_nlist4096.faiss
├── {dataset}.ivf_rabitq_nlist4096.faiss
├── query.fvecs
└── groundtruth.ivecs
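Before running a benchmark it can be handy to check that this layout is complete. The following is a small sketch under two assumptions: the helper names expected_outputs and missing_outputs are hypothetical, and the nlist part of the file name is assumed to follow the value passed to --nlist (the source only shows the default, 4096).

```python
from pathlib import Path


def expected_outputs(data_dir, dataset, nlist=4096):
    """Return the files prepare-data is expected to leave for one dataset."""
    root = Path(data_dir) / "benchmark_indexes" / dataset
    return [
        root / f"{dataset}.ivf_flat_nlist{nlist}.faiss",
        # Assumption: the nlist suffix tracks the --nlist value.
        root / f"{dataset}.ivf_rabitq_nlist{nlist}.faiss",
        root / "query.fvecs",
        root / "groundtruth.ivecs",
    ]


def missing_outputs(data_dir, dataset, nlist=4096):
    """Return the expected files that do not exist yet."""
    return [p for p in expected_outputs(data_dir, dataset, nlist) if not p.is_file()]
```

An empty list from missing_outputs("/var/opt/xvector-data", "sift1m") would suggest that the sift1m presets have the files they need.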
CLI Options
./xfaiss.sh prepare-data --data-dir <dir> [options...]
| Option | Description | Default |
|---|---|---|
| --data-dir <dir> | Root data directory | (required) |
| --datasets <list> | Comma-separated dataset names | all |
| --skip-download | Skip download step (use existing raw files) | off |
| --skip-build | Skip index build (download/convert only) | off |
| --keep-raw | Keep raw downloaded files after building | off |
| --nlist <n> | Number of IVF clusters | 4096 |
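For scripted runs, the options above can be assembled programmatically. This is a minimal sketch, not part of XFaiss: the function prepare_data_argv is hypothetical, and it only uses the flags listed in the table. It builds the argument list without executing anything.

```python
def prepare_data_argv(data_dir, datasets=None, skip_download=False,
                      skip_build=False, keep_raw=False, nlist=None):
    """Build the argument list for ./xfaiss.sh prepare-data.

    Only flags documented in the options table are emitted; options left
    at their defaults are omitted so the script's own defaults apply.
    """
    argv = ["./xfaiss.sh", "prepare-data", "--data-dir", str(data_dir)]
    if datasets:
        argv += ["--datasets", ",".join(datasets)]  # comma-separated list
    if skip_download:
        argv.append("--skip-download")
    if skip_build:
        argv.append("--skip-build")
    if keep_raw:
        argv.append("--keep-raw")
    if nlist is not None:
        argv += ["--nlist", str(nlist)]
    return argv
```

The resulting list can be passed to subprocess.run(argv, check=True) from the repository root; passing a list rather than a shell string avoids quoting problems with paths that contain spaces.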
TUI
prepare-data-tui opens a TUI to select datasets for download and index generation:
$ ./xfaiss.sh prepare-data-tui
Select datasets (space to toggle, enter to confirm):
✓ sift1m SIFT 1M (128-dim, ~500MB download)
• gist1m GIST 1M (960-dim, ~4GB download)
• sift100m SIFT 100M / BigANN (128-dim, 1B raw ~49GB download)
• deep100m Deep 100M (96-dim, 1B raw ~358GB download)
✓ 7m-d1024-l2 Synthetic 7M (1024-dim, L2, generated)
> ✓ 7m-d1024-ip Synthetic 7M (1024-dim, IP, generated)
Benchmark
bench-tui opens a TUI to select a benchmark preset and configure parameters:
1. Select benchmark preset:
$ ./xfaiss.sh bench-tui
Select benchmark preset:
> flat-sift1m IVF-Flat on SIFT-1M
flat-gist1m IVF-Flat on GIST-1M
flat-sift100m IVF-Flat on SIFT-100M
flat-deep100m IVF-Flat on Deep-100M
flat-7m-d1024-l2-query-fixed IVF-Flat on 7M-d1024 (L2, Query-Fixed)
flat-7m-d1024-l2-cluster-fixed IVF-Flat on 7M-d1024 (L2, Cluster-Fixed) [experimental]
flat-7m-d1024-ip-query-fixed IVF-Flat on 7M-d1024 (IP, Query-Fixed)
flat-7m-d1024-ip-cluster-fixed IVF-Flat on 7M-d1024 (IP, Cluster-Fixed) [experimental]
rabitq-sift1m IVF-RaBitQ on SIFT-1M
rabitq-gist1m IVF-RaBitQ on GIST-1M
rabitq-sift100m IVF-RaBitQ on SIFT-100M
rabitq-deep100m IVF-RaBitQ on Deep-100M
2. Review configurations and run:
Preset: flat-sift1m
> ── Data ────────────────────────
DATA_DIR /var/opt/xvector-data
── Search ──────────────────────
NPROBE 128
TOPK 100
SKIP_CPU no
MAX_QUERIES (all)
RANDOM_QUERIES (preset default)
SEARCH_ITERATIONS_ON_MU (1)
SEARCH_ITERATIONS_ON_CPU (1)
── Search (IVF-Flat) ───────────
FLAT_PARALLEL_MODE 1 (QUERY_FIXED)
── Device ──────────────────────
DEVICEID 0
NUMSUB 24
TASKCOUNT 192
LOCALITY_MODE 1 (SPREAD)
── CPU / Host ──────────────────
NUMA_CPUBIND 0
NUMA_MEMBIND 0
OMP_NUM_THREADS 64
OPENBLAS_NUM_THREADS 1
GDB no
────────────────────────────────
>>> Run benchmark
<<< Change preset
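The preset names shown in step 1 appear to follow an index-dataset[-mode] pattern, which makes it possible to work out which dataset a preset needs before launching the CLI variant. The sketch below relies on that inferred convention; parse_preset and bench_argv are hypothetical helpers, not XFaiss functions, and bench options are left out because they are covered in the Benchmark Reference.

```python
# Inferred from the preset list; not confirmed by XFaiss itself.
INDEX_PREFIXES = {"flat": "IVF-Flat", "rabitq": "IVF-RaBitQ"}
MODE_SUFFIXES = ("query-fixed", "cluster-fixed")


def parse_preset(name):
    """Split a preset name into index type, dataset name, and optional mode."""
    prefix, _, rest = name.partition("-")
    if prefix not in INDEX_PREFIXES or not rest:
        raise ValueError(f"unrecognized preset name: {name!r}")
    mode = None
    for suffix in MODE_SUFFIXES:
        if rest.endswith("-" + suffix):
            # Strip the suffix together with its leading hyphen.
            rest, mode = rest[: -len(suffix) - 1], suffix
            break
    return {"index": INDEX_PREFIXES[prefix], "dataset": rest, "mode": mode}


def bench_argv(preset):
    """Build the argument list for ./xfaiss.sh bench <preset>."""
    parse_preset(preset)  # reject names that do not match the assumed pattern
    return ["./xfaiss.sh", "bench", preset]
```

For example, parse_preset("flat-7m-d1024-l2-query-fixed") yields the dataset name 7m-d1024-l2, which matches an entry in the Datasets table.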
Next Steps
- Tutorial: MU-accelerated index usage with code examples
- Benchmark Reference: preset and option reference for bench