Quick Start

XFaiss provides a ./xfaiss.sh script that builds the library, prepares data, and runs benchmarks with predefined configurations:

Step          Command                                Description
Build         ./xfaiss.sh build                      Build XFaiss library including tutorials and benchmarks
Prepare Data  ./xfaiss.sh prepare-data-tui           Prepare dataset and index files (TUI)
              ./xfaiss.sh prepare-data [options]     Prepare dataset and index files (CLI)
Benchmark     ./xfaiss.sh bench-tui                  Run search benchmark (TUI)
              ./xfaiss.sh bench <preset> [options]   Run search benchmark (CLI, see Benchmark Reference)

Prepare Data

prepare-data downloads public benchmark datasets (or generates the synthetic ones), converts them to fvecs, and builds Faiss indexes. prepare-data-tui provides the same functionality through an interactive TUI.

Datasets

Dataset      Dimension  Vectors  Metric  Download Size
sift1m       128        1M       L2      ~500MB
gist1m       960        1M       L2      ~4GB
sift100m     128        100M     L2      ~49GB
deep100m     96         100M     L2      ~358GB
7m-d1024-l2  1024       7M       L2      generated
7m-d1024-ip  1024       7M       IP      generated
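
The generated datasets list no download size, so a rough estimate of the raw vector footprint can help with disk planning. The sketch below multiplies dimension by vector count, assuming 4-byte float32 components (an assumption; the source does not state the stored element type) and ignoring per-vector headers and index overhead. The function name is illustrative only.

```python
# Rough raw-vector footprint per dataset.
# ASSUMPTION: vectors are stored as float32 (4 bytes per component).
DATASETS = {
    "sift1m": (128, 1_000_000),
    "gist1m": (960, 1_000_000),
    "sift100m": (128, 100_000_000),
    "deep100m": (96, 100_000_000),
    "7m-d1024-l2": (1024, 7_000_000),
    "7m-d1024-ip": (1024, 7_000_000),
}


def raw_size_gib(dim, count, bytes_per_component=4):
    """Return dim * count * bytes_per_component expressed in GiB."""
    return dim * count * bytes_per_component / 2**30


for name, (dim, count) in DATASETS.items():
    print(f"{name:12s} {raw_size_gib(dim, count):8.2f} GiB")
```

These figures cover base vectors only; the download sizes in the table can differ because the public archives bundle additional files.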

Pipeline

  1. Download — fetch datasets from public sources (IRISA FTP, Yandex Cloud)
  2. Convert — format conversion (bvecs/fbin → fvecs)
  3. Build — create IVF-Flat and IVF-RaBitQ indexes (nlist=4096)
  4. Cleanup — remove base vectors to save space (unless --keep-raw)
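
To make the Convert step concrete, here is a minimal sketch of a bvecs to fvecs conversion. It relies on the usual layout of these formats: each record is a 4-byte little-endian dimension followed by that many components (unsigned bytes in bvecs, float32 in fvecs). This is not XFaiss's own converter; the function name and the limit parameter are hypothetical, with limit standing in for whatever mechanism selects a subset of a larger raw file.

```python
import io
import struct


def bvecs_to_fvecs(src, dst, limit=None):
    """Copy vectors from a bvecs stream to an fvecs stream; return the count."""
    written = 0
    while limit is None or written < limit:
        header = src.read(4)
        if not header:
            break  # clean end of input
        if len(header) != 4:
            raise ValueError("truncated dimension header")
        (dim,) = struct.unpack("<i", header)
        if dim <= 0:
            raise ValueError(f"invalid dimension {dim}")
        payload = src.read(dim)  # one unsigned byte per component
        if len(payload) != dim:
            raise ValueError("truncated vector payload")
        dst.write(header)  # the dimension header is identical in both formats
        dst.write(struct.pack(f"<{dim}f", *payload))  # widen bytes to float32
        written += 1
    return written


# Tiny in-memory demo: two 3-dimensional byte vectors.
raw = struct.pack("<i3B", 3, 1, 2, 255) + struct.pack("<i3B", 3, 0, 0, 7)
out = io.BytesIO()
count = bvecs_to_fvecs(io.BytesIO(raw), out)
print(count, len(out.getvalue()))
```

Processing one record at a time keeps memory use constant, which matters for the larger raw files.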

Output:

{data_dir}/benchmark_indexes/{dataset}/
  ├── {dataset}.ivf_flat_nlist4096.faiss
  ├── {dataset}.ivf_rabitq_nlist4096.faiss
  ├── query.fvecs
  └── groundtruth.ivecs
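
Before running a benchmark, it can be handy to confirm that a dataset directory is complete. The following sketch builds the four paths from the layout above and reports any that are missing. The helper names are hypothetical, and substituting a non-default nlist into the file names is an assumption: the source only shows the nlist4096 names.

```python
from pathlib import Path


def expected_outputs(data_dir, dataset, nlist=4096):
    """Return the files the layout above lists for one dataset."""
    root = Path(data_dir) / "benchmark_indexes" / dataset
    return [
        # ASSUMPTION: the nlist value in the file name tracks the nlist used.
        root / f"{dataset}.ivf_flat_nlist{nlist}.faiss",
        root / f"{dataset}.ivf_rabitq_nlist{nlist}.faiss",
        root / "query.fvecs",
        root / "groundtruth.ivecs",
    ]


def missing_outputs(data_dir, dataset, nlist=4096):
    """Return the expected files that do not exist on disk."""
    return [p for p in expected_outputs(data_dir, dataset, nlist) if not p.is_file()]


for path in missing_outputs("/var/opt/xvector-data", "sift1m"):
    print(f"missing: {path}")
```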

CLI Options

./xfaiss.sh prepare-data --data-dir <dir> [options...]

Option             Description                                   Default
--data-dir <dir>   Root data directory                           (required)
--datasets <list>  Comma-separated dataset names                 all
--skip-download    Skip download step (use existing raw files)   off
--skip-build       Skip index build (download/convert only)      off
--keep-raw         Keep raw downloaded files after building      off
--nlist <n>        Number of IVF clusters                        4096
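
When prepare-data is driven from a script, the options above can be assembled programmatically. The sketch below uses only the flags listed in the table; the function name and its keyword arguments are hypothetical, and the resulting list is meant to be passed to something like subprocess.run from the XFaiss checkout.

```python
import shlex


def prepare_data_cmd(data_dir, datasets=None, skip_download=False,
                     skip_build=False, keep_raw=False, nlist=None):
    """Build the argument list for ./xfaiss.sh prepare-data."""
    cmd = ["./xfaiss.sh", "prepare-data", "--data-dir", str(data_dir)]
    if datasets:
        cmd += ["--datasets", ",".join(datasets)]  # comma-separated names
    if skip_download:
        cmd.append("--skip-download")
    if skip_build:
        cmd.append("--skip-build")
    if keep_raw:
        cmd.append("--keep-raw")
    if nlist is not None:
        cmd += ["--nlist", str(nlist)]  # omitted flag leaves the default in effect
    return cmd


# Print the command instead of running it, so it can be reviewed first.
print(shlex.join(prepare_data_cmd("/var/opt/xvector-data",
                                  ["sift1m", "gist1m"], keep_raw=True)))
```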

TUI

prepare-data-tui opens a TUI to select datasets for download and index generation:

$ ./xfaiss.sh prepare-data-tui
Select datasets (space to toggle, enter to confirm):
  ✓ sift1m        SIFT 1M (128-dim, ~500MB download)
  • gist1m        GIST 1M (960-dim, ~4GB download)
  • sift100m      SIFT 100M / BigANN (128-dim, 1B raw ~49GB download)
  • deep100m      Deep 100M (96-dim, 1B raw ~358GB download)
  ✓ 7m-d1024-l2   Synthetic 7M (1024-dim, L2, generated)
> ✓ 7m-d1024-ip   Synthetic 7M (1024-dim, IP, generated)

Benchmark

bench-tui opens a TUI to select a benchmark preset and configure parameters:

1. Select benchmark preset:

$ ./xfaiss.sh bench-tui
Select benchmark preset:
> flat-sift1m                       IVF-Flat on SIFT-1M
  flat-gist1m                       IVF-Flat on GIST-1M
  flat-sift100m                     IVF-Flat on SIFT-100M
  flat-deep100m                     IVF-Flat on Deep-100M
  flat-7m-d1024-l2-query-fixed      IVF-Flat on 7M-d1024 (L2, Query-Fixed)
  flat-7m-d1024-l2-cluster-fixed    IVF-Flat on 7M-d1024 (L2, Cluster-Fixed) [experimental]
  flat-7m-d1024-ip-query-fixed      IVF-Flat on 7M-d1024 (IP, Query-Fixed)
  flat-7m-d1024-ip-cluster-fixed    IVF-Flat on 7M-d1024 (IP, Cluster-Fixed) [experimental]
  rabitq-sift1m                     IVF-RaBitQ on SIFT-1M
  rabitq-gist1m                     IVF-RaBitQ on GIST-1M
  rabitq-sift100m                   IVF-RaBitQ on SIFT-100M
  rabitq-deep100m                   IVF-RaBitQ on Deep-100M

2. Review the configuration and run:

  Preset: flat-sift1m
> ── Data ────────────────────────
    DATA_DIR           /var/opt/xvector-data
  ── Search ──────────────────────
    NPROBE             128
    TOPK               100
    SKIP_CPU           no
    MAX_QUERIES        (all)
    RANDOM_QUERIES     (preset default)
    SEARCH_ITERATIONS_ON_MU (1)
    SEARCH_ITERATIONS_ON_CPU (1)
  ── Search (IVF-Flat) ───────────
    FLAT_PARALLEL_MODE 1 (QUERY_FIXED)
  ── Device ──────────────────────
    DEVICEID           0
    NUMSUB             24
    TASKCOUNT          192
    LOCALITY_MODE      1 (SPREAD)
  ── CPU / Host ──────────────────
    NUMA_CPUBIND           0
    NUMA_MEMBIND           0
    OMP_NUM_THREADS        64
    OPENBLAS_NUM_THREADS   1
    GDB                    no
  ────────────────────────────────
    >>> Run benchmark
    <<< Change preset
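
To put the NPROBE value in context: an IVF index searches only nprobe of its nlist clusters per query. The sketch below estimates the share of the dataset scanned for the values shown above, assuming NPROBE maps to the IVF nprobe parameter, the index was built with the default nlist=4096, and clusters are roughly balanced (real cluster sizes vary). The function name is illustrative.

```python
def ivf_scan_estimate(num_vectors, nlist, nprobe):
    """Return (fraction of clusters probed, approximate vectors scanned per query)."""
    nprobe = min(nprobe, nlist)  # probing more clusters than exist adds nothing
    fraction = nprobe / nlist
    # ASSUMPTION: clusters are equally sized, so the scanned share of
    # vectors equals the probed share of clusters.
    return fraction, round(num_vectors * fraction)


fraction, scanned = ivf_scan_estimate(1_000_000, nlist=4096, nprobe=128)
print(f"{fraction:.2%} of clusters, about {scanned} vectors per query")
```

Raising NPROBE scans more clusters per query, which generally trades search speed for recall.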

Next Steps