XFaiss

Faiss is Meta’s open-source library for efficient similarity search and clustering of dense vectors, widely used for large-scale ANN (approximate nearest neighbor) search.

XFaiss adds MU device acceleration on top of Faiss. Built on Faiss 1.13.0, it works as a drop-in replacement: existing .faiss index files and minimal code changes are enough to accelerate search on the MX1. Internally, XFaiss uses libxvector-dev (see the XVector API Reference) for device communication and kernel execution.

Supported Index Types

| Index | Base Class | Description |
| --- | --- | --- |
| MuIndexIvfFlat | faiss::IndexIVFFlat | IVF-Flat with device-accelerated fine search |
| MuIndexIvfRabitq | faiss::IndexIVFRaBitQ | IVF-RaBitQ with device-accelerated fine search |
| MuIndexFlat | faiss::IndexFlat | Brute-force exact KNN with device-accelerated search |

How It Works

Faiss supports GPU offloading — wrapping a CPU index with a GPU index class to accelerate search on NVIDIA GPUs. XFaiss applies the same pattern for MU (MX1): wrap a CPU index with an MU index class to offload the compute-intensive fine search to the MU device.

| Faiss GPU | XFaiss MU | Role |
| --- | --- | --- |
| GpuResources | MuResources | Device resource management |
| GpuIndexIVFFlat | MuIndexIvfFlat | IVF-Flat index with device acceleration |
| index_cpu_to_gpu() | Constructor + syncToMuDevice() | Transfer index data to device |

Setup

Before search, the application initializes MU resources and transfers index data to the device:

  1. makeMuResources() — connect to the MU device and load XVector kernels
  2. Construct an MU index (e.g., MuIndexIvfFlat) from an existing Faiss CPU index
  3. syncToMuDevice() — copy index data from CPU memory to device memory
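The three setup steps above might look like the following in application code. This is a sketch only: faiss::read_index is the real Faiss API for loading a .faiss file, but the XFaiss header name and the MuIndexIvfFlat constructor signature are assumptions, since the text above gives only the function names.

```cpp
#include <faiss/IndexIVFFlat.h>
#include <faiss/index_io.h>      // faiss::read_index
// #include <xfaiss/MuIndexIvfFlat.h>  // assumed header name

// Load an existing Faiss CPU index from a .faiss file.
faiss::Index* cpu_index = faiss::read_index("vectors.faiss");

// 1. Connect to the MU device and load the XVector kernels.
auto mu_res = makeMuResources();

// 2. Wrap the CPU index with an MU index (constructor signature assumed).
MuIndexIvfFlat mu_index(mu_res, dynamic_cast<faiss::IndexIVFFlat*>(cpu_index));

// 3. Copy the index data (centroids, inverted lists) to device memory.
mu_index.syncToMuDevice();
```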

Future Work

In the future, indexes may be created directly in CXL memory, which would significantly reduce CPU-to-device copy overhead and make the explicit syncToMuDevice() copy optional.

Search

Once setup is complete, the MU index is ready to handle search requests. Each search executes in two phases:

  1. Coarse search (CPU) — the Faiss quantizer selects the top-nprobe candidate clusters
  2. Fine search (MU device) — XVector kernels compute distances for all vectors within the selected clusters and return the top-k results

Next Steps

  • Quick Start — build, prepare data, and run benchmarks