# XFaiss
Faiss is Meta’s open-source library for efficient similarity search and clustering of dense vectors, widely used for large-scale ANN (approximate nearest neighbor) search.
XFaiss adds MU device acceleration on top of Faiss. Built on Faiss 1.13.0, it works as a drop-in replacement: existing `.faiss` index files can be searched on the MX1 with minimal code changes. Internally, XFaiss uses libxvector-dev (the XVector API Reference) for device communication and kernel execution.
## Supported Index Types

| Index | Base Class | Description |
|---|---|---|
| `MuIndexIvfFlat` | `faiss::IndexIVFFlat` | IVF-Flat with device-accelerated fine search |
| `MuIndexIvfRabitq` | `faiss::IndexIVFRaBitQ` | IVF-RaBitQ with device-accelerated fine search |
| `MuIndexFlat` | `faiss::IndexFlat` | Brute-force exact KNN with device-accelerated search |
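For reference, the computation a Flat index performs is brute-force exact KNN. The sketch below shows it in plain NumPy; this is an illustration of the computation only, not XFaiss code (`MuIndexFlat` runs the same distance computation on the MU device):

```python
import numpy as np

def knn_flat(xb, xq, k):
    """Exact k-nearest-neighbor search by brute force (squared L2 distance).

    This is the computation a Flat index performs; an IVF index restricts
    it to a subset of clusters instead of the whole database.
    """
    # Squared L2 distance between every query and every database vector.
    d2 = ((xq[:, None, :] - xb[None, :, :]) ** 2).sum(-1)
    idx = np.argsort(d2, axis=1)[:, :k]               # k closest per query
    return np.take_along_axis(d2, idx, axis=1), idx

rng = np.random.default_rng(0)
xb = rng.standard_normal((1000, 32)).astype(np.float32)  # database vectors
xq = rng.standard_normal((4, 32)).astype(np.float32)     # query vectors
D, I = knn_flat(xb, xq, k=5)                             # distances, ids
```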
## How It Works
Faiss supports GPU offloading — wrapping a CPU index with a GPU index class to accelerate search on NVIDIA GPUs. XFaiss applies the same pattern for MU (MX1): wrap a CPU index with an MU index class to offload the compute-intensive fine search to the MU device.
| Faiss GPU | XFaiss MU | Role |
|---|---|---|
| `GpuResources` | `MuResources` | Device resource management |
| `GpuIndexIVFFlat` | `MuIndexIvfFlat` | IVF-Flat index with device acceleration |
| `index_cpu_to_gpu()` | Constructor + `syncToMuDevice()` | Transfer index data to device |
## Setup
Before search, the application initializes MU resources and transfers index data to the device:
1. `makeMuResources()` — connect to the MU device and load XVector kernels
2. Construct an MU index (e.g., `MuIndexIvfFlat`) from an existing Faiss CPU index
3. `syncToMuDevice()` — copy index data from CPU memory to device memory
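The setup flow can be sketched with a toy Python model of the wrap-then-sync pattern. The class and method names mirror the ones above, but the constructor signature, the stand-in resource object, and the "device memory" (just a host-side copy here) are all illustrative assumptions, not the real XFaiss API:

```python
import numpy as np

class ToyCpuIndexFlat:
    """Minimal stand-in for a Faiss CPU index holding raw vectors."""
    def __init__(self, xb):
        self.xb = np.asarray(xb, dtype=np.float32)

class ToyMuIndexFlat:
    """Toy model of the XFaiss wrap-then-sync pattern (not the real API)."""
    def __init__(self, resources, cpu_index):
        self.resources = resources      # stands in for MuResources
        self.cpu_index = cpu_index
        self.device_xb = None           # populated by sync_to_mu_device()

    def sync_to_mu_device(self):
        # Real XFaiss copies index data into MU device memory; here we
        # copy the array on the host to mimic the one-time transfer step.
        self.device_xb = self.cpu_index.xb.copy()

    def search(self, xq, k):
        assert self.device_xb is not None, "call sync_to_mu_device() first"
        d2 = ((xq[:, None, :] - self.device_xb[None, :, :]) ** 2).sum(-1)
        idx = np.argsort(d2, axis=1)[:, :k]
        return np.take_along_axis(d2, idx, axis=1), idx

resources = object()                    # stands in for makeMuResources()
cpu_index = ToyCpuIndexFlat(np.eye(8, dtype=np.float32))
mu_index = ToyMuIndexFlat(resources, cpu_index)   # step 2: wrap CPU index
mu_index.sync_to_mu_device()                      # step 3: transfer to device
D, I = mu_index.search(np.eye(8, dtype=np.float32)[:2], k=3)
```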
> **Future work:** Indexes may eventually be created directly in CXL memory, significantly reducing CPU-to-device copy overhead and making this copy step optional.
Once setup is complete, the MU index is ready to handle search requests.
## Search
Each search executes in two phases:
- **Coarse search (CPU)** — the Faiss quantizer selects the top-`nprobe` candidate clusters
- **Fine search (MU device)** — XVector kernels compute distances for all vectors within the selected clusters and return the top-`k` results
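The two phases can be sketched in plain NumPy. This is an illustrative model only: in XFaiss the fine phase runs as XVector kernels on the MU device, and the quantizer and cluster layout here are toy stand-ins for a trained Faiss IVF index:

```python
import numpy as np

def ivf_search(centroids, lists, xq, nprobe, k):
    """Two-phase IVF search: coarse over centroids, fine within chosen lists.

    `lists[c]` holds (ids, vectors) for cluster c. The fine phase below is
    the part XFaiss offloads to the device.
    """
    # Phase 1 (coarse): pick the nprobe closest centroids per query.
    dc = ((xq[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    probe = np.argsort(dc, axis=1)[:, :nprobe]

    out_d, out_i = [], []
    for q, clusters in zip(xq, probe):
        # Phase 2 (fine): exact distances over all vectors in probed lists.
        ids = np.concatenate([lists[c][0] for c in clusters])
        vecs = np.concatenate([lists[c][1] for c in clusters])
        d2 = ((vecs - q) ** 2).sum(-1)
        order = np.argsort(d2)[:k]
        out_d.append(d2[order])
        out_i.append(ids[order])
    return np.array(out_d), np.array(out_i)

rng = np.random.default_rng(0)
xb = rng.standard_normal((200, 8)).astype(np.float32)
centroids = xb[:4].copy()               # toy "trained" quantizer
assign = ((xb[:, None] - centroids[None]) ** 2).sum(-1).argmin(1)
lists = {c: (np.flatnonzero(assign == c), xb[assign == c]) for c in range(4)}
D, I = ivf_search(centroids, lists, xb[:3], nprobe=2, k=5)
```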
## Next Steps
- Quick Start — build, prepare data, and run benchmarks