XFaiss

Faiss is Meta’s open-source library for efficient similarity search and clustering of dense vectors, widely used for large-scale ANN (approximate nearest neighbor) search.

XFaiss adds MU device acceleration on top of Faiss. Built on Faiss 1.13.0, it works as a drop-in replacement: existing .faiss index files and minimal code changes are enough to accelerate search on the MX1. Internally, XFaiss uses libxvector-dev (see the XVector API Reference) for device communication and kernel execution.

Supported Index Types

| Index | Base Class | Description |
| --- | --- | --- |
| MuIndexIvfFlat | faiss::IndexIVFFlat | IVF-Flat with device-accelerated fine search |
| MuIndexIvfRabitq | faiss::IndexIVFRaBitQ | IVF-RaBitQ with device-accelerated fine search |
| MuIndexFlat | faiss::IndexFlat | Brute-force exact KNN with device-accelerated search |

How It Works

Faiss supports GPU offloading — wrapping a CPU index with a GPU index class to accelerate search on NVIDIA GPUs. XFaiss applies the same pattern for MU (MX1): wrap a CPU index with an MU index class to offload the compute-intensive fine search to the MU device.

| Faiss GPU | XFaiss MU | Role |
| --- | --- | --- |
| GpuResources | MuResources | Device resource management |
| GpuIndexIVFFlat | MuIndexIvfFlat | IVF-Flat index with device acceleration |
| index_cpu_to_gpu() | Constructor + syncToMuDevice() | Transfer index data to device |

Setup

Before search, the application initializes MU resources and transfers index data to the device:

  1. makeMuResources() — connect to the MU device and load XVector kernels
  2. Construct an MU index (e.g., MuIndexIvfFlat) from an existing Faiss CPU index
  3. syncToMuDevice() — copy index data from CPU memory to device memory
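The three setup steps above might look like the following in application code. This is a sketch only: faiss::read_index is the real Faiss API for loading a .faiss file, but the XFaiss header name and the MuIndexIvfFlat constructor signature are assumptions, since the text above gives only the function names.

```cpp
#include <faiss/IndexIVFFlat.h>
#include <faiss/index_io.h>      // faiss::read_index
// #include <xfaiss/MuIndexIvfFlat.h>  // assumed header name

// Load an existing Faiss CPU index from a .faiss file.
faiss::Index* cpu_index = faiss::read_index("vectors.faiss");

// 1. Connect to the MU device and load the XVector kernels.
auto mu_res = makeMuResources();

// 2. Wrap the CPU index with an MU index (constructor signature assumed).
MuIndexIvfFlat mu_index(mu_res, dynamic_cast<faiss::IndexIVFFlat*>(cpu_index));

// 3. Copy the index data (centroids, inverted lists) to device memory.
mu_index.syncToMuDevice();
```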

Future Work

In the future, indexes may be created directly in CXL memory, which would significantly reduce CPU-to-device copy overhead and make the explicit syncToMuDevice() copy optional.

Search

Once setup is complete, the MU index is ready to handle search requests. Each search executes in two phases:

  1. Coarse search (CPU) — the Faiss quantizer selects the top-nprobe candidate clusters
  2. Fine search (MU device) — XVector kernels compute distances for all vectors within the selected clusters and return the top-k results

Next Steps

  • Quick Start — build, prepare data, and run benchmarks