P2P KV Cache Sharing

This example demonstrates how to share KV cache across multiple vLLM instances using Maru as a shared storage backend.

Overview

When multiple vLLM instances serve the same or similar prompts, they redundantly compute and store the same KV cache. By sharing the KV cache through Maru’s CXL shared memory, Instance 2 can skip the prefill computation entirely and directly read the KV cache that Instance 1 already stored.

Prerequisites

  • At least 2 GPUs

  • LMCache >= v0.3.14 installed (pip install lmcache)

  • vLLM installed

  • Maru installed (see Installation)

Configuration

Both instances share a single configuration file (maru-config.yaml):

chunk_size: 256
local_cpu: True
max_local_cpu_size: 5
enable_async_loading: True

enable_p2p: False
enable_controller: False

remote_url: "maru://localhost:${MARU_SERVER_PORT}"
remote_serde: "naive"
remote_storage_plugins: ["maru"]

extra_config:
  remote_storage_plugin.maru.module_path: maru_lmcache.adapter
  remote_storage_plugin.maru.class_name: MaruConnectorAdapter
  maru_pool_size: "4G"
  save_chunk_meta: False
  lookup_backoff_time: 0.001

Maru is loaded as an LMCache remote storage plugin. For details on each configuration field, see LMCache Integration.

How to Run

(Optional) Create and activate a virtual environment:

python3 -m venv .venv
source .venv/bin/activate

1. Launch two vLLM instances

The launcher script starts MaruServer and both vLLM instances automatically:

cd examples/lmcache/p2p_sharing
./p2p_example.sh

Wait until you see:

All servers are up. You can send request now...

2. Try a simple query

Open a new terminal and send a single prompt to both instances:

cd examples/lmcache/p2p_sharing

# Send a prompt to Instance 1 (store KV cache), then the same prompt to Instance 2 (retrieve)
./run_simple_query.sh

You’ll see the prompt and both instances’ responses printed directly. Check inst2.log for cache hit messages:

LMCache INFO: [req_id=cmpl-a5a94ea4577d4025-0] Retrieved 256 out of 256 required tokens (from 256 total tokens). size: 0.0029 gb, cost 3.0579 ms, throughput: 0.9581 GB/s; (cache_engine.py:874:lmcache.v1.cache_engine)

3. Run a benchmark

Once you’ve confirmed cache sharing works, measure the TTFT (Time-To-First-Token) speedup:

./run_benchmark.sh

This sends streaming requests to both instances and reports the TTFT speedup from KV cache reuse:

==========================================================
  P2P KV Cache Sharing - Results
==========================================================
  Session 1 (store):    TTFT = 1234.5 ms
  Session 2 (retrieve): TTFT = 56.7 ms
  TTFT Speedup:         21.77x
  Cache Hit:            Yes
==========================================================

Press Ctrl+C in the first terminal to stop all servers.