Architecture Overview¶
Maru manages KV cache data in CXL shared memory, enabling cross-instance sharing across multiple nodes without data transfer.
System Architecture¶
%%{ init: { "flowchart": { "curve": "linear" } } }%%
flowchart TB
subgraph Instances[" "]
direction LR
subgraph SN["Server N"]
direction TB
V1(["LLM Instance"])
H1{{"MaruHandler"}}
V1 --- H1
end
subgraph S2["Server 2"]
direction TB
V2(["LLM Instance"])
H2{{"MaruHandler"}}
V2 --- H2
end
subgraph S1["Server 1"]
direction TB
V3(["LLM Instance"])
H3{{"MaruHandler"}}
V3 --- H3
end
end
subgraph ControlPlane["Control Plane"]
direction LR
subgraph Remote["Remote Mode"]
direction TB
MS["MaruServer"]:::maru
D["MaruResourceManager"]:::maru
MS <--> D
end
subgraph Filesystem["Shared Filesystem Mode"]
direction TB
FS["MaruFs"]:::fs
end
end
H1 <-.->|"store / retrieve"| FS
H2 <-.->|"store / retrieve"| MS
H3 <-.->|"store / retrieve"| MS
subgraph CXL["CXL Shared Memory"]
direction LR
R0["Region 0"] ~~~ R1["Region 1"] ~~~ R2["Region 2"]
end
D -.->|"allocate / free regions"| CXL
FS -.-> CXL
H1 <==>|"read / write"| CXL
H2 <==>|"read / write"| CXL
H3 <==>|"read / write"| CXL
classDef maru fill:#f8cecc,stroke:#b85450,font-weight:bold
classDef fs fill:#dae8fc,stroke:#6c8ebf,font-weight:bold
Control Plane (dashed arrows) — KV metadata operations and region allocation.
Data Plane (solid arrows) — direct access to CXL shared memory, zero-copy. The data path is identical regardless of control plane mode.
The system has three layers:
Layer |
Role |
Components |
|---|---|---|
Client |
KV operations, page allocation, region mapping |
MaruHandler |
Metadata |
Key registry, allocation lifecycle |
MaruServer (Remote) / marufs (Filesystem) |
Memory |
Shared memory pool, capability issuance, crash recovery |
MaruResourceManager (Remote) / marufs (Filesystem) |
Key Design Properties¶
Zero-copy data path. Clients access KV data directly in shared memory — no server process ever touches the data path (dashed arrows in the diagram). The only traffic on the control plane is lightweight metadata; the data itself never moves. This strict control/data plane separation means data-path performance is bounded by memory bandwidth, not by software overhead.
Per-application control plane. Each application group runs its own metadata service for isolation (e.g., app A with 2 instances, app B with 3 instances). A single Resource Manager manages the shared memory pool across all groups. The diagram below illustrates this in Remote mode:
%%{ init: { "flowchart": { "curve": "linear" } } }%%
flowchart LR
subgraph AppA["App A"]
direction TB
A1(["Instance 1"])
A2(["Instance 2"])
end
subgraph AppB["App B"]
direction TB
B1(["Instance 1"])
B2(["Instance 2"])
B3(["Instance 3"])
end
MSA["MaruServer A"]:::maru
MSB["MaruServer B"]:::maru
RM["MaruResourceManager"]:::rm
A1 & A2 --> MSA
B1 & B2 & B3 --> MSB
MSA & MSB --> RM
subgraph CXL["CXL Shared Memory Pool"]
direction TB
R0["Region 0"] ~~~ R1["Region 1"] ~~~ R2["Region 2"] ~~~ R3["Region 3"]
end
RM --> CXL
classDef maru fill:#f8cecc,stroke:#b85450,font-weight:bold
classDef rm fill:#d5e8d4,stroke:#82b366,font-weight:bold
Pluggable control plane. The control plane is isolated behind a stable interface, so its implementation can change without affecting the data path. Remote mode (current) uses a centralized MaruServer + MaruResourceManager. Shared Filesystem mode (in development) replaces both with MaruFs, enforcing memory access control at the kernel level for stronger security than user-space RPC.
Capability-based memory access. Clients never open shared memory devices directly. The Resource Manager acts as a capability broker, issuing authorized handles that grant access to specific memory regions (the Memory layer in the diagram). This confines hardware access to a single trusted process and decouples clients from the underlying memory technology.
Data Flow¶
Store¶
sequenceDiagram
participant C as Caller
participant H as MaruHandler
participant CP as Control Plane
participant CXL as CXL Shared Memory
C->>H: store(key, data)
H->>H: allocate page from owned region
H->>CXL: write data directly (zero-copy)
H->>CP: register key → location
CP-->>H: OK
H-->>C: success
Data is written to shared memory before the key is registered. Other instances can never observe a partial write — the key only becomes visible after the data is fully committed.
Retrieve (cross-instance)¶
sequenceDiagram
participant C as Caller
participant H as MaruHandler
participant CP as Control Plane
participant CXL as CXL Shared Memory
C->>H: retrieve(key)
H->>CP: lookup key
CP-->>H: location (region, offset, length)
H->>H: map region if not yet mapped
H->>CXL: direct read (zero-copy)
H-->>C: data
Every retrieve requires one metadata lookup via the control plane. Once a region is mapped, the mapping is cached for subsequent accesses to the same region — only the first access to a given region incurs the mmap cost.
Extensibility¶
MaruHandler is framework-independent. Its interface operates on string keys and memory views — a minimal, framework-neutral contract. Any inference framework can integrate with Maru by writing a thin adapter layer (typically under 200 lines) that converts framework-specific cache keys to strings and delegates to MaruHandler’s store/retrieve API.
graph LR
A[LMCache] -->|MaruConnector| D[MaruHandler]
B[SGLang] -->|Future Adapter| D
C[Other Framework] -->|Custom Adapter| D
D --> E[MaruServer]
D --> F[CXL Shared Memory]
See also: LMCache Integration, MaruHandler Design