Consistency and Safety


1. Data Visibility

Maru provides strong consistency for store operations: once store() returns successfully, the stored data is immediately retrievable by any other instance.

This relies on write-then-register ordering. The handler always completes these steps in sequence:

  1. Write — copy data into the CXL memory region.

  2. Register — notify the metadata registry that the key now maps to that location.

Because registration only occurs after the write is fully committed to shared memory, no reader can ever observe a partial or in-progress write. The key simply does not exist in the registry until the data is complete.

        sequenceDiagram
    participant W as Writer (Instance A)
    participant CXL as CXL Shared Memory
    participant Meta as Metadata Registry
    participant R as Reader (Instance B)

    W->>CXL: 1. Write data to region
    Note over CXL: Data fully committed
    W->>Meta: 2. Register key -> location
    Note over Meta: Key now globally visible

    R->>Meta: 3. Lookup key
    Meta-->>R: location (region, offset)
    R->>CXL: 4. Read data (zero-copy)
    Note over R: Complete, consistent data
    

The visibility point is when register_kv RPC completes. The server holds the key in an in-memory registry protected by a lock, ensuring that concurrent lookups always see a fully committed entry or no entry at all.


2. Concurrency

Maru serializes at two levels:

        flowchart LR
    subgraph Instance_A["Instance A"]
        SA["store / delete"] -->|"_write_lock"| RPC_A["RPC"]
        RA["retrieve"] --> RPC_A
    end

    subgraph Instance_B["Instance B"]
        SB["store / delete"] -->|"_write_lock"| RPC_B["RPC"]
        RB["retrieve"] --> RPC_B
    end

    subgraph Server["MaruServer"]
        RPC_A & RPC_B -->|"RLock"| Meta["KV Registry +\nAllocation Manager"]
    end

    style SA fill:#fff3e0,stroke:#e65100
    style SB fill:#fff3e0,stroke:#e65100
    style RA fill:#e8f5e9,stroke:#4a9
    style RB fill:#e8f5e9,stroke:#4a9
    
  • Client: _write_lock serializes all writes (store, delete) within a single handler instance. Retrieve is lock-free.

  • Server: A single RLock serializes all metadata mutations (register, delete, ref-count updates) across instances.

These two locks, combined with write-then-register ordering (Section 1), produce the following guarantees:

Scenario

Behavior

Same-key store (cross-instance)

First registration wins; losing writer frees its page

Different-key store (cross-instance)

Parallel — independent pages, independent server calls

Store within one instance

Serialized by client write lock

Store + retrieve (same key)

No partial read — key invisible until data committed

Concurrent retrieve

Lock-free on client; shared mappings need no synchronization


3. Crash Recovery

Maru is designed so that metadata can be reconstructed after a crash. The recovery strategy differs by component.

3.1 Recovery Sequence

        sequenceDiagram
    participant RM as Resource Manager

    Note over RM: Resource Manager restart detected

    rect rgb(240, 248, 255)
        RM->>RM: Load last checkpoint (free lists + allocation map)
        RM->>RM: Replay WAL entries since checkpoint
        RM->>RM: Recompute free sizes from reconstituted state
        RM->>RM: Verify CRC32 integrity on all metadata
        Note over RM: Resource Manager ready
    end
    

Resource Manager persists every allocation and free operation to a write-ahead log (WAL) before modifying in-memory state. Every 100 operations, a checkpoint is taken: free lists and the global allocation map are saved with CRC32 integrity verification. On restart, the last checkpoint is loaded and any subsequent WAL entries are replayed, fully reconstructing the allocation state.

3.2 Client Crash Recovery

When a client process terminates unexpectedly (crash, kill, network partition), its allocated memory regions must be reclaimed to prevent leaks.

Component

Detection

Reclamation

Resource Manager

Reaper polls process liveness every 1 second

Orphaned regions returned to free list; WAL records the free operation

MaruServer

Deferred freeing state machine

Region freed only when both owner disconnected and KV reference count reaches zero

The reaper defends against PID reuse by caching each client’s process start time at allocation time. If the PID is recycled by the OS, the start-time mismatch triggers reclamation.


4. Failure Modes

Failure

Impact

Recovery

Data Loss

Client crash

Owned regions orphaned; keys become stale

Reaper + deferred freeing

None

Resource Manager crash

New region allocation blocked

WAL + checkpoint replay on restart

None

Network partition (client-server)

Affected client cannot store/retrieve

Client reconnects when network recovers

None

CXL device failure

All data on the device is lost

Not supported – no cross-device replication

Total