XArith is a device-side C++ library for XCENA's Computational Memory (MX1) that provides high-performance vector computation primitives for the VPE (Vector Processing Engine). Used together with the MU library (mu/mu.hpp), it lets developers build MU kernels that run on the MX1's vector processing hardware.
Key Characteristics
- Device-only library: Runs exclusively on MX1, not on the host
- Thread-safe: Safe for concurrent access from multiple threads. For best performance, each thread should use its own VpeContext
- Synchronous operations: All vector operations block until completion. Asynchronous operations will be available in a future release
API Reference
Context
| Method | Description |
| --- | --- |
| VpeContext(dimension, strategy) | Create context with vector dimension and VPE strategy |
| getDimension() | Get vector dimension |
| getAvailableBufferCount() | Get number of available slots |
| getMaxBufferCount() | Get maximum buffer slots |
Buffer Management
| Method | Description |
| --- | --- |
| allocateBuffer() | Allocate SRAM buffer, returns offset or INVALID_BUFFER |
| tryAllocateBuffers(count, outBuffers) | Atomically allocate multiple buffers (all-or-nothing), returns true on success |
| freeBuffer(offset) | Free previously allocated buffer |
| freeBuffers(count, buffers) | Free multiple buffers in a single atomic operation |
| load(srcDram, destBuffer) | Load vector from DRAM to SRAM |
| store(srcBuffer, destDram) | Store vector from SRAM to DRAM |
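
The all-or-nothing behavior described for tryAllocateBuffers can be illustrated with a small host-side mock. This is a sketch, not XArith code: MockSlotPool, its method names, the slot count, and the kInvalidBuffer sentinel value are all hypothetical stand-ins, and the mock is single-threaded, so it does not model the atomicity of the real calls.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical host-side stand-in for a fixed pool of SRAM buffer slots.
// It models only the all-or-nothing allocation semantics; it is NOT the
// XArith implementation, and every name here is an assumption.
class MockSlotPool {
public:
    static constexpr std::uint32_t kInvalidBuffer = 0xFFFFFFFFu;  // assumed sentinel

    explicit MockSlotPool(std::size_t slotCount) : used_(slotCount, false) {}

    // Number of slots that are currently free.
    std::size_t availableCount() const {
        std::size_t n = 0;
        for (bool inUse : used_) {
            if (!inUse) ++n;
        }
        return n;
    }

    // Single allocation: returns a slot index, or kInvalidBuffer when full.
    std::uint32_t allocate() {
        for (std::size_t i = 0; i < used_.size(); ++i) {
            if (!used_[i]) {
                used_[i] = true;
                return static_cast<std::uint32_t>(i);
            }
        }
        return kInvalidBuffer;
    }

    // All-or-nothing: either `count` slots are reserved and written to `out`,
    // or nothing is reserved and false is returned.
    bool tryAllocate(std::size_t count, std::uint32_t* out) {
        if (availableCount() < count) return false;
        for (std::size_t i = 0; i < count; ++i) out[i] = allocate();
        return true;
    }

    // Returns a previously allocated slot to the pool.
    void release(std::uint32_t slot) {
        assert(slot < used_.size());
        used_[slot] = false;
    }

private:
    std::vector<bool> used_;
};
```

The point of the all-or-nothing check is that a failed request leaves the pool unchanged, so a kernel that needs, say, two source buffers and one destination buffer never ends up holding a partial set it then has to unwind.
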
Vector Operations
| Method | Description |
| --- | --- |
| dot(buf1, buf2) | Compute dot product |
| add(src1, src2, dest) | Element-wise addition: dest = src1 + src2 |
| sub(src1, src2, dest) | Element-wise subtraction: dest = src1 - src2 |
| mul(src1, src2, dest) | Element-wise multiplication: dest = src1 * src2 |
| div(src1, src2, dest) | Element-wise division: dest = src1 / src2 |
| square(src, dest) | Element-wise square: dest = src * src |
| bitwiseXor(src1, src2, dest) | Element-wise XOR: dest = src1 ^ src2 |
| addReduce(src1, src2) | Sum of element-wise addition |
| subReduce(src1, src2) | Sum of element-wise subtraction |
| divReduce(src1, src2) | Sum of element-wise division |
| equals(buf1, buf2) | Check if vectors are equal |
| isAllZero(buf) | Check if all elements are zero |
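
As a reading aid for the table above, the following host-side sketch spells out what a few of these operations compute, using plain std::vector. It is not XArith code and does not run on the VPE: the ref* function names are hypothetical, the float element type is an assumption, and the interpretation of addReduce as the sum over all elements of (src1 + src2) is one plausible reading of "sum of element-wise addition".

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side reference semantics only. The element type is an assumption;
// the source does not state which element types the VPE supports.
using Vec = std::vector<float>;

// Mirrors add(src1, src2, dest): dest[i] = src1[i] + src2[i].
Vec refAdd(const Vec& src1, const Vec& src2) {
    assert(src1.size() == src2.size());
    Vec dest(src1.size());
    for (std::size_t i = 0; i < src1.size(); ++i) dest[i] = src1[i] + src2[i];
    return dest;
}

// Mirrors dot(buf1, buf2): sum over i of buf1[i] * buf2[i].
float refDot(const Vec& buf1, const Vec& buf2) {
    assert(buf1.size() == buf2.size());
    float sum = 0.0f;
    for (std::size_t i = 0; i < buf1.size(); ++i) sum += buf1[i] * buf2[i];
    return sum;
}

// Assumed reading of addReduce(src1, src2): sum over i of (src1[i] + src2[i]).
float refAddReduce(const Vec& src1, const Vec& src2) {
    assert(src1.size() == src2.size());
    float sum = 0.0f;
    for (std::size_t i = 0; i < src1.size(); ++i) sum += src1[i] + src2[i];
    return sum;
}

// Mirrors isAllZero(buf): true when every element equals zero.
bool refIsAllZero(const Vec& buf) {
    for (float x : buf) {
        if (x != 0.0f) return false;
    }
    return true;
}
```

A reference like this can be useful on the host for checking kernel results: for src1 = {1, 2, 3} and src2 = {4, 5, 6}, refDot gives 32 and refAddReduce gives 21.
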
VpeIdStrategy Options
Each Sub contains two VPEs. The strategy determines how work is distributed across them.
| Strategy | Value | Description |
| --- | --- | --- |
| ByThreadId | Default | Distributes threads within the same MU across VPEs |
| ByClusterId | Alternative | Distributes MUs from different clusters across VPEs |
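
To make the difference between the two strategies concrete, here is a minimal sketch of how an ID could be mapped onto one of the two VPEs in a Sub. The source does not describe the actual mapping, so the modulo rule, the MockVpeIdStrategy and MockThreadInfo types, and the selectVpe function are all assumptions made purely for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical illustration only: these names and the modulo mapping are
// assumptions, not the XArith implementation.
enum class MockVpeIdStrategy { ByThreadId, ByClusterId };

// From the text above: each Sub contains two VPEs.
constexpr std::uint32_t kVpesPerSub = 2;

struct MockThreadInfo {
    std::uint32_t threadId;   // thread index within its MU (assumed field)
    std::uint32_t clusterId;  // cluster the MU belongs to (assumed field)
};

// Picks a VPE index in [0, kVpesPerSub) from whichever ID the strategy names.
constexpr std::uint32_t selectVpe(MockVpeIdStrategy strategy, MockThreadInfo info) {
    return (strategy == MockVpeIdStrategy::ByThreadId ? info.threadId
                                                      : info.clusterId) % kVpesPerSub;
}
```

Under this assumed mapping, two threads of the same MU with consecutive thread IDs land on different VPEs with ByThreadId, whereas with ByClusterId every thread of an MU shares a VPE and the spread comes from MUs in different clusters.
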