Parallel Xceleration Library (PXL)

PXL is the core runtime library for the XCENA SDK. It provides APIs to manage device resources, load compute kernels, and run parallel operations on XCENA hardware.

[Figure: PXL Architecture Overview]


Development Workflow

The XCENA SDK lets you offload computation to CXL memory accelerators:

flowchart TD
    subgraph PrepareOffloading[Prepare Offloading]
        direction LR
        OffloadingApplication[Offloading Application] --> |Build|ComputeKernel[Compute Kernel]
    end

    subgraph Accelerator
        direction LR
        Processor <--> |Process Kernel|CxlMemory[CXL Memory]
    end

    PrepareOffloading ==> |Load| HostApplication[Host Application]
    HostApplication <--> |API Call| RuntimeLibrary[Runtime Library]
    RuntimeLibrary <--> |Control| Processor
    RuntimeLibrary <--> |Allocate| CxlMemory
    HostApplication <--> |Data Access| CxlMemory

A typical project goes through these steps:

  1. Write a compute kernel in MU code.
  2. Write a host application using PXL APIs.
  3. Build both components.
  4. Run on hardware or the emulator.

For hands-on examples, see Tutorials.


Object Hierarchy

PXL exposes a small set of objects that model the host-to-device execution path:

flowchart LR
    Context --> Job
    Context -.owns.-> DeviceMem[Device Memory]
    Job --> Stream[default Stream]
    Job --> Map
    Map -.runs.-> Function
    Module --> Function
    Map -.takes.-> NDArray
    Map -.takes.-> DeviceMem

Object             Role
Context            A device session. Owns device memory and creates Jobs.
Job                A reservation of one or more Subs. Loads kernel binaries and creates Maps.
Module / Function  A compiled MU kernel binary and its callable entry points.
Map                One kernel launch: binds a Function to a taskCount and arguments.
Stream             An asynchronous work queue. Each Job has its own private default stream.
NDArray            A typed, shaped view over device memory that PXL automatically slices per task.
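
To make the relationships between these objects concrete, here is a minimal toy model in Python. It is not the PXL API: every class, method, and parameter name below (ToyContext, create_job, load, build_map, slice_per_task, and so on) is hypothetical and chosen only for illustration. The even, contiguous split used for per-task slicing is also an assumption; the actual slicing rules are covered under Kernel Execution.

```python
from typing import Callable, Dict, List, Sequence

# NOTE: all names here are hypothetical stand-ins, not real PXL identifiers.


def slice_per_task(data: Sequence[int], task_count: int) -> List[Sequence[int]]:
    """Split data into task_count contiguous chunks, as evenly as possible.

    Assumption: an even contiguous split stands in for NDArray slicing.
    """
    base, extra = divmod(len(data), task_count)
    chunks, start = [], 0
    for i in range(task_count):
        size = base + (1 if i < extra else 0)  # first `extra` tasks get one more
        chunks.append(data[start:start + size])
        start += size
    return chunks


class ToyFunction:
    """Stands in for one callable entry point of a compiled kernel."""

    def __init__(self, name: str, body: Callable[[Sequence[int]], int]):
        self.name = name
        self.body = body


class ToyModule:
    """Stands in for a loaded kernel binary that exposes Functions."""

    def __init__(self, entry_points: Dict[str, Callable[[Sequence[int]], int]]):
        self._functions = {n: ToyFunction(n, f) for n, f in entry_points.items()}

    def get_function(self, name: str) -> ToyFunction:
        return self._functions[name]


class ToyMap:
    """One kernel launch: a Function bound to a task count."""

    def __init__(self, function: ToyFunction, task_count: int):
        self.function = function
        self.task_count = task_count

    def execute(self, data: Sequence[int]) -> List[int]:
        # Each task sees only its own slice of the input.
        return [self.function.body(c) for c in slice_per_task(data, self.task_count)]


class ToyJob:
    """Loads kernel binaries and creates Maps."""

    def load(self, entry_points: Dict[str, Callable[[Sequence[int]], int]]) -> ToyModule:
        return ToyModule(entry_points)

    def build_map(self, function: ToyFunction, task_count: int) -> ToyMap:
        return ToyMap(function, task_count)


class ToyContext:
    """A device session that creates Jobs."""

    def create_job(self) -> ToyJob:
        return ToyJob()


# Walk the hierarchy: Context -> Job -> Module -> Function -> Map.
ctx = ToyContext()
job = ctx.create_job()
module = job.load({"sum": sum})
fn = module.get_function("sum")
launch = job.build_map(fn, task_count=4)
partials = launch.execute(list(range(10)))
print(partials)  # [3, 12, 13, 17]
```

The toy runs each task sequentially on the host. It only shows which object creates which, and how a single input is divided so that each task receives its own slice.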

In This Section

Page                 What it covers
Device Topology      Sub / Cluster / MU core hierarchy and how parallelism maps onto it.
Programming Objects  Lifecycle of Context, Job, Module, and Function.
Kernel Execution     Map, task and batch distribution, NDArray, locality mode, execution lifecycle.
Streams              Asynchronous execution model, default stream behavior, and concurrency guidelines.
