Device Topology
An XCENA device exposes its compute resources as a three-level hierarchy. Knowing this layout makes the rest of PXL — Job, Map, taskCount, locality mode — easier to reason about.
Hierarchy
```mermaid
flowchart TD
    Device[XCENA Device]
    Device --> Sub0[Sub 0]
    Device --> SubN[Sub N]
    Sub0 --> Cluster0[Cluster 0]
    Sub0 --> ClusterM[Cluster M]
    SubN --> SubNDots[same structure]
    Cluster0 --> Core0[MU core 0]
    Cluster0 --> CoreK[MU core k]
```

| Layer | What it is |
|---|---|
| Sub | The unit of compute reservation. A Job is created with a number of Subs and owns them for its lifetime. |
| Cluster | A group of MU cores inside a Sub that share an L2 cache. |
| MU core | A RISC-V execution unit inside a Cluster. This is the smallest scheduling unit — each MU core runs your kernel function, and mu::getTaskIdx() is observed here. |
The exact counts (Subs per device, Clusters per Sub, MU cores per Cluster) vary by device generation. Don’t hard-code these numbers — the hierarchy itself is what’s stable.
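One way to follow this advice is to carry the counts around as run-time values instead of constants. The sketch below is a minimal illustration in plain C++, not part of the PXL API: the `Topology` struct, its field names, and the helper functions are all hypothetical, and how the counts would actually be obtained from a device is not covered here.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical description of a device's layout. None of these names come
// from PXL; the counts are expected to be supplied at run time rather than
// written into the program as constants.
struct Topology {
    std::size_t subsPerDevice;
    std::size_t clustersPerSub;
    std::size_t coresPerCluster;
};

// Number of MU cores inside a single Sub.
inline std::size_t coresPerSub(const Topology& t) {
    return t.clustersPerSub * t.coresPerCluster;
}

// Number of MU cores available to a Job that reserves numSub Subs.
inline std::size_t corePoolSize(const Topology& t, std::size_t numSub) {
    assert(numSub <= t.subsPerDevice && "cannot reserve more Subs than the device has");
    return numSub * coresPerSub(t);
}
```

Code written against a structure like this keeps working when a new device generation changes any of the three counts, because only the values change and the three-level shape stays the same.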
How parallelism maps onto the hierarchy
When you launch a Map:
- The Job provides a pool of MU cores (across all Subs the Job owns).
- PXL distributes the launch’s tasks across those cores.
- Each MU core invokes your kernel function once per task it received.
The unit of distribution is the MU core, not the Sub.
numSub only sets the size of the available core pool. See Kernel Execution for how taskCount and batchSize interact with this pool.
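To make the idea of a core pool concrete, the following sketch counts how many tasks each MU core would receive if tasks were dealt out round-robin. This is a standalone model in plain C++ and calls nothing from PXL. The round-robin policy is an assumption chosen for illustration (the actual distribution policy is not described here), and `tasksPerCore` and `poolSize` are hypothetical names.

```cpp
#include <cstddef>
#include <vector>

// Returns the number of tasks each core in the pool would receive if
// taskCount tasks were dealt out one at a time, round-robin.
// Round-robin is an assumed policy for illustration only; PXL's real
// scheduling may differ.
std::vector<std::size_t> tasksPerCore(std::size_t poolSize, std::size_t taskCount) {
    std::vector<std::size_t> counts(poolSize, 0);
    if (poolSize == 0) {
        return counts;  // no cores reserved, so nothing can be assigned
    }
    for (std::size_t task = 0; task < taskCount; ++task) {
        counts[task % poolSize] += 1;
    }
    return counts;
}
```

Under this model, a pool of 4 cores given 10 tasks ends up with 3, 3, 2, and 2 tasks per core. Each entry corresponds to the number of times that core would invoke the kernel function, and the Sub a core belongs to plays no part in the calculation.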
The next page introduces the host-side objects you use to reserve Subs, load kernels, and launch work onto this hierarchy.
→ Related: Programming Objects, Kernel Execution