pxcc — Heterogeneous C/C++ Compiler
pxcc is a single-source compiler for the XCENA platform. You write host code and device kernels in the same .cpp file; pxcc extracts the kernels, compiles them for the device, and embeds them into one host binary that loads them at runtime through PXL.
Experimental (available from SDK v1.4.8): pxcc is released as an experimental compiler. Command-line flags, the programming model, and supported C++ surface may change in future SDK releases without a deprecation period. For production builds that need a stable interface, continue to use the separate-kernel build flow shown in Hello Sort.
Why pxcc
The traditional SDK flow compiles the compute kernel separately into a .mubin file, which the host application then loads by path at runtime. pxcc removes that split — one source, one compiler invocation, one binary:
| | Separate-kernel flow | pxcc single-source flow |
|---|---|---|
| Kernel source | mu_kernel/*.cpp, registered with MU_KERNEL_ADD | same .cpp as host, marked __pxl_kernel__ |
| Kernel binary | a standalone .mubin file shipped and loaded by path | embedded in the host executable — nothing extra to package or locate |
| Host references a kernel by | name string (createFunction("name")) | the function symbol (createFunction<name>()) — a normal C++ entity |
| What the symbol buys you | — | code navigation (go-to-def / rename), compile-time type checking of launch args, template kernels (sort_with_ptr<int>), and function overloading |
| Includes & linking | configured by hand | mu_lib include path and -lpxl added automatically |
| Build steps | build kernel → build host → run | pxcc++ app.cpp -o app → run |
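The "compile-time type checking of launch args" row is the main practical gain of naming a kernel by symbol. The sketch below shows the general C++17 mechanism with a hypothetical checkedLaunch helper: the kernel is passed as a non-type template parameter, so its signature is visible to the compiler and mismatched arguments are rejected by a static_assert. It illustrates the idea only; it is not PXL's implementation, and it calls the function directly on the host instead of launching it on a device.

```cpp
#include <algorithm>
#include <type_traits>
#include <utility>

// Stand-in for a kernel. Under pxcc it would carry __pxl_kernel__; here it is
// an ordinary host function so the sketch compiles with any C++17 compiler.
void sort_with_ptr(int* arr, int size)
{
    std::sort(arr, arr + size);
}

// Hypothetical helper: the kernel is a template argument (similar in spirit to
// createFunction<name>()), so the compiler can check the launch arguments.
template <auto Kernel, typename... Args>
void checkedLaunch(Args&&... args)
{
    static_assert(std::is_invocable_v<decltype(Kernel), Args...>,
                  "launch arguments do not match the kernel signature");
    // A real launcher would dispatch to the device; this just calls the function.
    Kernel(std::forward<Args>(args)...);
}
```

checkedLaunch<sort_with_ptr>(data, 4) compiles, while checkedLaunch<sort_with_ptr>(data) fails at compile time. With a name string such as createFunction("name"), the compiler has no signature to check, so the same mistake cannot be caught at build time.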
A complete example
#include <algorithm>
#include <cstdio>
#include "mu/mu.hpp"
#include "pxl/pxl.hpp"
// Device kernel — extracted and compiled for the device by pxcc.
__pxl_kernel__ void sort_with_ptr(int* arr, int size)
{
    int idx = mu::getTaskIdx();                               // which sub-array this task owns
    std::sort(arr + idx * size, arr + idx * size + size);
}
int main()
{
    const int testCount = 2048, sortSize = 64;
    auto* data = pxl::allocateMemory<int>(0, testCount * sortSize);
    // Fill each sub-array in descending order (sortSize, ..., 1).
    for (int i = 0; i < testCount; i++)
        for (int j = 0; j < sortSize; j++)
            data[i * sortSize + j] = sortSize - j;
    // One task per sub-array: testCount tasks, each sorting sortSize ints.
    auto result = pxl::Launcher().execute<sort_with_ptr>(testCount, data, sortSize).run();
    if (result.status != pxl::Result::Success)
    {
        printf("FAIL: %s\n", result.errorMessage.c_str());
        pxl::releaseMemory(data);
        return 1;
    }
    // Verify every sub-array is sorted in ascending order.
    bool sorted = true;
    for (int i = 0; i < testCount && sorted; i++)
        for (int j = 1; j < sortSize; j++)
            if (data[i * sortSize + j - 1] > data[i * sortSize + j]) { sorted = false; break; }
    printf("%s\n", sorted ? "PASS" : "FAIL");
    pxl::releaseMemory(data);
    return sorted ? 0 : 1;
}
pxcc++ sort.cpp -o sort # compile host + device and link in one step
sudo ./sort # CXL device memory access requires elevated privileges
A runnable version of this example lives in example/experimental/heterogeneous_compile.cpp. A template-kernel variant lives in example/experimental/template_programming.cpp.
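In the kernel above, task idx owns elements [idx * size, idx * size + size), so testCount tasks cover the whole buffer without overlapping. The host-only sketch below reproduces that indexing with a plain loop standing in for mu::getTaskIdx(), which can be useful for computing expected results on the host. The helper names sortSliceForTask, sortSlices, and isEverySliceSorted are hypothetical, and the loop runs the tasks one after another rather than in parallel.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Host stand-in for the kernel body: taskIdx plays the role of mu::getTaskIdx().
void sortSliceForTask(std::vector<int>& arr, int taskIdx, int size)
{
    std::ptrdiff_t offset = static_cast<std::ptrdiff_t>(taskIdx) * size;
    std::sort(arr.begin() + offset, arr.begin() + offset + size);
}

// Runs every "task" sequentially; the device may run them in parallel.
void sortSlices(std::vector<int>& arr, int taskCount, int size)
{
    for (int idx = 0; idx < taskCount; ++idx)
        sortSliceForTask(arr, idx, size);
}

// Same check as the verification loop in the example.
bool isEverySliceSorted(const std::vector<int>& arr, int taskCount, int size)
{
    for (int idx = 0; idx < taskCount; ++idx)
    {
        std::ptrdiff_t offset = static_cast<std::ptrdiff_t>(idx) * size;
        if (!std::is_sorted(arr.begin() + offset, arr.begin() + offset + size))
            return false;
    }
    return true;
}
```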
Key facts
- Two drivers:
pxcc(C) andpxcc++(C++). They are drop-in compiler front-ends — set them asCMAKE_C_COMPILER/CMAKE_CXX_COMPILER. - Host code may use C++17/20/23; device code is C++17 or earlier. Set the device standard with
-Xmu=-std=...if needed (see Device options below); the full rule is in Constraints. - Includes and linking are automatic. The
mu_libinclude path is added on compile and-lpxlis added on link — no extra CMake configuration needed. - Host-only code just works. A source file with no
__pxl_kernel__is compiled like an ordinary clang C++ translation unit. - Changing kernel code requires recompiling the binary. Because kernels are embedded in the host executable, there is no separate kernel artifact to rebuild and relink on its own — rebuild the program after a kernel change.
Programming model
pxcc lets host code (x86-64) and device code (MU) live in the same translation unit. You mark which functions belong to the device with annotations; pxcc splits the source, compiles each half with the right backend, and embeds the device binary into the host binary.
Annotations
| Annotation | Where the code runs |
|---|---|
| (none) | Host only. An un-annotated function is compiled for the host and is not available on the device — calling it from kernel/device code is an error. |
| __pxl_kernel__ | Device kernel: a launchable entry point. |
| __mu_device__ | Device-only helper function. mu:: APIs are available. |
| __mu_shared__ | Code shared by host and device. mu:: APIs are not available. |
So an ordinary function stays on the host; to make code reachable from a kernel, annotate it __mu_device__ (device-only) or __mu_shared__ (host and device):
int hostUtil(int x); // no annotation → host only; not callable from a kernel
__mu_device__ int devUtil(int x); // device only
__mu_shared__ int shared(int x); // host and device
__pxl_kernel__ void k(int* a, int n)
{
    a[0] = devUtil(a[0]) + shared(a[0]);  // OK
    // a[0] = hostUtil(a[0]);             // ERROR: host-only symbol on the device
}
A small end-to-end example using all three annotations together:
#include "mu/mu.hpp"
// Shared by both sides: plain logic, no mu:: calls.
__mu_shared__ int clamp(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}
// Device-only helper: may call mu:: APIs.
__mu_device__ int taskBase(int size)
{
    return mu::getTaskIdx() * size;
}
// Kernel: the launchable entry point.
__pxl_kernel__ void scale(int* arr, int size, int factor)
{
    int base = taskBase(size);
    for (int i = 0; i < size; i++)
        arr[base + i] = clamp(arr[base + i] * factor, 0, 1000);
}
- For a class, __mu_shared__ is needed only on the class declaration (class __mu_shared__ ClassName { ... };); you do not annotate every member function individually.
- __mu_shared__ functions cannot use mu:: APIs, because they are also compiled for the host. Use __mu_device__ when you need mu::.
- The raw form [[clang::annotate("mu::device")]] is accepted as an alternative to the __mu_device__ macro.
Multi-file projects
Kernels and shared code may span translation units. Declare cross-TU shared symbols with __mu_shared__ in a header and compile the files together:
pxcc++ -c kernels.cpp helpers.cpp # compile each to <stem>.o in the current directory
pxcc++ kernels.o helpers.o -o app # link
# or in one step:
pxcc++ kernels.cpp helpers.cpp -o app
Constraints
- Device code is C++17 or earlier. Code that runs on the device (functions reachable from __pxl_kernel__) is limited to C++17 or earlier; the device backend does not yet support later standards. Host code is unaffected and may use C++17, C++20, or C++23. The device standard can be set with -Xmu=-std=... (for example, -Xmu=-std=c++17).
- The mu:: namespace is available in __pxl_kernel__ and __mu_device__ code, but not in __mu_shared__ code.
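In practice the C++17 limit mostly shows up in templates: C++20 concepts, requires-clauses, and std::span cannot be used in device code. The sketch below writes a template helper in the spirit of sort_with_ptr<int> using only C++17 tools, with a static_assert in place of a concept and a pointer-plus-length pair in place of std::span. sortSlice and isSorted are hypothetical names; the functions carry no annotation so they compile with any C++17 compiler (under pxcc they would be marked __mu_device__ or __mu_shared__), and whether a given standard-library facility is usable on the device is an assumption to check against the mu_lib headers.

```cpp
#include <algorithm>
#include <type_traits>

// C++17 stand-in for a C++20 constraint such as
//   template <typename T> requires std::is_arithmetic_v<T>
// The static_assert rejects unsupported types when the template is instantiated.
template <typename T>
void sortSlice(T* arr, int idx, int size)
{
    static_assert(std::is_arithmetic_v<T>, "sortSlice needs an arithmetic element type");
    T* first = arr + idx * size;  // slice owned by task idx, as in sort_with_ptr
    std::sort(first, first + size);
}

// C++17 replacement for a std::span parameter: pass pointer and length explicitly.
template <typename T>
bool isSorted(const T* data, int count)
{
    return std::is_sorted(data, data + count);
}
```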
Compilation and linking
pxcc accepts the familiar clang/gcc-style compile and link forms.
# Single file (CMake style)
pxcc++ -c input.cpp -o input.o # compile only
pxcc++ input.o -o program # link
pxcc++ input.cpp -o program # compile + link in one step
# Multiple files
pxcc++ -c f1.cpp f2.cpp # compile each to <stem>.o in the current directory
pxcc++ f1.cpp f2.cpp -o app # compile + link several files
-o is always a single file, exactly as in clang/gcc. Passing -o together with -c and multiple source files is rejected (cannot specify -o when generating multiple output files); omit -o to get one <stem>.o per source in the current directory.
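The naming rule for -c without -o can be expressed with std::filesystem: each source keeps only its stem, gains .o, and is written to the current directory regardless of where the source lives. objectNameFor is a hypothetical helper that mirrors the documented behavior; it is not pxcc's own code.

```cpp
#include <filesystem>
#include <string>

// Mirrors the documented rule: `pxcc++ -c dir/name.cpp` writes ./name.o.
std::string objectNameFor(const std::string& source)
{
    std::filesystem::path p(source);
    return p.stem().string() + ".o";  // drop the directory and extension, append .o
}
```

If the rule is applied literally, two sources with the same stem in different directories (a/util.cpp and b/util.cpp) map to the same util.o, so such files are better compiled one at a time with an explicit -o.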
Two things happen automatically, so you do not pass them yourself:
- the mu_lib include path is added on every compile, and
- -lpxl is added on every link.
This is why the CMake setup needs nothing beyond setting the compiler and the C++ standard.
Device options — -Xmu=
Flags that should reach the device compiler are forwarded with -Xmu=<arg> (one flag per -Xmu=). Anything not prefixed with -Xmu= applies to the host side.
| Form | Effect on device compilation |
|---|---|
| -Xmu=-std=c++17 | Set the device C++ standard (C++17 or earlier; the host standard is set independently). |
| -Xmu=-O0 … -Xmu=-O3, -Xmu=-Os | Device optimization level (default -O3). |
| -Xmu=-g | Emit device debug info. |
| -Xmu=-I/path | Add a device include search path. |
| -Xmu=-isystem -Xmu=/path | Add a device-only system include path (not visible to the host compiler). |
| -Xmu=-DNAME=VALUE | Define a device preprocessor macro. |
| -Xmu=-L/path | Add a device library search path. |
| -Xmu=-lfoo | Link an extra device library, e.g. -Xmu=-L/opt/xarith/lib -Xmu=-lxarith. |
pxcc++ -Xmu=-g -Xmu=-O0 -c kernel.cpp -o kernel.o # device debug build, no opt
pxcc++ -Xmu=-I/opt/inc -Xmu=-DDEBUG kernel.cpp -o app
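The routing rule can be stated as code: every argument that begins with -Xmu= has the prefix stripped and goes to the device compiler, and every other argument stays on the host side. splitArgs is a hypothetical illustration of that rule, not pxcc's argument parser; it does not model driver-level flags such as -j or --host-compiler, or flags like -c and -o that apply to both sides.

```cpp
#include <string>
#include <vector>

struct SplitArgs
{
    std::vector<std::string> host;    // passed to the host compiler
    std::vector<std::string> device;  // passed to the device compiler
};

// One flag per -Xmu=: "-Xmu=-isystem -Xmu=/path" yields two device arguments.
SplitArgs splitArgs(const std::vector<std::string>& args)
{
    const std::string prefix = "-Xmu=";
    SplitArgs out;
    for (const std::string& a : args)
    {
        if (a.compare(0, prefix.size(), prefix) == 0)
            out.device.push_back(a.substr(prefix.size()));  // strip "-Xmu="
        else
            out.host.push_back(a);
    }
    return out;
}
```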
Host compiler options
Flags without the -Xmu= prefix apply to the host compiler. The common ones:
| Option | Purpose |
|---|---|
| -I<dir> | Add a host include directory. |
| -D<macro>[=value] | Define a host preprocessor macro. |
| -U<macro> | Undefine a host preprocessor macro. |
| --isystem <dir> | Add a system include path to both host and device (use -Xmu=-isystem for device only). |
| --include <file> | Force-include a header before the source. |
| -x <language> | Set the source language (c or c++). |
| --std=<standard> | Host C++ standard (default c++17). The host may use a newer standard than the device; see Constraints. |
| -MF <file> | Write make-style dependency output to <file> (for incremental build systems). |
| -MT <target> | Set the target name emitted in the dependency output. |
| -MQ <target> | Like -MT, but quotes characters special to make. |
Any flag pxcc does not recognize is forwarded to the host compiler unchanged.
Selecting the host compiler
pxcc resolves the host compiler in this order:
--host-compiler=<path> > PXCC_HOST_CXX / PXCC_HOST_CC > CXX / CC
> clang in PATH > /usr/bin/clang
- --host-compiler=<path>: use a specific host compiler for this invocation.
- PXCC_HOST_CXX / PXCC_HOST_CC (env): host C++ / C compiler for pxcc++ / pxcc.
- CXX / CC (env): fallback when the above are unset.
pxcc++ --host-compiler=/usr/bin/g++-12 -c app.cpp -o app.o
PXCC_HOST_CXX=/usr/bin/g++-12 pxcc++ app.cpp -o app
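The lookup order reads as a chain of fallbacks. resolveHostCompiler is a hypothetical sketch of that chain, not pxcc's code. To keep it testable, the environment lookup and the PATH search are passed in as callables (a real driver would wrap std::getenv and a PATH scan). Two details are assumptions the text does not settle: an empty environment variable is treated as unset, and the PATH fallback looks for clang for both drivers, as written above.

```cpp
#include <functional>
#include <initializer_list>
#include <optional>
#include <string>

using EnvLookup = std::function<std::optional<std::string>(const std::string&)>;
using PathSearch = std::function<std::optional<std::string>(const std::string&)>;

// Order from the docs:
// --host-compiler > PXCC_HOST_CXX/CC > CXX/CC > clang in PATH > /usr/bin/clang
std::string resolveHostCompiler(const std::optional<std::string>& hostCompilerFlag,
                                bool isCxxDriver,  // true for pxcc++, false for pxcc
                                const EnvLookup& getEnv,
                                const PathSearch& findInPath)
{
    if (hostCompilerFlag && !hostCompilerFlag->empty())
        return *hostCompilerFlag;

    const char* pxccVar = isCxxDriver ? "PXCC_HOST_CXX" : "PXCC_HOST_CC";
    const char* stdVar = isCxxDriver ? "CXX" : "CC";
    for (const char* name : {pxccVar, stdVar})
    {
        std::optional<std::string> value = getEnv(name);
        if (value && !value->empty())  // assumption: empty counts as unset
            return *value;
    }

    if (std::optional<std::string> found = findInPath("clang"))
        return *found;
    return "/usr/bin/clang";
}
```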
Diagnostics and intermediates
pxcc++ -save-temps -c input.cpp -o input.o # keep intermediate files
-save-temps preserves the intermediate artifacts of the split-compile pipeline next to the output:
| File | Description |
|---|---|
| mu.{filename}.cpp | Extracted device code. |
| host.{filename}.cpp | Transformed host code. |
| mu_kernel.mubin | Device binary. |
| mu_kernel.o | Embedding object linked into the host binary. |
These are the first things to inspect when a kernel does not behave as expected — see Troubleshooting.
To see what pxcc actually runs, use one of the verbose forms:
pxcc++ --pxcc-verbose -c input.cpp -o input.o # pxcc's own internal command log
pxcc++ -v -c input.cpp -o input.o # commands the host compiler runs
pxcc++ --dry-run -c input.cpp -o input.o # print commands without executing (= -###)
CMake integration
cmake_minimum_required(VERSION 3.16)
set(CMAKE_CXX_COMPILER pxcc++)   # must be set before project()
project(myapp CXX)
set(CMAKE_CXX_STANDARD 17)
add_executable(myapp main.cpp kernel.cpp)
target_compile_options(myapp PRIVATE -Xmu=-O2) # device options
Override the host compiler from CMake via the flag or an environment variable:
cmake -DCMAKE_CXX_COMPILER=pxcc++ -DCMAKE_CXX_FLAGS="--host-compiler=/usr/bin/g++-12" ..
# or
PXCC_HOST_CXX=/usr/bin/g++-12 cmake -DCMAKE_CXX_COMPILER=pxcc++ ..
mu_lib include paths and -lpxl linking are handled automatically.
Help and version
pxcc++ --help # brief help with key features
pxcc++ --manual # full man-page-style manual (all options, examples, troubleshooting)
pxcc++ --version # version string (pxcc, mu_llvm, host/device targets)
Reference summary
| Option | Applies to | Purpose |
|---|---|---|
| -c | both | Compile only; do not link. |
| -o <path> | both | Output file (always a single file). |
| -j <N> | driver | Max parallel compilation jobs (default: CPU cores). |
| -Xmu=<arg> | device | Forward <arg> to the device compiler. |
| --host-compiler=<path> | driver | Choose the host compiler. |
| --std=<standard> | host | Host C++ standard (default c++17). |
| -MF / -MT / -MQ | host | Make-style dependency output: file / target / quoted target. |
| -save-temps | both | Keep intermediate host/device sources and binaries. |
| --pxcc-verbose | driver | Log the commands pxcc executes. |
| -v | host | Show the commands the host compiler runs. |
| --dry-run, -### | driver | Print commands without executing. |
| --help / --version / --manual | driver | Brief help / version / full manual. |
Related
- Hello Sort (Heterogeneous Programming) — write, compile, and run your first single-source program.
- Troubleshooting — device C++ standard errors, missing-annotation symptoms, link failures.