Kernel Programming Guide
Writing kernel code is pretty simple! Here's what you need to know:
- Include the Essential Header
Always start by including the mu/mu.hpp header. It's a must-have!
#include "mu/mu.hpp"
- Pick Your Host-Callable Functions
Choose which functions you want to call from the host.
Mark each one with the MU_KERNEL_ADD macro to make it host-callable. For example:
void kernel_callable(int size) { ... }  // no MU_KERNEL_ADD: callable from kernel code only
void host_callable(int* data, int size)
{
    // Your awesome code here
}
MU_KERNEL_ADD(host_callable)  // registers host_callable as host-callable
- Keep Parameters in Check
You can use up to 9 parameters for your kernel function. No more!
- Memory Constraints
- Heap Size: The kernel’s heap size is limited to 3MB.
- Stack Size: The kernel’s stack size is limited to 64KB.
Stick to these guidelines, and you’ll be creating great kernel code in no time!
Heterogeneous (single-source) programming
The traditional kernel flow keeps MU code in a separate mu_kernel/*.cpp file and registers host-callable functions with MU_KERNEL_ADD.
pxcc also supports a single-source style: write the host code and the device kernel in the same .cpp file, then mark the device entry point with __pxl_kernel__.
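#include <algorithm>
#include "mu/mu.hpp"
#include "pxl/pxl.hpp"
// Device entry point: each task sorts its own slice of `size` elements.
__pxl_kernel__ void sort_with_ptr(int* arr, int size)
{
    int idx = mu::getTaskIdx();    // index of the task running this call
    int* base = arr + idx * size;  // start of this task's slice
    std::sort(base, base + size);  // sort the slice in place
}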
The host can launch this kernel by its C++ symbol with pxl::Launcher:
auto result = pxl::Launcher().execute<sort_with_ptr>(testCount, data, sortSize).run();
This removes the separate .mubin path and the createModule() / createFunction() / buildMap() launch setup for the common case.
Single-source programming with pxcc is experimental. See Hello Sort (Heterogeneous Programming) for a complete walkthrough, and Compiler for the annotations (__pxl_kernel__, __mu_device__, __mu_shared__) and multi-file builds.