Kernel Programming Guide

Writing kernel code is straightforward. Here’s what you need to know:

  1. Include the Essential Header
    Always start by including the mu/mu.hpp header. It’s a must-have!
    #include "mu/mu.hpp"
    
  2. Pick Your Host-Callable Functions
    Choose which functions the host should be able to call, and register
    each one with the MU_KERNEL_ADD macro to mark it as host-callable.

    For example:

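    // Not registered with MU_KERNEL_ADD, so it is not host-callable.
    void kernel_callable(int size)
    {
        // ...
    }

    // Registered below, so the host can call it.
    void host_callable(int* data, int size)
    {
        // Your awesome code here
    }
    MU_KERNEL_ADD(host_callable)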
    
  3. Keep Parameters in Check
    A kernel function can take at most 9 parameters.

  4. Memory Constraints
    • Heap Size: The kernel’s heap size is limited to 3MB.
    • Stack Size: The kernel’s stack size is limited to 64KB.

Stick to these guidelines, and you’ll be creating great kernel code in no time!
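
The parameter and stack limits above can be checked at compile time in plain C++. The sketch below is illustrative only: param_count, kMaxKernelParams, kKernelStackBytes, and kScratchInts are hypothetical names that are not part of mu/mu.hpp, and it assumes that "64KB" means 64 * 1024 bytes.

```cpp
#include <cstddef>

// Hypothetical helper (not part of mu/mu.hpp): counts the parameters of a
// plain function pointer type at compile time.
template <typename F>
struct param_count;

template <typename R, typename... Args>
struct param_count<R (*)(Args...)> {
    static constexpr std::size_t value = sizeof...(Args);
};

// Limits taken from the guidelines above.
constexpr std::size_t kMaxKernelParams = 9;
// Assumption: "64KB" is read as 64 * 1024 bytes.
constexpr std::size_t kKernelStackBytes = 64 * 1024;

// Stand-in for a host-callable kernel. In real MU code this definition
// would be followed by MU_KERNEL_ADD(host_callable).
void host_callable(int* data, int size)
{
    (void)data;
    (void)size;
}

// Fails to compile if the kernel signature grows past the parameter limit.
static_assert(param_count<decltype(&host_callable)>::value <= kMaxKernelParams,
              "host_callable takes more than 9 parameters");

// A local scratch array of this many ints must fit in the kernel stack,
// with room left over for call frames.
constexpr std::size_t kScratchInts = 1024;
static_assert(kScratchInts * sizeof(int) < kKernelStackBytes,
              "scratch buffer would not fit in the kernel stack");
```

Checks like these cost nothing at run time and fail the build as soon as a signature or buffer outgrows the documented limits.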

Heterogeneous (single-source) programming

The traditional kernel flow keeps MU code in a separate mu_kernel/*.cpp file and registers host-callable functions with MU_KERNEL_ADD.

pxcc also supports a single-source style: write the host code and the device kernel in the same .cpp file, then mark the device entry point with __pxl_kernel__.

#include <algorithm>
#include "mu/mu.hpp"
#include "pxl/pxl.hpp"

__pxl_kernel__ void sort_with_ptr(int* arr, int size)
{
    int idx = mu::getTaskIdx();
    int* base = arr + idx * size;
    std::sort(base, base + size);
}

The host can launch this kernel by its C++ symbol with pxl::Launcher:

auto result = pxl::Launcher().execute<sort_with_ptr>(testCount, data, sortSize).run();

This removes the separate .mubin path and the createModule() / createFunction() / buildMap() launch setup for the common case.
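
To make the data layout of sort_with_ptr concrete, here is a host-only reference in standard C++ that does the same work sequentially. It is a sketch under two assumptions the text does not confirm: task indices run from 0 to taskCount - 1, and the buffer holds taskCount contiguous segments of size ints each. The function names are hypothetical, and the idx parameter stands in for mu::getTaskIdx().

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Host-only reference for the work of one task: sort the idx-th segment
// of `size` ints. `idx` plays the role of mu::getTaskIdx().
void sort_segment_reference(int* arr, int size, int idx)
{
    int* base = arr + idx * size;
    std::sort(base, base + size);
}

// Runs every task one after another on the host.
// Assumption: tasks are numbered 0 .. taskCount - 1 and the buffer holds
// exactly taskCount * size ints.
void sort_all_reference(std::vector<int>& data, int taskCount, int size)
{
    assert(taskCount >= 0 && size >= 0);
    assert(data.size() == static_cast<std::size_t>(taskCount) * static_cast<std::size_t>(size));
    for (int idx = 0; idx < taskCount; ++idx) {
        sort_segment_reference(data.data(), size, idx);
    }
}
```

Each segment is sorted independently, so values never move between segments. A reference like this can be used to check the output of a device run against a known-good result.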

See Hello Sort (Heterogeneous Programming) for a complete walkthrough, and Compiler for annotations (__pxl_kernel__, __mu_device__, __mu_shared__) and multi-file builds. (experimental, pxcc)