Troubleshooting

This section provides solutions to common problems you may encounter.


Illegal Instruction Error with SIMD Access in QEMU

Problem

When CXL memory allocated by PXL’s memAlloc API is accessed with SIMD instructions inside QEMU, an illegal instruction error may occur. The same error can occur when the memory is accessed through float* or double* pointers. This is a known QEMU issue, tracked as QEMU Issue #3075.

Solution

To work around this issue, run QEMU without KVM by adding the --no-kvm option when starting QEMU:

./run.sh --no-kvm

Note: Disabling KVM may reduce guest OS performance.


Could not access KVM kernel module: Permission denied

Problem

This error occurs when launching QEMU if the current user lacks permission to access the KVM (Kernel-based Virtual Machine) kernel module.

Solution

  1. Add the current user to the kvm group:
    sudo usermod -a -G kvm $USER
    
  2. Log out and log back in (or open a new terminal session) to apply the group change.
  3. Run QEMU again.
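
Before running QEMU again, it can help to confirm that the group change is active in the current session. The following is a minimal sketch, not part of the SDK: the function names in_group and check_kvm_access are illustrative, and the device node is assumed to be /dev/kvm.

```shell
#!/bin/sh
# in_group GROUP "LIST": succeed if GROUP is one of the space-separated names in LIST.
in_group() {
    for g in $2; do
        [ "$g" = "$1" ] && return 0
    done
    return 1
}

# check_kvm_access: report whether this session has the kvm group and can
# read and write /dev/kvm (assumed device node).
check_kvm_access() {
    if in_group kvm "$(id -nG)"; then
        echo "kvm group: active in this session"
    else
        echo "kvm group: not active (log out and back in after usermod)"
    fi
    if [ -r /dev/kvm ] && [ -w /dev/kvm ]; then
        echo "/dev/kvm: accessible"
    else
        echo "/dev/kvm: not accessible"
    fi
}
```

Calling check_kvm_access in a fresh terminal shows whether the new group membership has been picked up.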

cxl Package Issue

Problem

The cxl list command is not working.

Solution

  1. Ensure the environment is a Docker container.
  2. Check the cxl package version. If the version is older than expected, certain features may not be visible:
    cxl version          # Expected version: 72.1+
      
  3. If issues persist, try reinstalling Docker.
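
The version check in step 2 can be scripted. This is a sketch under two assumptions: that cxl version prints only a version string (optionally with a leading "v"), and that sort -V from GNU coreutils is available. The helper name version_ge is illustrative.

```shell
#!/bin/sh
# version_ge ACTUAL REQUIRED: succeed if ACTUAL >= REQUIRED in version order.
# Relies on `sort -V` (GNU coreutils); a leading "v" is stripped from ACTUAL.
version_ge() {
    actual=${1#v}
    lowest=$(printf '%s\n%s\n' "$2" "$actual" | sort -V | head -n 1)
    [ "$lowest" = "$2" ]
}

# Usage on a system where the cxl tool is installed:
#   version_ge "$(cxl version)" 72.1 || echo "cxl is older than 72.1"
```

Version ordering is used instead of a numeric comparison so that a release such as 72.10 is treated as newer than 72.1.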

daxctl Package Issue

Problem

The daxctl list command is not working.

Solution

  1. Ensure the environment is a Docker container.
  2. Verify that the CXL device is attached as a .Mem device:
    lspci | grep CXL                # Check the BDF (Bus Device Function).
    lspci -vvs <BDF> | grep CXLCtl  # Ensure "Mem+" is included.
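
The Mem+ check can be wrapped in a small filter that returns an exit status, which is convenient in scripts. This is a sketch: cxl_mem_enabled is an illustrative name, and it assumes the flag appears on the CXLCtl line as a separate word.

```shell
#!/bin/sh
# cxl_mem_enabled: read `lspci -vvs <BDF>` output on stdin and succeed only if
# the CXLCtl line carries the "Mem+" flag. "Mem-" or a missing CXLCtl line fails.
cxl_mem_enabled() {
    grep 'CXLCtl' | grep -Eq '(^|[[:space:]])Mem\+'
}

# Usage (replace <BDF> with the address reported by `lspci | grep CXL`):
#   lspci -vvs <BDF> | cxl_mem_enabled && echo "Mem+ is set"
```

Filtering on CXLCtl first avoids a false positive from other lines in the same output that may also show a Mem+ flag.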
    

xcena_cli Execution Errors

Problem

The xcena_cli num-device command outputs 0.

Solution

  1. Ensure the required Python modules are installed:
    pip list | grep -E 'click|pandas|pyvcd|serial|pexpect'
    
  2. Check if Docker was started with the --privileged flag:
    ls /sys/fs/cgroup/    # If successful, Docker is running in privileged mode.
    
  3. Confirm that the mx_dma module is loaded into the kernel:
    lsmod | grep mx_dma
    
  4. (Re-)Install and reload the mx_dma module if necessary:
    docker cp xcena_sdk:/work/driver /tmp/mx_dma
    docker stop xcena_sdk
    docker rm xcena_sdk
    
    cd /tmp/mx_dma
    sudo ./install.sh
    
    reboot
    
    lsmod | grep mx_dma
    
    # If mx_dma is not loaded, reload the modules:
    sudo rmmod cxl_pmem cxl_acpi cxl_pci cxl_core mx_dma
    sudo insmod /lib/modules/5.15.0-43-generic/extra/mx_dma.ko
    sudo insmod /lib/modules/5.15.0-43-generic/extra/cxl_5.15/core/cxl_core.ko
    sudo insmod /lib/modules/5.15.0-43-generic/extra/cxl_5.15/cxl_pci.ko
    sudo insmod /lib/modules/5.15.0-43-generic/extra/cxl_5.15/cxl_acpi.ko
    sudo insmod /lib/modules/5.15.0-43-generic/extra/cxl_5.15/cxl_pmem.ko
    
    # Re-run a Docker container
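
The module check in steps 3 and 4 can be made stricter than a substring match. The sketch below reads /proc/modules, where each line begins with the module name; module_loaded is an illustrative helper, not an SDK command.

```shell
#!/bin/sh
# module_loaded NAME [FILE]: succeed if NAME is listed as a loaded kernel module.
# FILE defaults to /proc/modules, where each line starts with the module name.
# Matching the whole first field avoids false hits on longer names that merely
# contain NAME, which a plain `lsmod | grep NAME` would also report.
module_loaded() {
    awk -v name="$1" '$1 == name { found = 1 } END { exit !found }' "${2:-/proc/modules}"
}

# Usage:
#   module_loaded mx_dma && echo "mx_dma is loaded"
```

The insmod paths in step 4 are specific to kernel 5.15.0-43-generic; uname -r shows the release of the running kernel for comparison.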
    

Example Test Failures

Problem

An example test fails with a core dump.

Steps to Troubleshoot

  1. Check if num_device is greater than or equal to 1:
    xcena_cli num-device
    
  2. Verify that the MSUB bitmap is non-zero:
    xcena_cli device-info 0
    
    • If it is 0x0, offloading cannot proceed.
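
The non-zero test on the bitmap can be expressed as a small helper. The output layout of xcena_cli device-info is not shown here, so this sketch takes the bitmap value as an argument copied from that output. hex_nonzero is an illustrative name.

```shell
#!/bin/sh
# hex_nonzero VALUE: succeed if VALUE is a hex literal such as 0x3 with a
# non-zero value; fail for 0x0, 0x0000, or anything that is not hex.
hex_nonzero() {
    case $1 in
        0[xX]*) digits=${1#0[xX]} ;;
        *) return 1 ;;
    esac
    case $digits in
        ''|*[!0-9a-fA-F]*) return 1 ;;   # empty or contains a non-hex character
    esac
    case $digits in
        *[!0]*) return 0 ;;              # at least one digit other than 0
    esac
    return 1
}

# Usage (the value is copied by hand from `xcena_cli device-info 0`):
#   hex_nonzero 0x0 || echo "MSUB bitmap is zero: offloading cannot proceed"
```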

Collecting Troubleshooting Logs

If the issue persists after following the steps above, collect diagnostic logs using the troubleshooting.sh script and share them with the support team.

Run Log Collection Script

Download and run troubleshooting.sh to collect system and device diagnostic logs:

wget https://raw.githubusercontent.com/xcena-dev/public_sdk_release/refs/heads/main/scripts/troubleshooting.sh
bash troubleshooting.sh

Example Output

  XCENA Troubleshooting Report
  yyyy-mm-dd hh:mm:ss KST
  Output: troubleshooting_report_yyyy-mm-dd-hh-mm.log

[collect] 0. Host Validation
    -> Running validate_host.sh (local)             [OK]
[collect] 1. Kernel
    -> 1-1. dmesg                                   [OK]
    -> 1-2. Kernel version                          [OK]
    -> 1-3. Boot parameters                         [OK]
[collect] 2. PXL
    -> 2-1. pxl_resourced journal                   [OK]
[collect] 3. iomem
    -> 3-1. /proc/iomem                             [OK]
[collect] 4. Detailed CXL Environment
    -> 4-1. cxl list                                [OK]
    -> 4-2. sysfs CXL devices                       [OK]
    -> 4-3. daxctl list                             [OK]
    -> 4-4. DAX devices                             [OK]
    -> 4-5. CEDT ACPI table                         [OK]
    -> 4-6. NUMA topology                           [OK]
[collect] 5. Firmware Info
    -> 5-1. xcena_cli fw-info                       [OK]
[collect] 6. CXL Device Verbose Information
    -> 6-1. lspci verbose for CXL devices           [OK]

  Done. Report saved to: troubleshooting_report_${yyyy-mm-dd-hh-mm}.log

Note: The generated log may contain sensitive or confidential information (e.g., host configuration, network topology, internal device details). Please review the log file and redact any confidential data before sharing it with the support team.

Share the generated .log file when reporting issues.
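
As a first pass before the manual review, obvious network identifiers can be masked automatically. This is a rough sketch and not part of troubleshooting.sh: redact_log is an illustrative name, and the patterns only catch IPv4-looking and MAC-looking strings.

```shell
#!/bin/sh
# redact_log: copy stdin to stdout with IPv4-looking and MAC-looking strings masked.
# Pattern-based masking is coarse: dotted four-part version numbers are masked
# too, and hostnames or serial numbers are not touched, so still review by hand.
redact_log() {
    sed -E \
        -e 's/([0-9]{1,3}\.){3}[0-9]{1,3}/[REDACTED-IP]/g' \
        -e 's/([0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}/[REDACTED-MAC]/g'
}

# Usage (the file names are placeholders):
#   redact_log < troubleshooting_report.log > troubleshooting_report.redacted.log
```

Writing to a new file keeps the original report intact in case the support team later needs an unmasked detail.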


pxcc Compiler Issues

The sections below cover the experimental pxcc compiler.

Device C++ standard errors

Problem

Device compilation fails with errors about C++20/C++23 features used in a kernel or device function, or about an unsupported -std value on the device side. Code that runs on the device (functions reachable from __pxl_kernel__) must be C++17 or earlier, because the device backend does not yet support later standards. Host code is unaffected and may use C++17/20/23.

Solution

Keep device code within C++17, or set the device standard explicitly with -Xmu=. The host standard is independent and set the usual way:

pxcc++ -Xmu=-std=c++17 app.cpp -o app   # device standard; host unaffected

In CMakeLists.txt:

set(CMAKE_CXX_STANDARD 20)   # host may be C++20/23; device stays C++17 or lower

Kernel not found or not extracted

Problem

The build succeeds but a launch fails at runtime, or the device code is empty. This usually means the function was not annotated, so pxcc compiled it as ordinary host code instead of extracting it for the device.

Solution

Annotate the entry point with __pxl_kernel__, and make sure #include "mu/mu.hpp" is present. Confirm the extraction with -save-temps and check that the kernel appears in mu.{filename}.cpp:

pxcc++ -save-temps -c app.cpp -o app.o
grep sort_with_ptr mu.app.cpp

See the annotations table for __pxl_kernel__, __mu_device__, and __mu_shared__.
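
A quick text search of the source can catch a missing annotation or include before rebuilding. The sketch below is illustrative only (check_kernel_source is not a pxcc tool) and assumes the header is included with the quoted form "mu/mu.hpp".

```shell
#!/bin/sh
# check_kernel_source FILE: print what is missing from FILE for kernel extraction
# and fail if anything is missing. This only looks for the text; it does not
# parse C++, so a match inside a comment also counts.
check_kernel_source() {
    status=0
    if ! grep -q '__pxl_kernel__' "$1"; then
        echo "$1: no __pxl_kernel__ annotation found"
        status=1
    fi
    if ! grep -Eq '^[[:space:]]*#[[:space:]]*include[[:space:]]*"mu/mu\.hpp"' "$1"; then
        echo "$1: #include \"mu/mu.hpp\" not found"
        status=1
    fi
    return $status
}

# Usage:
#   check_kernel_source app.cpp
```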

mu:: used in shared code

Problem

The compiler reports that mu:: symbols are undeclared inside a __mu_shared__ function. This happens because __mu_shared__ code is also compiled for the host, where mu:: APIs do not exist.

Solution

Move the mu::-using logic into a __mu_device__ helper, or call it only from __pxl_kernel__ / __mu_device__ code.

cannot find -lpxl

Problem

The link step fails with cannot find -lpxl or unresolved PXL symbols. pxcc adds -lpxl automatically, but the PXL library is not on the linker search path (for example, a non-standard SDK install location).

Solution

Point the linker at the PXL library directory.

pxcc++ app.o -L/usr/local/lib -o app

Verify the SDK is installed — see Install.
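
To find which directory to pass with -L, it may help to search candidate directories for the files the linker looks for when given -lpxl, namely libpxl.so or libpxl.a. This is a sketch: find_pxl_lib is an illustrative name, and the directories in the usage line are examples, not documented SDK locations.

```shell
#!/bin/sh
# find_pxl_lib DIR...: print every DIR that contains libpxl.so or libpxl.a,
# the file names the linker searches for when given -lpxl. Fails if none match.
find_pxl_lib() {
    found=1
    for dir in "$@"; do
        for f in "$dir/libpxl.so" "$dir/libpxl.a"; do
            if [ -e "$f" ]; then
                echo "$dir"
                found=0
                break
            fi
        done
    done
    return $found
}

# Usage (example directories only):
#   find_pxl_lib /usr/local/lib /usr/lib /usr/lib64
```

Any directory it prints can be passed to pxcc++ with -L.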

Host compiler not found

Problem

pxcc aborts early with Host compiler not found before any compilation runs. It could not resolve a host compiler from --host-compiler, the PXCC_HOST_CXX / PXCC_HOST_CC (or CXX / CC) environment variables, or a clang on PATH.

Solution

Point pxcc at a host compiler explicitly, or install one on PATH.

pxcc++ --host-compiler=/usr/bin/g++-12 -c app.cpp -o app.o
# or
PXCC_HOST_CXX=/usr/bin/g++-12 pxcc++ app.cpp -o app
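
To see which of these sources would yield a usable compiler in the current shell, a small probe can help. This sketch makes assumptions: the lookup order simply follows the order the sources are listed above (the order pxcc itself applies is not confirmed here), clang++ is used as the PATH fallback for C++, and first_host_cxx is an illustrative name.

```shell
#!/bin/sh
# first_host_cxx: print the first usable C++ host compiler among
# $PXCC_HOST_CXX, $CXX, and clang++ on PATH, in that (assumed) order.
# Empty or non-executable candidates are skipped.
first_host_cxx() {
    for candidate in "${PXCC_HOST_CXX:-}" "${CXX:-}" clang++; do
        [ -n "$candidate" ] || continue
        if command -v "$candidate" >/dev/null 2>&1; then
            echo "$candidate"
            return 0
        fi
    done
    return 1
}

# Usage:
#   first_host_cxx || echo "no host compiler candidate found"
```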

No device code found

Problem

The build prints No device code found and produces a host-only binary. This is informational, not an error — the source has no __pxl_kernel__ (or __mu_shared__) code, so pxcc compiled it as an ordinary host translation unit.

Solution

Nothing is needed if the file is intentionally host-only. If you expected a kernel, confirm it is annotated with __pxl_kernel__ — see Kernel not found or not extracted.

Inspecting intermediate output

Problem

pxcc behaves unexpectedly and you need to see what it generated for the host and device sides.

Solution

Build with -save-temps and read the generated files:

File                 Use it to check
mu.{filename}.cpp    Did the kernel get extracted? Is the body what you expect?
host.{filename}.cpp  How the host side was transformed.
mu_kernel.mubin      Device binary: confirm the kernel compiled successfully.
mu_kernel.o          Embedding object: confirm it was linked into the host binary.
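
After a -save-temps build, the presence of these files can be checked in one step. This is a sketch: list_temps is an illustrative name, and it assumes the files are written to the directory given (the current directory by default).

```shell
#!/bin/sh
# list_temps NAME [DIR]: for a source called NAME.cpp, report which of the
# -save-temps files exist and are non-empty in DIR (default: current directory).
# Fails if any is missing or empty. The output location is an assumption.
list_temps() {
    dir=${2:-.}
    missing=0
    for f in "mu.$1.cpp" "host.$1.cpp" mu_kernel.mubin mu_kernel.o; do
        if [ -s "$dir/$f" ]; then
            echo "ok       $f"
        else
            echo "missing  $f"
            missing=1
        fi
    done
    return $missing
}

# Usage after `pxcc++ -save-temps -c app.cpp -o app.o`:
#   list_temps app
```

An empty mu.{filename}.cpp is reported as missing, which matches the case where no kernel was extracted.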

If the extracted device code looks correct but the binary still misbehaves, see pxcc Compiler for the full diagnostics reference.