Troubleshooting
This section provides solutions to common problems you may encounter.
Illegal Instruction Error with SIMD Access in QEMU
Problem
When CXL memory allocated by PXL's memAlloc API is accessed with SIMD instructions inside QEMU, an illegal instruction error may occur. The same error can occur when the memory is accessed through float* or double* pointers. This is a known QEMU issue (Issue #3075).
Solution
To resolve this issue, run QEMU in no-KVM mode. This can be done by adding the --no-kvm option when starting QEMU:
./run.sh --no-kvm
Note: Disabling KVM may reduce guest OS performance.
Could not access KVM kernel module: Permission denied
Problem
This error occurs when launching QEMU if the current user lacks permission to access the KVM (Kernel-based Virtual Machine) kernel module.
Solution
- Add the user to the kvm group:
  sudo usermod -a -G kvm $USER
- Log out and log back in (or open a new terminal session) to apply the group change.
- Run QEMU again.
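Before re-running QEMU, it can help to confirm that the group change is active in the current session. The following is a minimal sketch, assuming a Linux host where KVM is exposed as /dev/kvm; the helper name in_group is hypothetical and not part of the SDK.

```shell
#!/usr/bin/env bash
# Sketch: report whether the current session can open /dev/kvm.

# in_group GROUP "LIST": succeeds if GROUP appears as a whole word in the
# space-separated LIST (for example, the output of `id -nG`).
in_group() {
    local group="$1" list="$2" g
    for g in $list; do
        [ "$g" = "$group" ] && return 0
    done
    return 1
}

# `id -nG` lists the groups that are active in this session.
if in_group kvm "$(id -nG)"; then
    echo "kvm group: active in this session"
else
    echo "kvm group: not active in this session (log out and back in after usermod)"
fi

# Read and write access to /dev/kvm is what QEMU needs for KVM acceleration.
if [ -r /dev/kvm ] && [ -w /dev/kvm ]; then
    echo "/dev/kvm: accessible"
else
    echo "/dev/kvm: not accessible (or KVM is not available on this host)"
fi
```

If the group is reported as not active even after usermod, the session most likely predates the change.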
cxl Package Issue
Problem
The cxl list command is not working.
Solution
- Ensure the environment is a Docker container.
- Check the cxl package version:
  cxl version # Expected version: 72.1+
  - If the version is incorrect, certain features may not be visible.
- If issues persist, try reinstalling Docker.
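The version check can be scripted rather than compared by eye. This is a rough sketch, assuming that cxl version prints a bare version string such as 72.1 on its first line and that GNU sort with the -V option is available; version_ge is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: compare the installed cxl version against the expected minimum.

# version_ge ACTUAL MINIMUM: succeeds if ACTUAL >= MINIMUM under the
# version ordering of GNU `sort -V`.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

if command -v cxl >/dev/null 2>&1; then
    # Assumption: the first output line is a bare version string, e.g. "72.1".
    installed="$(cxl version 2>/dev/null | head -n 1)"
    if version_ge "$installed" "72.1"; then
        echo "cxl ${installed}: OK"
    else
        echo "cxl ${installed}: older than 72.1, some features may not be visible"
    fi
else
    echo "cxl: command not found"
fi
```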
daxctl Package Issue
Problem
The daxctl list command is not working.
Solution
- Ensure the environment is a Docker container.
- Verify that the CXL device is attached as a .Mem device:
  lspci | grep CXL # Check the BDF (Bus Device Function).
  lspci -vvs <BDF> | grep CXLCtl # Ensure "Mem+" is included.
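The two lspci steps above can be combined into one check. This is a sketch only: the BDF environment variable and the mem_enabled helper are hypothetical names, and reading the full capability list with lspci -vv may require root privileges on some systems.

```shell
#!/usr/bin/env bash
# Sketch: check that the CXLCtl line of a device reports "Mem+".

# mem_enabled: reads `lspci -vv` output on stdin and succeeds if a CXLCtl
# line contains "Mem+".
mem_enabled() {
    grep -Eq 'CXLCtl:.*Mem\+'
}

# Assumption: BDF holds the address found with `lspci | grep CXL`.
bdf="${BDF:-}"
if [ -n "$bdf" ] && command -v lspci >/dev/null 2>&1; then
    if lspci -vvs "$bdf" 2>/dev/null | mem_enabled; then
        echo "$bdf: Mem+ is set"
    else
        echo "$bdf: Mem+ not found in CXLCtl (try running as root)"
    fi
else
    echo "Set BDF to the device address first, e.g. BDF=<BDF> ./check_mem.sh"
fi
```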
xcena_cli Execution Errors
Problem
The xcena_cli num-device command outputs 0.
Solution
- Ensure the required Python modules are installed:
  pip list | grep -E 'click|pandas|pyvcd|serial|pexpect'
- Check if Docker was started with the --privileged flag:
  ls /sys/fs/cgroup/ # If successful, Docker is running in privileged mode.
- Confirm that the mx_dma module is loaded into the kernel:
  lsmod | grep mx_dma
- (Re-)Install and reload the mx_dma module if necessary:
  docker cp xcena_sdk:/work/driver /tmp/mx_dma
  docker stop xcena_sdk
  docker rm xcena_sdk
  cd /tmp/mx_dma
  sudo ./install.sh
  reboot
  lsmod | grep mx_dma
  # If mx_dma is not loaded, reload the modules:
  sudo rmmod cxl_pmem cxl_acpi cxl_pci cxl_core mx_dma
  sudo insmod /lib/modules/5.15.0-43-generic/extra/mx_dma.ko
  sudo insmod /lib/modules/5.15.0-43-generic/extra/cxl_5.15/core/cxl_core.ko
  sudo insmod /lib/modules/5.15.0-43-generic/extra/cxl_5.15/cxl_pci.ko
  sudo insmod /lib/modules/5.15.0-43-generic/extra/cxl_5.15/cxl_acpi.ko
  sudo insmod /lib/modules/5.15.0-43-generic/extra/cxl_5.15/cxl_pmem.ko
  # Re-run a Docker container
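To see at a glance which of the modules named in the reload step are present, a loop over lsmod output may be convenient. This is a minimal sketch run on the host; module_in_list is a hypothetical helper, and the module names are taken from the commands above.

```shell
#!/usr/bin/env bash
# Sketch: report which of the expected kernel modules are loaded.

# module_in_list NAME: reads lsmod-style output on stdin and succeeds if
# NAME matches the first column of any line exactly.
module_in_list() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }'
}

if command -v lsmod >/dev/null 2>&1; then
    loaded="$(lsmod)"
    for mod in mx_dma cxl_core cxl_pci cxl_acpi cxl_pmem; do
        if printf '%s\n' "$loaded" | module_in_list "$mod"; then
            echo "$mod: loaded"
        else
            echo "$mod: not loaded"
        fi
    done
else
    echo "lsmod: command not found"
fi
```

Matching on the exact first column avoids false positives from substring matches, which a plain grep on the module name can produce.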
Example Test Failures
Problem
An example test fails with a core dump.
Steps to Troubleshoot
- Check if num_device is greater than or equal to 1:
  xcena_cli num-device
  - If num_device is 0, refer to the xcena_cli Execution Errors section.
- Verify that MSUB bitmap is non-zero:
  xcena_cli device-info 0
  - If it is 0x0, offloading cannot proceed.
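The two checks above could be wrapped in a small pre-flight script. The sketch below rests on assumptions that are not confirmed by this guide: that the last number printed by xcena_cli num-device is the device count, and that the MSUB bitmap value is copied by hand from the device-info output into a hypothetical MSUB_BITMAP environment variable. The helper is_zero_hex is likewise hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: pre-flight checks before running an example test.

# is_zero_hex VALUE: succeeds if VALUE is a hex literal equal to zero,
# such as 0x0 or 0x0000.
is_zero_hex() {
    printf '%s\n' "$1" | grep -Eiq '^0x0+$'
}

if command -v xcena_cli >/dev/null 2>&1; then
    # Assumption: the last number in the output is the device count.
    count="$(xcena_cli num-device 2>/dev/null | grep -Eo '[0-9]+' | tail -n 1)"
    if [ "${count:-0}" -ge 1 ]; then
        echo "num_device: ${count}"
    else
        echo "num_device is 0: see the xcena_cli Execution Errors section"
    fi
else
    echo "xcena_cli: command not found"
fi

# Assumption: MSUB_BITMAP was copied from `xcena_cli device-info 0`.
bitmap="${MSUB_BITMAP:-}"
if [ -n "$bitmap" ]; then
    if is_zero_hex "$bitmap"; then
        echo "MSUB bitmap is ${bitmap}: offloading cannot proceed"
    else
        echo "MSUB bitmap is ${bitmap}: non-zero"
    fi
fi
```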
Collecting Troubleshooting Logs
If the issue persists after following the steps above, collect diagnostic logs using the troubleshooting.sh script and share them with the support team.
Run Log Collection Script
Download and run troubleshooting.sh to collect system and device diagnostic logs:
wget https://raw.githubusercontent.com/xcena-dev/public_sdk_release/refs/heads/main/scripts/troubleshooting.sh
bash troubleshooting.sh
Example Output
XCENA Troubleshooting Report
yyyy-mm-dd hh:mm:ss KST
Output: troubleshooting_report_yyyy-mm-dd-hh-mm.log
[collect] 0. Host Validation
-> Running validate_host.sh (local) [OK]
[collect] 1. Kernel
-> 1-1. dmesg [OK]
-> 1-2. Kernel version [OK]
-> 1-3. Boot parameters [OK]
[collect] 2. PXL
-> 2-1. pxl_resourced journal [OK]
[collect] 3. iomem
-> 3-1. /proc/iomem [OK]
[collect] 4. Detailed CXL Environment
-> 4-1. cxl list [OK]
-> 4-2. sysfs CXL devices [OK]
-> 4-3. daxctl list [OK]
-> 4-4. DAX devices [OK]
-> 4-5. CEDT ACPI table [OK]
-> 4-6. NUMA topology [OK]
[collect] 5. Firmware Info
-> 5-1. xcena_cli fw-info [OK]
[collect] 6. CXL Device Verbose Information
-> 6-1. lspci verbose for CXL devices [OK]
Done. Report saved to: troubleshooting_report_${yyyy-mm-dd-hh-mm}.log
Note: The generated log may contain sensitive or confidential information (e.g., host configuration, network topology, internal device details). Please review the log file and redact any confidential data before sharing it with the support team.
Share the generated .log file when reporting issues.
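As a starting point for the redaction step, the sketch below masks IPv4 and MAC addresses in a report. It is deliberately coarse and is not a substitute for a manual review: the patterns can miss other confidential data and may also mask dotted strings that are not addresses. The redact helper and the REPORT environment variable are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch: mask IPv4 and MAC addresses in a troubleshooting report.

# redact: filters stdin to stdout, replacing MAC addresses with <MAC>
# and dotted-quad IPv4 addresses with <IPV4>.
redact() {
    sed -E \
        -e 's/([0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}/<MAC>/g' \
        -e 's/([0-9]{1,3}\.){3}[0-9]{1,3}/<IPV4>/g'
}

# Assumption: REPORT holds the path of the generated .log file.
report="${REPORT:-}"
if [ -n "$report" ] && [ -f "$report" ]; then
    redact < "$report" > "${report%.log}.redacted.log"
    echo "Wrote ${report%.log}.redacted.log"
else
    echo "Set REPORT to the generated .log file to write a redacted copy."
fi
```

The original report is left untouched, so the redacted copy can be compared against it before anything is shared.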