Tracing CUDA Programs

This page describes how to use the tracer tool to generate the CUDA API traces needed to run balar with BalarTestCPU.

Warning: We are working on a more robust version of the tracer for the new NVBit release, including a better trace format and better computation validation.

To run balar in trace-driven mode, you need to supply the BalarTestCPU component with a trace file and the associated GPU memory copy data dumps. We have created a tracer tool based on NVBit to generate both. The relevant code is in the Accel-Sim framework at ACCEL_SIM_SRC/util/tracer_nvbit/others/cuda_api_tracer_tool.

Note: To set up the tool and generate traces, you need a machine with an NVIDIA GPU installed. NVBit also has requirements on both GPU hardware and software versions; refer to its README for details.
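Before building, a quick sanity check can save time. The sketch below is a hypothetical POSIX sh helper (check_tracer_prereqs is our own name, not part of Accel-Sim or NVBit). It only confirms that nvidia-smi (installed with the NVIDIA driver) and nvcc (from the CUDA toolkit, assumed here to be needed to compile the tool, as is typical for NVBit tools) are on PATH, and that nvidia-smi -L lists at least one GPU. It does not check the specific versions NVBit supports; the NVBit README remains the authority for that.

```shell
# Hypothetical helper (not part of Accel-Sim or NVBit): basic checks before
# building the tracer. It does NOT verify the exact GPU/driver/CUDA versions
# that NVBit requires; consult the NVBit README for those.
check_tracer_prereqs() {
  # nvidia-smi is installed with the NVIDIA driver
  if ! command -v nvidia-smi >/dev/null 2>&1; then
    echo "nvidia-smi not found: is the NVIDIA driver installed?" >&2
    return 1
  fi
  # Assumption: nvcc (CUDA toolkit) is needed to compile the tracer tool
  if ! command -v nvcc >/dev/null 2>&1; then
    echo "nvcc not found: is the CUDA toolkit on PATH?" >&2
    return 1
  fi
  # 'nvidia-smi -L' prints one "GPU <index>: <name> (UUID: ...)" line per GPU
  if ! nvidia-smi -L 2>/dev/null | grep -q '^GPU '; then
    echo "nvidia-smi lists no GPU" >&2
    return 1
  fi
  echo "ok"
}
```

Source the file and run check_tracer_prereqs; it prints ok and returns 0 only when all three checks pass.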

To pull and compile the tracer tool:

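# Get the Accel-Sim framework
# (the SSH URL needs a GitHub SSH key; https://github.com/accel-sim/accel-sim-framework.git also works)
git clone git@github.com:accel-sim/accel-sim-framework.git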

# cd into tracer tool folder
cd accel-sim-framework/util/tracer_nvbit

# Install nvbit
./install_nvbit.sh

# Compile tracer tool
# This generates 'cuda_api_tracer_tool.so' in
# './others/cuda_api_tracer_tool/cuda_api_tracer'
make -C ./others/cuda_api_tracer_tool
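Because the built shared object sits a couple of folders down, it can help to locate it once and keep its absolute path. A minimal sketch, assuming it is run from accel-sim-framework/util/tracer_nvbit after make succeeds; find_tracer_so is a hypothetical helper name, and it searches for the file rather than hard-coding the exact sub-folder.

```shell
# Hypothetical helper: print the absolute path of the built tracer library.
# $1: folder to search (default: the tool folder, relative to
#     accel-sim-framework/util/tracer_nvbit)
find_tracer_so() {
  search_dir=${1:-./others/cuda_api_tracer_tool}
  # Take the first match rather than hard-coding the exact sub-folder
  so_path=$(find "$search_dir" -type f -name 'cuda_api_tracer_tool.so' | head -n 1)
  if [ -z "$so_path" ]; then
    echo "cuda_api_tracer_tool.so not found under $search_dir; did make succeed?" >&2
    return 1
  fi
  # Resolve to an absolute path so it works from any working directory
  (cd "$(dirname "$so_path")" && printf '%s/%s\n' "$(pwd)" "$(basename "$so_path")")
}
```

For example, TRACER_SO=$(find_tracer_so) stores the path; an absolute path stays valid even if the CUDA program is later launched from another folder.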

Then, to dump traces, run your CUDA program with LD_PRELOAD set to the path of the tracer tool's shared object:

LD_PRELOAD=PATH_TO/cuda_api_tracer_tool.so CUDA_PROG
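If you trace several programs, running each one in its own folder keeps their outputs apart. The wrapper below is a hypothetical sketch (trace_cuda_prog is our own name). It assumes the tracer writes its files into the traced program's current working directory, which this page does not state explicitly, so check where your build actually writes them. It requires absolute paths because it changes directory before launching the program.

```shell
# Hypothetical wrapper: run a CUDA program under the tracer in its own folder.
# Assumption: the tracer writes cuda_calls.trace and the .data dumps into the
# current working directory of the traced program.
# $1: absolute path to cuda_api_tracer_tool.so
# $2: output folder (created if missing)
# $3...: program and its arguments (use an absolute program path)
trace_cuda_prog() {
  if [ $# -lt 3 ]; then
    echo "usage: trace_cuda_prog TRACER_SO OUT_DIR PROG [ARGS...]" >&2
    return 1
  fi
  tracer_so=$1
  out_dir=$2
  shift 2
  # A relative tracer path would break after the cd below, so reject it
  case $tracer_so in
    /*) ;;
    *) echo "tracer path must be absolute: $tracer_so" >&2; return 1 ;;
  esac
  if [ ! -f "$tracer_so" ]; then
    echo "tracer not found: $tracer_so" >&2
    return 1
  fi
  mkdir -p "$out_dir" || return 1
  # Subshell: the cd and LD_PRELOAD do not leak into the caller's shell
  (cd "$out_dir" && LD_PRELOAD="$tracer_so" "$@")
}
```

For example: trace_cuda_prog /abs/PATH_TO/cuda_api_tracer_tool.so "$PWD/traces/my_prog" /abs/path/to/CUDA_PROG. The wrapper returns the traced program's exit status.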

When the program exits, the tracer will have generated the following files:

cuda_calls.trace: the API trace file, which tracks
- cudaMemcpy
- cudaMalloc
- CUDA kernel launches
- cudaFree
cuMemcpyD2H-X-X.data: CUDA memcpy device-to-host data payload
cuMemcpyH2D-X-X.data: CUDA memcpy host-to-device data payload
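Since BalarTestCPU needs the trace file together with its memcpy data dumps, it can be convenient to gather one run's output into a single folder. A minimal sketch (collect_trace_files is a hypothetical helper): it matches the X-X fields with wildcards without assuming what they encode, copies cuda_calls.trace plus every cuMemcpyH2D/cuMemcpyD2H .data file, and prints how many .data files it copied. How BalarTestCPU is then pointed at these files is configured separately and is not covered here.

```shell
# Hypothetical helper: copy one run's tracer output into a single folder.
# $1: folder holding the tracer output; $2: destination folder
# Prints the number of .data payload files copied.
collect_trace_files() {
  src_dir=$1
  dst_dir=$2
  if [ ! -f "$src_dir/cuda_calls.trace" ]; then
    echo "missing $src_dir/cuda_calls.trace" >&2
    return 1
  fi
  mkdir -p "$dst_dir" || return 1
  cp "$src_dir/cuda_calls.trace" "$dst_dir/" || return 1
  count=0
  # The X-X parts of the names are matched with wildcards, not interpreted
  for f in "$src_dir"/cuMemcpyH2D-*-*.data "$src_dir"/cuMemcpyD2H-*-*.data; do
    # An unmatched pattern is left as literal text, so skip it
    [ -e "$f" ] || continue
    cp "$f" "$dst_dir/" || return 1
    count=$((count + 1))
  done
  echo "$count"
}
```

Files that match neither pattern (logs, binaries, and so on) are left behind, so the destination folder holds only what the tracer produced for balar.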