Tracing CUDA Program
This page describes how to use the tracer tool to generate the CUDA API traces needed to run balar with BalarTestCPU.
We are working on a more robust version of the tracer based on the new NVBit release, including a better trace format and better computation validation.
To run balar in trace-driven mode, you need to supply the BalarTestCPU component with a trace file and the associated GPU memory copy data dumps. We have created an instruction tracer tool based on NVBit to generate them. The relevant code is in the Accel-Sim framework at ACCEL_SIM_SRC/util/tracer_nvbit/others/cuda_api_tracer_tool.
To set up the tool and generate traces, you need a machine with an NVIDIA GPU installed. NVBit also has requirements on both GPU hardware and software versions; refer to its README for details.
To pull and compile the tracer tool:
# Get the Accel-Sim framework
git clone git@github.com:accel-sim/accel-sim-framework.git
# cd into tracer tool folder
cd accel-sim-framework/util/tracer_nvbit
# Install nvbit
./install_nvbit.sh
# Compile the tracer tool; this generates a 'cuda_api_tracer_tool.so' file at
# './others/cuda_api_tracer_tool/cuda_api_tracer'
make -C ./others/cuda_api_tracer_tool
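Before moving on, it can help to confirm that the build actually produced the shared object. The following is a minimal sketch, not part of the tracer itself: `find_tracer_so` is a hypothetical helper name, and it assumes only that the file is called `cuda_api_tracer_tool.so` and sits somewhere under `others/cuda_api_tracer_tool`, as the build comment above states.

```shell
# Hypothetical helper (not shipped with Accel-Sim): print the path of the
# built tracer shared object, or fail if it cannot be found.
# $1 is the tracer_nvbit directory (defaults to the current directory).
find_tracer_so() {
    local root="${1:-.}"
    local so_path
    # Search the tool's build tree for the shared object named in the build step.
    so_path=$(find "$root/others/cuda_api_tracer_tool" -type f \
        -name 'cuda_api_tracer_tool.so' 2>/dev/null | head -n 1)
    if [ -z "$so_path" ]; then
        echo "cuda_api_tracer_tool.so not found under $root; did make succeed?" >&2
        return 1
    fi
    printf '%s\n' "$so_path"
}
```

Run from `accel-sim-framework/util/tracer_nvbit`, `find_tracer_so .` prints the path to pass to LD_PRELOAD, or returns a non-zero status if the build did not produce the file.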
Then, to dump traces, set LD_PRELOAD to the path of the tracer tool shared object when launching your CUDA program:
LD_PRELOAD=PATH_TO/cuda_api_tracer_tool.so CUDA_PROG
When the program exits, this generates the following files:
- cuda_calls.trace: the API trace file, tracking
  - cudaMemcpy
  - cudaMalloc
  - CUDA kernel launches
  - cudaFree
- cuMemcpyD2H-X-X.data: data payload of a CUDA memcpy from device to host
- cuMemcpyH2D-X-X.data: data payload of a CUDA memcpy from host to device
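After a traced run, a quick sanity check is to confirm that the files listed above are present. The sketch below uses a hypothetical helper, `check_trace_outputs`. It assumes the tracer writes its output into the directory the program was launched from, which this page does not state, so pass a different directory if that is not the case. It matches payload files only by the `cuMemcpyH2D-` and `cuMemcpyD2H-` prefixes and the `.data` suffix, and makes no assumption about what the `X-X` fields contain.

```shell
# Hypothetical helper: check that a directory holds the files the tracer
# is described as writing, and report how many payload dumps were found.
# $1 is the directory to inspect (defaults to the current directory;
# the output location is an assumption, adjust as needed).
check_trace_outputs() {
    local dir="${1:-.}"
    local h2d d2h
    if [ ! -f "$dir/cuda_calls.trace" ]; then
        echo "missing $dir/cuda_calls.trace" >&2
        return 1
    fi
    # Count memcpy payload dumps in each direction (top level only).
    h2d=$(find "$dir" -maxdepth 1 -type f -name 'cuMemcpyH2D-*.data' | wc -l)
    d2h=$(find "$dir" -maxdepth 1 -type f -name 'cuMemcpyD2H-*.data' | wc -l)
    echo "trace: $dir/cuda_calls.trace"
    # Arithmetic expansion strips any padding that wc adds on some systems.
    echo "H2D payloads: $((h2d))"
    echo "D2H payloads: $((d2h))"
}
```

For example, after `LD_PRELOAD=PATH_TO/cuda_api_tracer_tool.so CUDA_PROG` finishes, `check_trace_outputs .` reports the trace file and the number of payload dumps, or fails if `cuda_calls.trace` is missing. A program that performs no memory copies in one direction will legitimately report zero payloads for it.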