balar
The balar library provides an interface between SST and GPGPU-Sim, a cycle-level simulator modeling contemporary graphics processing units (GPUs) running GPU computing workloads written in CUDA. It supports two execution modes: trace-driven and direct-execution.
- Trace-driven: balar is driven by a test CPU that consumes CUDA API traces (not to be confused with SASS traces for Accel-Sim) and launches CUDA calls.
- Direct-execution: a CPU model executes a CUDA binary linked with a custom `libcudart` and dispatches CUDA API calls to balar and GPGPU-Sim.
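For concreteness, here is a minimal sketch of the kind of CUDA workload balar targets, modeled on the integer vector add used in testing. The source is ordinary CUDA (all names below are illustrative); in direct-execution mode such a program would be compiled for RISCV and linked against the custom `libcudart` rather than NVIDIA's runtime.

```cpp
#include <cstdio>
#include <cuda_runtime.h>

// Each thread adds one pair of elements.
__global__ void vecAdd(const int *a, const int *b, int *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1024;
    const size_t bytes = n * sizeof(int);
    int h_a[n], h_b[n], h_c[n];
    for (int i = 0; i < n; i++) { h_a[i] = i; h_b[i] = 2 * i; }

    int *d_a, *d_b, *d_c;
    cudaMalloc(&d_a, bytes);
    cudaMalloc(&d_b, bytes);
    cudaMalloc(&d_c, bytes);

    // H2D copies: the input data the kernel will consume.
    cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);

    vecAdd<<<(n + 255) / 256, 256>>>(d_a, d_b, d_c, n);

    // D2H copy: the result to be checked on the host.
    cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost);

    printf("c[42] = %d\n", h_c[42]);  // expect 126
    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    return 0;
}
```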
The BalarTestCPU component is a trace-based test CPU included in the balar folder (`./testcpu/`) that runs simulations with CUDA API call traces and data collected from a real GPU. It works by consuming a trace file and the associated CUDA memory copy data files. The `cudaMemcpyH2D` data payload is collected for program correctness; the `cudaMemcpyD2H` data is collected to validate the computation.
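As a rough illustration of how the D2H data serves validation: the bytes recorded from the real GPU act as a reference, and the output produced by the simulated run must match them. A minimal sketch of that comparison (hypothetical helper, not BalarTestCPU's actual code):

```cpp
#include <cstring>
#include <fstream>
#include <iterator>
#include <vector>

// Compare a simulated D2H buffer against reference bytes recorded from a real
// GPU. The file name and layout are made up for illustration.
bool validateD2H(const void *simulated, size_t len, const char *refFile) {
    std::ifstream in(refFile, std::ios::binary);
    std::vector<char> reference((std::istreambuf_iterator<char>(in)),
                                std::istreambuf_iterator<char>());
    return reference.size() == len &&
           std::memcmp(simulated, reference.data(), len) == 0;
}
```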
BalarMMIO is responsible for relaying CUDA API requests from SST to GPGPU-Sim. Currently it supports running with CUDA traces without a real CPU model (via BalarTestCPU) or with a Vanadis core running a RISCV + CUDA binary with a custom CUDA runtime (`libcudart_vanadis` inside `./tests/vanadisLLVMRISCV/`). The latter mode has been tested with a subset of Rodinia 2.0 benchmark kernels in the unit tests.
The dmaEngine component performs memory data transfers between the SST cache memory space and the simulator memory space. It is required because balar must read and write CPU data (e.g. for `cudaMemcpy()` with Vanadis) and place it into GPGPU-Sim's memory space for functional simulation. In addition, dmaEngine is used to read the CUDA dispatch packet and write the return value for the custom CUDA runtime.
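To make the second role concrete, a hypothetical sketch of a dispatch packet is shown below. The actual layout and field names are defined in the balar source and may differ; this only illustrates the round trip of call arguments and return value that dmaEngine moves between the two memory spaces.

```cpp
#include <cstdint>

// Hypothetical dispatch-packet shape (illustrative only). The custom libcudart
// marshals a CUDA call into such a packet in CPU memory; dmaEngine reads it
// into balar, and balar writes the return value back the same way.
struct CudaDispatchPacket {
    uint32_t apiId;     // which CUDA API is being invoked (memcpy, launch, ...)
    uint64_t args[8];   // marshalled call arguments (pointers, sizes, ...)
    uint64_t retValue;  // filled in by balar when the call completes
};
```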
Source code: sst-elements/.../balar
SST name: balar
Maturity Level: Prototype (2)
Development Path: Active
Last Released: SST 14.0
Support for the trace-driven mode is currently limited, as it was used for early-stage validation of the balar implementation; it has only been tested with a simple integer vector add example. We are working on a more robust version of this mode with the new NVBit release, including a better trace format and better computation validation.
Required dependencies
- CUDA: version 11.0+ is recommended
- GPGPU-Sim: use the dev branch from accel-sim/gpgpu-sim_distribution
Optional dependencies
- LLVM: for compiling the RISCV + CUDA binary; 18.x.x+ should work (specifically, builds after llvm/llvm-project#57544 was resolved)
- RISCV GNU Toolchain: for compiling the RISCV + CUDA binary. Generally any recent distribution should work; we have only tested the tag `2024.08.06-nightly`.
- gpu-app-collection: for running the unit tests with Rodinia 2.0 kernels
- Test Docker image: you can also opt for the prebuilt Docker image, which has all dependencies installed except GPGPU-Sim:
- OS: Ubuntu 22.04
- CUDA: 11.7
- LLVM: 18.1.8
- RISCV: 2024.08.06-nightly
```bash
# Pull prebuilt image
docker pull ghcr.io/accel-sim/accel-sim-framework:SST-Integration-Ubuntu-22.04-cuda-11.7-llvm-18.1.8-riscv-gnu-2024.08.06-nightly
```
📄️ QuickStart
This page provides instructions to setup balar and run test examples.
📄️ Tracing CUDA Program
This page provides steps to use the tracer tool to generate CUDA API traces for running balar with BalarTestCPU.
📄️ Compiling RISCV + CUDA
This page provides information on compiling a CUDA program from source into a binary that can be run with vanadis and balar.
📄️ Balar In Depth
This doc provides a high-level view of various aspects of balar.