balar

The balar library provides an interface between SST and GPGPU-Sim, a cycle-level simulator modeling contemporary graphics processing units (GPUs) running GPU computing workloads written in CUDA. It supports two execution modes: trace-driven and direct-execution.

  • Trace-driven: balar is driven by a test CPU that consumes CUDA API traces (not to be confused with the SASS traces used by Accel-Sim) and issues the corresponding CUDA calls.
  • Direct-execution: a CPU model executes a CUDA binary linked against a custom libcudart and dispatches CUDA API calls to balar and GPGPU-Sim.

The BalarTestCPU component is a trace-based test CPU, included in the balar folder (./testcpu/), that runs simulations with CUDA API call traces and data collected from a real GPU. It works by consuming a trace file and its associated CUDA memory copy data files. The cudaMemcpyH2D payloads are collected for program correctness; the cudaMemcpyD2H payloads are collected to validate the computation.

BalarMMIO is responsible for relaying CUDA API requests from SST to GPGPU-Sim. Currently it supports running with CUDA traces without a real CPU model (via BalarTestCPU), or with a Vanadis core executing a RISC-V + CUDA binary against a custom CUDA runtime (libcudart_vanadis, inside ./tests/vanadisLLVMRISCV/). The latter mode has been tested with a subset of the Rodinia 2.0 benchmark kernels in the unit tests.

The dmaEngine component performs data transfers between the SST cache memory space and the simulator's memory space. It is required because balar reads and writes CPU data (e.g., for cudaMemcpy() with Vanadis) and places it into GPGPU-Sim's memory space for functional simulation. In addition, dmaEngine is used to read the CUDA dispatch packet and write back the return value for the custom CUDA runtime.

At a Glance

Source code: sst-elements/.../balar  
SST name: balar  
Maturity Level: Prototype (2)  
Development Path: Active  
Last Released: SST 14.0

warning

Support for trace-driven mode is currently limited, as it was used for early-stage validation of the balar implementation. It has only been tested with a simple integer vector add example.

We are working on a more robust version of this with the new NVBit release, including a better trace format and better computation validation.

Required dependencies

Optional dependencies

  • LLVM: for compiling the RISC-V + CUDA binary. Versions 18.x.x and later should work (specifically, builds after llvm/llvm-project#57544 was resolved).
  • RISCV GNU Toolchain: for compiling the RISC-V + CUDA binary. Generally, any recent release should work; we have only tested the 2024.08.06-nightly tag.
  • gpu-app-collection: for running the unit tests with the Rodinia 2.0 kernels.
  • Test Docker image: you can also use the prebuilt Docker image, which has all dependencies installed except GPGPU-Sim:
    • OS: Ubuntu 22.04
    • CUDA: 11.7
    • LLVM: 18.1.8
    • RISCV: 2024.08.06-nightly
      # Pull prebuilt image
      docker pull ghcr.io/accel-sim/accel-sim-framework:SST-Integration-Ubuntu-22.04-cuda-11.7-llvm-18.1.8-riscv-gnu-2024.08.06-nightly