SST for Newbies

UNDER CONSTRUCTION: CONTENT NOT VALID

Introduction

The Structural Simulation Toolkit (SST) is an open-source software framework for simulating large-scale High Performance Computing (HPC, i.e. supercomputer) systems. It allows parallel simulation of these HPC systems at multiple levels of detail. The simulations can be run on systems ranging from a personal workstation up to large-scale multi-node (10,000+ nodes) HPC systems. These simulations are intended to assist with designing and evaluating future HPC systems and computer architectures.

SST itself is not a simulator. Instead it is an application and support toolkit that allows multiple disparate/discrete simulators to share data with each other in order to perform very large scale simulations.

Definitions

The term “node” refers to a self-contained computer system. A workstation is considered to be a single node, whereas HPC systems typically consist of many nodes.

SST uses MPI for communication between nodes. MPI creates “ranks”, each of which is a separate computation unit on a node. Note that multiple ranks can be created on a single node, but typically there is one rank per node.

Simple Examples

For example, a simulated “computer board” (microprocessor, memory, network interface) can be created using a few discrete simulators (one each for the microprocessor, memory and network interface). Each of these simulators shares data with the others using SST Links. If additional “computer boards” are desired (representing a simulated HPC system), the user only needs to instantiate more simulators and connect them together using SST Links. Additionally, a router can be simulated and connected to each “computer board”, allowing it to share data with the other simulated “computer boards”. All of these separate simulator instances can be run on a single workstation or on an existing HPC system.

A serial workstation example is shown here:
- This example shows a simulation of 3 “computer boards” using a single workstation. Each “computer board” simulation comprises 3 different simulators sharing data using SST. An additional router simulation is used to share data between the “computer boards”. There will be 10 separate concurrent simulations running on this single workstation, all sharing data using SST.

A parallel workstation example is shown here:
- This example shows a simulation of 3 “computer boards” using a single workstation running multiple MPI ranks (simulated nodes). Each “computer board” simulation comprises 3 different simulators sharing data using SST. Each “computer board” is assigned to a specific rank. An additional router simulation is used to share data between the “computer boards”. There will be 10 separate concurrent simulations (split between 4 separate ranks) running on this single workstation, all sharing data using SST.

An HPC multi-node example is shown here:
- This example shows a simulation of 200 “computer boards” using 200 separate HPC nodes (each node running one MPI rank). Each “computer board” node runs 3 different simulators sharing data using SST. An additional HPC node is used to simulate the router; it shares data with all the “computer board” nodes using SST. Each of the 200 nodes runs 3 simulations, all sharing data with each other and with the router simulation using SST.
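
The simulator and rank counts in the three examples above can be checked with a small sketch. This is plain Python, not SST code; the names `Simulator`, `build_system` and `summarize` are hypothetical and exist only for this illustration. It assumes, as the examples do, that each board holds 3 simulators and that a parallel run gives every board and the router its own rank.

```python
from dataclasses import dataclass

# Simulators on one "computer board", as listed in the text above.
BOARD_PARTS = ("cpu", "memory", "nic")


@dataclass(frozen=True)
class Simulator:
    name: str  # e.g. "board0.cpu" (illustrative naming only)
    rank: int  # MPI rank the simulator is placed on


def build_system(num_boards, parallel):
    """Return every simulator for `num_boards` boards plus one router.

    Serial run: everything is placed on rank 0.
    Parallel run: board i is placed on rank i, the router on its own rank.
    """
    sims = []
    for board in range(num_boards):
        rank = board if parallel else 0
        for part in BOARD_PARTS:
            sims.append(Simulator(f"board{board}.{part}", rank))
    router_rank = num_boards if parallel else 0
    sims.append(Simulator("router", router_rank))
    return sims


def summarize(sims):
    """Return (number of simulators, number of ranks in use)."""
    return len(sims), len({s.rank for s in sims})


serial = summarize(build_system(3, parallel=False))
workstation = summarize(build_system(3, parallel=True))
hpc = summarize(build_system(200, parallel=True))
print(serial, workstation, hpc)
```

This prints `(10, 1) (10, 4) (601, 201)`: 10 simulations on one rank, 10 simulations over 4 ranks, and 601 simulations over 201 nodes for the HPC case.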

SST Overview

SST by itself is an application that runs on a computer system. When run, it reads an input configuration file, which causes SST to dynamically load a set of dynamic libraries (called Element Libraries, similar to plugins). These Element Libraries contain a number of internal objects (Components, Modules, etc.) that represent the simulators and the support objects needed to perform a simulation. Note that SST can load multiple Element Libraries (simulators) at the same time.

After loading the Element Libraries, SST also uses the configuration file to determine how to connect the various objects within an Element Library to other Element Libraries via SST Links (think of a link as a wire or communication channel).

In a multi-rank (multi-node) simulation, SST uses a partitioning algorithm to decide which ranks should load specific Element Libraries (simulators). The first instance of SST loads the configuration file and then runs the partitioning algorithm. That instance of SST then uses MPI to pass each of the other SST instances (one per rank) its partitioned subset of the configuration. These other instances of SST then load their appropriate Element Libraries (as defined by the partitioning). Lastly, all Element Libraries across all ranks connect to each other (as defined by the configuration) using SST Links.
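
To make the idea of partitioning concrete, here is a minimal sketch in plain Python. It uses a simple round-robin assignment purely for illustration; it is not claimed to be the algorithm SST uses, and the object names (`cpu0`, `mem0`, `nic0`, `router`) and function names are hypothetical. It shows the three results a partitioner has to produce: which rank each simulated object lands on, which objects each rank must load, and which links end up crossing rank boundaries.

```python
from collections import defaultdict


def round_robin_partition(components, num_ranks):
    """Assign components to ranks in turn: 0, 1, ..., num_ranks - 1, 0, ..."""
    return {name: i % num_ranks for i, name in enumerate(components)}


def split_by_rank(assignment):
    """Group component names by the rank they were assigned to."""
    per_rank = defaultdict(list)
    for name, rank in assignment.items():
        per_rank[rank].append(name)
    return dict(per_rank)


def cross_rank_links(links, assignment):
    """Return the links whose two endpoints live on different ranks."""
    return [(a, b) for a, b in links if assignment[a] != assignment[b]]


# Hypothetical simulated objects and the links between them.
components = ["cpu0", "mem0", "nic0", "router"]
links = [("cpu0", "mem0"), ("cpu0", "nic0"), ("nic0", "router")]

assignment = round_robin_partition(components, num_ranks=2)
per_rank = split_by_rank(assignment)
remote = cross_rank_links(links, assignment)

print(per_rank)  # the subset of the configuration each rank would load
print(remote)    # links that must carry data between ranks
```

In this sketch two of the three links cross ranks, which hints at why the choice of partitioning algorithm matters: links between ranks need communication between SST instances, while links within a rank do not.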

After all Element Libraries are linked, the simulation will start and run to completion.

There are 3 major items that make up an SST simulation:

The SST application
The Element Libraries
The SST Configuration File

If SST is to be run on a multi-node HPC system, it will work with MPI and the HPC system software to run a copy of SST on each required rank/node, instantiate the Element Libraries on specific ranks (called partitioning), and properly create SST links between the Element Libraries. During partitioning, different Element Libraries may be assigned to different ranks depending upon the partitioning algorithm.

After all Element Libraries are loaded and linked, SST will pass operational control to these Libraries, allowing them to perform their simulations. During these simulations, the Element Libraries will send and receive event data over the SST Links. This allows the simulations to request and respond to these events as appropriate.

The simulations will continue until a user-defined run time limit is reached or all simulations have finished.
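
The run loop described above (events sent over links, handled in time order, until a time limit is reached or nothing is left to do) can be sketched as a tiny discrete-event loop. This is a plain Python illustration under simplifying assumptions, not SST's implementation: the `Simulation` class, the `cpu` and `memory` handlers, and the latency of 10 time units are all hypothetical.

```python
import heapq
import itertools


class Simulation:
    """Minimal event loop: events are delivered in time order."""

    def __init__(self):
        self.now = 0
        self._queue = []
        self._order = itertools.count()  # tie-breaker for equal times
        self.log = []

    def send(self, latency, handler, payload):
        """Schedule `handler(sim, payload)` to run `latency` units from now."""
        entry = (self.now + latency, next(self._order), handler, payload)
        heapq.heappush(self._queue, entry)

    def run(self, stop_at):
        """Run until the queue is empty or the next event is past `stop_at`."""
        while self._queue and self._queue[0][0] <= stop_at:
            self.now, _, handler, payload = heapq.heappop(self._queue)
            handler(self, payload)
        return self.now


def cpu(sim, reply):
    """Record the reply and issue the next request, up to request 3."""
    sim.log.append((sim.now, "cpu", reply))
    if reply < 3:
        sim.send(10, memory, reply + 1)  # link latency of 10 time units


def memory(sim, request):
    """Record the request and answer it over the same link."""
    sim.log.append((sim.now, "memory", request))
    sim.send(10, cpu, request)


sim = Simulation()
sim.send(0, cpu, 0)            # kick off the exchange at time 0
end_time = sim.run(stop_at=1000)
print(end_time, len(sim.log))  # finishes on its own before the limit
```

Here the exchange ends by itself at time 60 after 7 events, well before the limit of 1000. Calling `run(stop_at=25)` on a fresh `Simulation` instead stops at time 20, because the next event (at time 30) lies beyond the limit.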

At runtime, SST must be provided with an input file that defines what Element Libraries must be loaded and how they are linked, along with any other runtime configuration settings. In a multi-node system, the first executing copy of SST will share the configuration settings with all the other copies of SST along with a partitioning mapping of what nodes are to load what Element Libraries.

The Element Libraries are developed using the C++ programming language and are integrated with the SST Application Programming Interface (API). The Element Libraries are typically designed as one of the following:

An Element Library that performs a simulation and is specifically designed to run with SST.
An Element Library that provides an interface between SST and a 3rd party simulator. Usually these 3rd party simulators are not designed for SST, and the Element Library provides a “wrapper” interface connecting that simulator to SST.
Other support features such as Partitioners, Statistics or Python Extensions to the Input file.

SST provides many support services to the Element Libraries via the SST API. These include clocks, events, statistics, partitioning, communication links, etc.

SST Core Application

The SST Core incorporates all the features required to run SST as an application, partition and load the Element Libraries, synchronize multiple copies of SST across ranks/nodes, provide event queues, handle SST links, and provide additional services that the Element Libraries may require.

Additionally, the SST core provides an API that Element Libraries are compiled and linked with to properly interface with SST.

Note: The SST core is a separately compiled C++ application from the Element Libraries (which are compiled dynamic libraries).

Python Script (Input File)

The SST input file is written using the Python programming language with extensions specific to SST.

SST has a built-in Python interpreter and will automatically read and execute the Python script provided as the input file.
The SST extensions to Python provide subroutines that are used to command SST to:
- Create Element Library components and pass runtime parameters to them.
- Create SST Links and “wire” them to the components.
- Control statistics operation.
- Set up runtime parameters.
Python was chosen for its extensibility and its ability to quickly generate large architectures and link them together programmatically.
The input file is commonly referred to as the “SDL file”, “SST Configuration File”, or the “Input Deck”.
For more information on the input file see [wiki:SSTUserPythonFileFormat SST Configuration File, Python Version].
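
As a rough illustration of the shape of an input file, the sketch below creates two components, passes parameters to them, wires them together with a link, and sets a runtime option. So that it runs in an ordinary Python interpreter, it uses a small stand-in object named `sst` that only records the calls; in a real input file that stand-in would be replaced by `import sst`. The call names (`Component`, `addParams`, `Link`, `connect`, `setProgramOption`) are modelled on the SST Python module and should be treated as assumptions to be checked against the file format page referenced above. The element names (`exampleLib.cpu`, `exampleLib.memory`), port names, parameters and option value are hypothetical.

```python
class _Component:
    def __init__(self, name, element_type):
        self.name, self.element_type, self.params = name, element_type, {}

    def addParams(self, params):
        self.params.update(params)


class _Link:
    def __init__(self, name):
        self.name, self.ends = name, ()

    def connect(self, end_a, end_b):
        # Each end is a tuple: (component, port name, latency string).
        self.ends = (end_a, end_b)


class _StandInSST:
    """Records calls so this sketch runs without SST installed."""

    def __init__(self):
        self.components, self.links, self.options = [], [], {}

    def Component(self, name, element_type):
        comp = _Component(name, element_type)
        self.components.append(comp)
        return comp

    def Link(self, name):
        link = _Link(name)
        self.links.append(link)
        return link

    def setProgramOption(self, key, value):
        self.options[key] = value


sst = _StandInSST()  # in a real input file this would be: import sst

# --- what the body of an input file might look like ---

# Set up a runtime parameter (hypothetical stop time).
sst.setProgramOption("stopAtCycle", "10us")

# Create components ("library.component") and pass parameters to them.
cpu = sst.Component("cpu0", "exampleLib.cpu")
cpu.addParams({"clock": "2GHz"})

mem = sst.Component("mem0", "exampleLib.memory")
mem.addParams({"size": "1GiB"})

# Create a link and "wire" it to a port on each component.
bus = sst.Link("cpu_mem_link")
bus.connect((cpu, "mem_port", "50ns"), (mem, "cpu_port", "50ns"))

print(len(sst.components), "components,", len(sst.links), "link")
```

Because the input file is ordinary Python, the same calls can be placed inside loops or functions to generate many components and links programmatically.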

Element Libraries

An SST Element Library is a compiled C++ library (Linux .so or Mac OSX .dylib) that is dynamically loaded and linked at runtime by the SST Application.

Each Element Library is comprised of one or more of the following functional objects:

Components - Implement the specific functional features of the simulators instantiated by SST.
Sub-Components - A grouping of support functions that can be instantiated by Components.
Modules - Similar to Sub-Component, but lighter weight in implementation.
Events - All the Events created/consumed by the functional objects in this Element Library.
Partitioners - Tools to assist with deciding what Element Libraries should be run on what HPC nodes.
Python Module Generator - Tools to extend Python support during processing of the SST input file.
Generators - Deprecated feature.
Introspectors - Deprecated feature.
Additionally, Components, Modules and Sub-Components provide additional objects:
Params - The parameters provided to the functional object.
Ports - A listing of the connection points for SST Links on a Component.
Statistics - List of statistics that Components and Sub-Components provide.

At the top level of every Element Library there is a static C++ ElementLibraryInfo structure that identifies the information on all the functional objects. This structure is used at runtime by the SST Core to assist with loading of Element Libraries.

The majority of Element Libraries will contain one or more Components. However Element Libraries can be developed to provide specific support features and may contain no Components.

Components

Components can be regarded as the main code of a simulator (this code may be an actual simulator or a wrapper around a 3rd party simulator).

More than one Component can exist within a single Element Library, and the majority of Element Libraries contain one or more components.

An example would be a Memory Simulation Element Library containing components for:

A Memory Controller model.
A Bus model.
A Cache Memory model.
A Trivial CPU model.

Each of these models is a separate discrete simulator, and all are contained within a single Element Library. The separate components could be “wired” together using SST Links to perform a simple simulation of a computer system.

Sub-Components

A Sub-Component is a grouping of support functions that can be instantiated by Components. It can be thought of as “plug-in” support code for Components.

Sub-Components have access to most of the SST API features provided to Components.

A Component that instantiates a Sub-Component may be in the same Element Library as the Sub-Component or in a different one.
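
The “plug-in” relationship can be sketched in plain Python as a registry keyed by `library.name`, from which a component picks its helper at construction time. This only illustrates the idea and says nothing about how SST implements it; the registry, the library names `genLib` and `coreLib`, and the classes `StrideAddresses` and `TrafficGenerator` are all hypothetical.

```python
# Registry of plug-ins, keyed by "library.name" (illustrative only).
REGISTRY = {}


def register(library, name):
    """Class decorator that files a plug-in under "library.name"."""
    def decorator(cls):
        REGISTRY[f"{library}.{name}"] = cls
        return cls
    return decorator


@register("genLib", "stride")
class StrideAddresses:
    """Sub-component-like helper: produces strided addresses."""

    def __init__(self, params):
        self.stride = int(params.get("stride", 64))

    def address(self, i):
        return i * self.stride


class TrafficGenerator:
    """Component-like object from a different (hypothetical) library, coreLib.

    It does not know the helper's class; it only knows the name given
    in its parameters and looks the helper up at construction time.
    """

    def __init__(self, params):
        self.addresses = REGISTRY[params["generator"]](params)

    def first_requests(self, count):
        return [self.addresses.address(i) for i in range(count)]


gen = TrafficGenerator({"generator": "genLib.stride", "stride": "64"})
requests = gen.first_requests(4)
print(requests)
```

Swapping in a different helper only requires registering another class and changing the `generator` parameter; the component code stays the same.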

An example would be a router module that provides a query software access to a

Modules

Modules are very similar to Sub-Components but do not have access to the support features of Components, which makes them a much lighter-weight implementation.

Events

Partitioning

Parameters

Params - The parameters provided to the functional object. These are set in the SST input file and provide the operational settings for the object.

Ports

Ports - (Only on Components) A listing of the connection points for SST Links on a Component. Each port also identifies what types of Events it supports.

Statistics

Statistics - List of statistics that Components and Sub-Components provide.

Support Systems

Statistics

Clocks & Events

Single Node vs Multinode runs

MPI

Supported Platforms

Linux
Mac OSX

Repository

Release vs Trunk

Testing

Existing Simulators that work with SST

Each release of SST comes with a number of Sandia-developed simulators for processors, memory, and networking. Additionally, many 3rd party developers have integrated simulators that are compatible with SST.

A 3rd party performance simulator can gain several benefits from integration with SST including:

Interacting with other performance simulators that have been integrated with SST (e.g. DRAMSim2, gem5, Iris, SystemC).
Utilizing SST’s services for modeling (Power, Temperature, Statistics, Reliability)
Access to SST’s parallel simulation environment.
SST aims, over time, to become a standard simulation environment for designing HPC systems by helping Industry, Academia, and the National Labs in designing and evaluating future architectures.

3rd Party simulators

DRAMSim2 - A hardware-validated, cycle-accurate, C based simulator for DRAM devices such as DDR3.
NVDIMMSim - A cycle-accurate non-volatile memory simulator for devices such as NAND flash.
HybridSim - A cycle-accurate simulation of a non-volatile memory system augmented with a DRAM-based cache.
QSim - A thread safe multicore emulation library based on the QEMU emulator.
CHDL - A C++ Hardware Description Language and Toolchain.
MacSim - A heterogeneous architecture simulator, which is trace-driven and cycle-level.

Sandia provided simulators

ariel - Prototype PIN-based Memory Tracing component which does not require initial trace capture (in-development)
cacheTracer - Cache Tracing Element (formerly simpleTracerComponent)
cassini - Simulation modules for uncore non-cache processor components including prefetching logic and branch predictors
ember - ???
event_test - Contains a test of event serialization and passing for the core
firefly - Implements a low level communication protocol and data movement layer connecting to network hardware simulation components such as the Merlin router
hermes - Provides an interface to message passing functions allowing multiple driving components to utilize the simulation of network operations in a standardized manner
memHierarchy - Simulates a flexible memory hierarchy, including caches and memory controllers (links to DRAMSim and VaultSim). Cache coherency, based on either a snoopy-based scheme or a directory-based scheme, is supported as well. Uses the standard “MemEvent” interface for interfacing with other components.
merlin - Router model with flexible topology modules to simulate networks. Also includes a stub module with which other components can build NIC models that interact with Merlin.
miranda - Miranda Component.
patterns - Patterns is a collection of networking and endpoint components. Endpoints are state machines that inject traffic into the network, receive messages, and react to them. These endpoints are meant to mimic benchmark and simple application communication behavior; their communication patterns. The endpoints can be connected using router components and interact with storage-like devices. All of these pieces are designed to require little memory and CPU cycles. The goal is to instantiate millions of these endpoints and observe the global behavior when they interact with each other. Because they are resource constrained, the pattern components only superficially simulate routers, NICs, and applications. A research goal is to determine how fine grained component simulation needs to be to deliver valid results and how many of them can be deployed concurrently; i.e., how large a machine can we simulate? This is work in progress and may not yet be useful to general SST users.
prospero - Reads a trace and generates standard memEvents which can then be passed onto the memHierarchy cache/memory models. There is also a simple trace tool which runs under the PIN binary instrumentation framework to capture a memory trace. It is possible to trace only a specific function (rather than a whole application) using this tool to reduce the simulation pressure.
scheduler - Implements 5 models in 2 components. Scheduling, allocation, and machine models are implemented in the schedComponent, and node and failure models are implemented in the nodeComponent.
simpleElementExample - Demo Component
VaultSimC - (in-development)
zodiac - Zodiac Component

Zoltan - Metis - GLPK -