SST V12.1.0 Release Notes
General
- To enable future improvements in parsing, configuring, and error checking, SST 11.0 introduced naming conventions for various items that require names. In general, names must be legal Python variables. These conventions are encouraged in SST v12.x and will be enforced in SST 13.0.0. A detailed description of the new naming conventions can be found here (LINK HERE!).
- SST 12.x is supported on Apple Silicon
- Key SST 12.1 improvements include threading performance optimizations; two new element libraries, Mercury for user-space workload modeling and Osseous to support RTL components; and significant improvements to the Vanadis element library
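As a rough way to screen names against the "legal Python variable" guideline mentioned above, the following sketch uses only the Python standard library. It is an approximation and not SST's own checker: the helper name is_legal_python_name and the sample names are hypothetical, and the detailed SST conventions may add rules that this check does not cover.

```python
import keyword


def is_legal_python_name(name: str) -> bool:
    """Return True if `name` could be used as a Python variable name.

    Assumption: this only approximates the SST naming conventions, which
    are described in the SST documentation and may be stricter or looser.
    """
    # isidentifier() rejects names such as "0cpu" or "my-nic";
    # iskeyword() rejects reserved words such as "class".
    return name.isidentifier() and not keyword.iskeyword(name)


if __name__ == "__main__":
    # Hypothetical component names, used only for illustration.
    for candidate in ["cpu0", "l1_cache", "0cpu", "my-nic", "class"]:
        status = "ok" if is_legal_python_name(candidate) else "not a legal Python variable"
        print(f"{candidate!r}: {status}")
```

Running a check like this over the names in an existing input file may help find names to change before the conventions are enforced.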
Deprecation and Removal Notices
- We anticipate removing support for Python 2 in SST 13. SST 12.1 is likely the last SST release that will include Python 2 support.
SST-Core (PDES-Core)
- Added Profiling Points which allow users to profile SST performance more directly. These are available in the Core as well as to Components and SubComponents.
- Fix to force Python garbage collection at end of Python input execution
- Added capability to filter test outputs to SST testing framework
- Performance improvements to avoid use of Simulation::getSimulation() in time-critical functions
- Command-line option --num_threads is now --num-threads for naming consistency
- In ELI, SST_ELI_REGISTER_SUBCOMPONENT can now be used in place of SST_ELI_REGISTER_SUBCOMPONENT_DERIVED. Both are equally valid, although the _DERIVED version may be deprecated in a future release.
- Added ability to specify arbitrary attributes (as key/value pairs) on Components and SubComponents in the ELI.
- SST::SubComponent no longer inherits from SST::Module
- Cleaned up sst-info output
- Performance enhancement in MemPools to reduce interthread contention
- Fixed issue where setting thread count in Python input script didn’t work
- Added function to the Python module to let the input file control whether Py_Finalize is called in the destructor. It is not called by default (same behavior as prior releases). The default behavior may change in the SST 13.0 release.
- Added component count to the verbose output of the ConfigGraph
- Updates to work correctly with Python 3.11
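Wrapper scripts that build an sst command line may still use the old --num_threads spelling noted above. The sketch below shows one possible way to rewrite such argument lists. The helper name migrate_sst_args and the input file name config.py are hypothetical and are not part of SST.

```python
def migrate_sst_args(args):
    """Rewrite the older `--num_threads` spelling to `--num-threads`.

    Handles both the separate-value form (`--num_threads 4`) and the
    `=` form (`--num_threads=4`); all other arguments pass through
    unchanged.
    """
    old, new = "--num_threads", "--num-threads"
    migrated = []
    for arg in args:
        if arg == old:
            migrated.append(new)
        elif arg.startswith(old + "="):
            # Keep the "=value" suffix and swap only the option name.
            migrated.append(new + arg[len(old):])
        else:
            migrated.append(arg)
    return migrated


if __name__ == "__main__":
    # config.py is a placeholder for an SST Python input file.
    print(migrate_sst_args(["sst", "--num_threads=4", "config.py"]))
    # prints ['sst', '--num-threads=4', 'config.py']
```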
SST-Elements
- New element libraries have been added: Osseous (support for RTL components) and Mercury (user-space workload modeling adapted from SST-Macro)
- Added support through Pin 3.24 for Ariel & Prospero
- New 2D BFS pattern added to Ember motifs
- Vanadis underwent significant capability improvements and bug fixes
- LSQ overhaul including
- Fix for several debugging-related segfaults
- Corrected issue that prevented multiple stores from being inflight
- Improved support for fences
- Added support for auto-fracturing over cachelines to support RISC-V architectures without alignment constraints
- Added support for multiple-cache-line loads and stores
- Optimization for clearing LSQ based on hardware threads
- Fixed overlap detection between loads and stores
- Fixed override error in RISC-V decoder
- Cleaned up and added additional debug output
- Additional error checking
- Fixed issues including register mapping for partial loads, payload zero padding, register copy operation, errors in various instruction decoders
- Added tracking for unissued memory operations when function units are not available
- Performance improvements, especially when debug output is not enabled
- Additional tests for RISC-V and MIPS
- New RISCV instructions including: FSD, FLE.S, FLT.S, FCVT.W.D, FCVT.WU.D, FCVT.LU.D, FCVT.D.L, FCVT.D.LU, FMV.X.W, FMADD.D, AMOSWAP, FEQ.D, FMSUB.D, FNMSUB.D, FMIN.D, FSQRT.D, FSQRT.S, MULHU, FSFLAGS, REM, SLT
- New and updated syscall implementations: SET_THREAD_AREA, GETTIME, MMAP, FSTAT, MADVISE
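The cache-line fracturing mentioned in the Vanadis notes above can be illustrated with a short sketch: an access that is not aligned to a cache line is split into pieces that each stay within a single line. This is only a conceptual illustration and not the Vanadis LSQ implementation. The function name fracture_access and the 64-byte default line size are assumptions chosen for the example.

```python
def fracture_access(addr: int, size: int, line_size: int = 64):
    """Split the byte range [addr, addr + size) at cache-line boundaries.

    Returns a list of (start_address, length) pieces, each of which lies
    entirely within one cache line. The 64-byte default is an assumption.
    """
    if size <= 0 or line_size <= 0:
        raise ValueError("size and line_size must be positive")
    pieces = []
    end = addr + size
    while addr < end:
        # First address of the next cache line after `addr`.
        line_end = (addr // line_size + 1) * line_size
        chunk = min(end, line_end) - addr
        pieces.append((addr, chunk))
        addr += chunk
    return pieces


if __name__ == "__main__":
    # An 8-byte store at address 60 straddles the boundary at 64.
    print(fracture_access(60, 8))   # prints [(60, 4), (64, 4)]
    # An aligned 8-byte access needs no fracturing.
    print(fracture_access(0, 8))    # prints [(0, 8)]
```

The same loop also covers accesses that span more than two lines, since it keeps emitting pieces until the whole range is consumed.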
SST-Macro
- Replaced std::thread with pthreads
- SST-Macro does not compile in standalone mode (i.e., without SST-Core) on OSX under some versions of XCode. We do not anticipate fixing this as it is believed that the issue does not exist in newer versions of XCode.
- Known working versions: XCode 13.4.1, XCode 13.3.1, XCode 13.0
- Known non-working versions: XCode 13.2.1, XCode 13.2.0
Known Issues (see open issues for repos):
Known incompatibilities:
- PIN
- Ariel Element requires PIN
- Prospero’s trace generator requires PIN
- memHierarchy’s Sieve component requires Ariel (and by extension, PIN)
- Balar Element requires PIN 2.14 (see note below)
- PIN 2.14
- PIN 2.14 is no longer distributed by Intel. SST support for PIN 2.14 is deprecated and replacement functionality is under development.
- PIN 2.14 is only supported on gcc 4.9 or earlier.
- SST-12 does not support PIN 2.14 on OSX platforms
- PIN 3.22
- Balar will not run on PIN 3.22; it will only support PIN 2.14
- SST-12 does not yet support PIN 3.22 on OSX platforms
- Sirius Element (Zodiac 27 Feature) does not support serialization
- Balar Element does not run on OSX (NO PIN SUPPORT)
- Element tests with too few components are not run in parallel (e.g., a test with one component is not run in our multi-threaded or multi-rank tests).
Operating System / Compiler Combinations:
Release V12.1.0 (core and elements, including macro) has been built and tested with the following OS/compiler combinations.
- CentOS 7.7 with GCC 4.8.5
- SNL RedHat 7.7 with GCC 4.8.5
- SNL RedHat 8 with GCC 8.3.1
- TOSS 4.3-4 with GCC 8.5.0
- Ubuntu 20.04 LTS (64-bit) with GCC 9.3.0
- Ubuntu 22.04 LTS (64-bit) with GCC 11.2
- Mac OSX 11.6 (Big Sur) with XCode 12.5 (Intel)(LLVM)
- Mac OSX 11.6 (Big Sur) with XCode 13 (Intel)(LLVM)
- Mac OSX 11.6 (Big Sur) with XCode 13 (ARM)(LLVM)
- Mac OSX 12.2 (Monterey) with XCode 13 (Intel)(LLVM)
External Components:
External components and the compatible version numbers are listed below. Other versions may work, but these have been tested with this release.
- Required
- Recommended
- OpenMPI 4.0.5 (SST no longer supports OpenMPI v2)
- Optional
- DRAMSim2 2.2.2
- DRAMsim3 1.0.0
- NVDimmSim 2.0.0
- HybridSim 2.0.1
- Intel Pin Tool 2.14-71313 - Pin 2.14 support is deprecated and Intel no longer distributes this version.
- Intel Pin Tool 3.17
- HDF5 1.10.5 or greater
- Goblin HMC Simulator Version:sst-8.0.0-release
- HBM Dramsim2 Version:hbm-1.0.0-release
- Ramulator SHA 7d2e723
- Nvidia CUDA SDK 8.0.61/9.1.45 (For Balar Component)
- sst-gpgpusim-external (v1.1)
Elements not in release:
Although available from the devel and master branches, the following elements are not provided in this release:
- GNA
- llyr
- Opal
- Serrano
- SimpleSimulation
Fixed Issues and Significant Code Changes (Pull Requests) in this Release:
- 812 Fix printf formats
- 825 Can’t set timebase to larger than ns
- 836 Get simulator state from simulator_impl
- 837 Catch HDF5 exceptions by const ref
- 839 Pointer is not set to null when deleted, but subsequently tested
- 844 Event serialization size is tallied twice
- 845 SST_SYNC_PROFILING is never enabled
- 846 RankSyncParallelSkip tracks serialization time for untimed phases
- 854 --print-timing labels are swapped?
- 856 Updating fix for issue #808; fixing issues in model crashes with perf-tracking and profiling enabled
- 861 Profile points
- 871 --num_threads option should be --num-threads
- 872 Python option sst.setThreadCount doesn’t work
- 874 Minor bug fixes
- 875 Added Simulation_impl pointer to BaseComponent and moved all time critical calls over to use the pointer instead of Simulation::getSimulation()
- 876 Threading option fixes
- 877 Updates to ELI and sst-info
- 880 Changed mempools to have fewer crossthread interactions
- 883 Small enhancements (Py_Finalize, parallel load options, verbose output)
- 889 Made profiling points available in Components and SubComponents
- 1784 Initial check-in for ERAS/Osseous (Osseous)
- 1891 Change Vanadis tests structure (Vanadis)
- 1898 Merlin::Bridge undefined symbol (non-lazy binding)
- 1894 Multiplex the syscall link between the Vanadis CPU and Vanadis OS (Vanadis)
- 1911 Remove usage of “uint” from cassini. Addresses issue #1909 (Cassini)
- 1913 Unable to run basic_vanadis.py, possible musl toolchain issue
- 1915 Using NIC between Vanadis LSQ and L1Dcache (Vanadis, MemHierarchy)
- 1917 memH: Bug fix in MESI_L1 (MemHierarchy)
- 1918 include Vanadis test binaries (Vanadis)
- 1920 Vanadis attempts to access freed instruction (memory bug) (Vanadis)
- 1923 Fix override error in RISCV decoder (Vanadis)
- 1925 Vanadis Segfault (Vanadis)
- 1929 uint is not a type (General)
- 1932 Fix firefly stream (Firefly)
- 1933 Initial check-in of Mercury components (Mercury)
- 1936 Fix Vanadis (Vanadis)
- 1938 New Vanadis LSQ Implementation (Vanadis)
- 1941 Enhance Vanadis Testing (Vanadis)
- 1942 Vanadis Test Errors when Binaries Available (Vanadis)
- 1944 Enhance vanadis dbg (Vanadis)
- 1945 Fix ADDIW Immediate Decode in Vanadis RISCV (Vanadis)
- 1950 New BFS models (Ember)
- 1951 Update Handling of Floating Point Flags in RISCV (Vanadis)
- 1952 Significant Set of Bug Fixes and Code Clean up for Vanadis (Vanadis)
- 1954 Use proper asm for linux/osx (Mercury)
- 1955 Fix use of uint instead of uint32_t in several components (CramSim, GNA, MemHierarchy)
- 1956 Fix mercury c++ standard compliance/warnings (Mercury)
- 1956 Vanadis: add new instructions and update tests (Vanadis)
- 1958 Support for AMO-SWAP Operations in Vanadis (Vanadis)
- 1959 Fix automake for Mercury (Mercury)
- 1960 Implement MMAP for RISCV (Vanadis)
- 1961 Vanadis fixes (Vanadis)
- 1962 Vanadis mainly adding new RISCV instructions (Vanadis)
- 1965 Adding mmap header to files that use mmap but were not including it (MemHierarchy, Vanadis)
- 1967 Adding mmap to operating_system.cc (Vanadis)
- 1969 New instruction test harness example (Vanadis)
- 1971 Performance Improvement for Vanadis Pipeline in frequently used data structures (Vanadis)
- 1972 Vanadis fixes (Vanadis)
- 1974 Vanadis (Vanadis)
- 1976 Vanadis (Vanadis)
- 1978 Vanadis (Vanadis)
- 1980 Change overlap detection in LSQ (Vanadis)
- 1983 fp64 unit tests (Vanadis)
- 1984 Tweak Data Structures to Improve Performance in Vanadis (Vanadis)
- 1985 Vanadis Data Structure Tweaks (Vanadis)
- 1986 Vanadis (Vanadis)
- 1987 Fix for gcc 4.2 std::list (Vanadis)
- 1989 Pin: Update to compile with Pin 3.24 (Ariel, Prospero)
- 1991 2D BFS Models (Ember)
- 1993 Modified Osseous for vecshift example and added missing files (Osseous)
- 1995 Fix up delete within the cache data structure for Vanadis (Vanadis)
- 675 Reimplement test_tls without std::thread
- 683 Remove std::thread. Add in pthreads in its place.