MemHierarchy is a highly configurable, flexible, object-oriented memory hierarchy simulator that models intra- and inter-node directory-based cache coherence with the MSI and MESI protocols (MOESI is in progress). This hierarchical directory-based coherence architecture is similar to those used in modern computing systems, such as Intel’s Core i7 and ARM’s cache-coherent interconnects (the ARM CoreLink CCN family). MemHierarchy’s goal is to simulate these modern architectures accurately while remaining flexible and configurable enough to let users explore different techniques. To improve accuracy, MemHierarchy uses actual interconnect models and actual cycle delays to simulate contention and latency between the caches, directory controllers, and memory.
MemHierarchy allows for any number of cache levels (unlike Gem5’s Ruby). Coherence is maintained regardless of whether caches are private or shared, and regardless of the number of levels.
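For example, a multi-level hierarchy is built in the Python configuration script by instantiating one memHierarchy.Cache component per level and chaining them with SST Links. The sketch below is illustrative only; the port names ("high_network_0"/"low_network_0") and most parameter names and values are assumptions that may differ between SST releases, so consult sstinfo.x for the authoritative list.

```python
import sst

# Hypothetical three-level private hierarchy: L1 -> L2 -> L3.
# Parameter and port names are assumptions; verify them with sstinfo.x.
def make_cache(name, size, latency, is_l1=False):
    cache = sst.Component(name, "memHierarchy.Cache")
    cache.addParams({
        "cache_frequency"      : "2GHz",
        "cache_size"           : size,
        "associativity"        : 8,
        "access_latency_cycles": latency,
        "coherence_protocol"   : "MESI",
        "L1"                   : 1 if is_l1 else 0,
    })
    return cache

l1 = make_cache("l1cache", "32KB",  4, is_l1=True)
l2 = make_cache("l2cache", "256KB", 10)
l3 = make_cache("l3cache", "2MB",   20)

# Chain the levels: each cache's low-side port connects to the next level's high-side port.
link_l1_l2 = sst.Link("link_l1_l2")
link_l1_l2.connect((l1, "low_network_0", "50ps"), (l2, "high_network_0", "50ps"))
link_l2_l3 = sst.Link("link_l2_l3")
link_l2_l3.connect((l2, "low_network_0", "50ps"), (l3, "high_network_0", "50ps"))
```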
In contrast to functional models, MemHierarchy provides SST with a flexible infrastructure capable of accurately simulating a wide range of cache topologies and memory systems. SST is not restricted to two- or three-level caches; any number of intra-node cache levels is allowed, as are inter-node directory controllers. However, compared to functional models, MemHierarchy is slower.
The SST 4.0 release provides a MemHierarchy component that was designed from scratch to make it more scalable, flexible, and reliable. Snooping coherence was dropped in favor of the more modern (scalable) and more widely used directory-based intra-node coherence.
Structure
MemHierarchy is structured in a ‘protocol-centric’ approach, as in real hardware. Incoming requests are handled by the cache controller (cache.cc), which serves as the central unit of the memory hierarchy subsystem. Depending on the type of request and/or the state of the cache, the cache controller redirects requests to the other subcomponents of the system, such as the MSHR, the Cache Array, the Top Coherence Controller (Top CC), and the Bottom Coherence Controller (Bottom CC).
The Top CC is in charge of keeping and maintaining directory information about the lower-level, or child, caches. This is done on a per-cache-line basis and includes the state, the sharers, and an acknowledgment count. The Top CC is also in charge of sending invalidate requests when needed and providing responses to child caches. Its states for each cache line are “V” and “Inv_A”. “V” indicates that, as far as the top controller is concerned, the cache line is in a stable state. “Inv_A” indicates that the top coherence controller sent an invalidate request for the cache line to a child cache and is waiting for a response.
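A conceptual sketch of this per-line bookkeeping is shown below. It is not MemHierarchy’s actual C++ implementation; the field names and helper methods are only illustrative of what the Top CC tracks.

```python
from dataclasses import dataclass, field

# Conceptual model of the Top CC's per-cache-line directory entry
# (illustrative only; not the actual cache.cc data structures).
@dataclass
class TopCCEntry:
    state: str = "V"                            # "V" = stable, "Inv_A" = invalidation outstanding
    sharers: set = field(default_factory=set)   # child caches currently holding the line
    ack_count: int = 0                          # invalidation acknowledgments still expected

    def start_invalidation(self):
        """Move the line to Inv_A and wait for one ack per sharer."""
        self.state = "Inv_A"
        self.ack_count = len(self.sharers)

    def receive_ack(self, child):
        """Record an ack from a child; return to stable once all have answered."""
        self.sharers.discard(child)
        self.ack_count -= 1
        if self.ack_count == 0:
            self.state = "V"
```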
The Bottom CC maintains the data state of every cache line in the cache. It determines when it is necessary to send upgrades or writebacks, or to simply forward requests to the higher-level, or parent, caches. Every cache line is in one of the following states: I, IS, IM, S, SI, EI, SM, E, M, MI, MS. One-letter states are stable states (in the MESI protocol), while two-letter states indicate that the cache line is in a transient state, waiting for an action or response event. While in a transient state the cache line is ‘locked’ and other requests must wait for the current request to complete.
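The grouping of these states can be summarized as follows; this is a simple sketch of the state names listed above, and the helper function is hypothetical, only illustrating the “locked while transient” rule.

```python
# Stable (single-letter) MESI states versus transient (two-letter) states,
# as listed above. The is_locked() helper is hypothetical.
STABLE_STATES    = {"I", "S", "E", "M"}
TRANSIENT_STATES = {"IS", "IM", "SI", "EI", "SM", "MI", "MS"}

def is_locked(line_state):
    # A line in a transient state is 'locked': later requests to the same
    # line must wait until the outstanding request completes.
    return line_state in TRANSIENT_STATES
```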
The Miss Status Handling Register, or MSHR, is part of the cache controller; its main job is to keep track of all pending requests in the subsystem. Requests are always added to the MSHR and are only removed once they have been fully handled. When MemHierarchy receives a response to a previously sent request, the cache controller checks the pending requests in the MSHR. If there is an MSHR hit, the pending request is processed.
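A minimal conceptual sketch of this behavior is below, assuming an MSHR modeled as a table of pending requests keyed by cache-line address. This is illustrative only and not the actual implementation in cache.cc.

```python
from collections import defaultdict

# Conceptual MSHR: pending requests grouped by cache-line address
# (illustrative only; not MemHierarchy's actual C++ code).
class MSHR:
    def __init__(self, max_entries=32):
        self.max_entries = max_entries
        self.pending = defaultdict(list)    # address -> list of outstanding requests

    def allocate(self, address, request):
        """Every incoming request is recorded until it is fully handled."""
        if address not in self.pending and len(self.pending) >= self.max_entries:
            return False                    # table full (one entry per line here): stall/retry
        self.pending[address].append(request)
        return True

    def on_response(self, address, response):
        """On a response, check for an MSHR hit and process the waiting requests."""
        if address in self.pending:         # MSHR hit
            for request in self.pending.pop(address):
                process(request, response)

def process(request, response):
    # Placeholder for the cache controller's handling of a satisfied request.
    pass
```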
All requests are synchronized when they are received. By default, MemHierarchy handles one request per cycle, but multiple requests can be handled per cycle if the “tag_copies” input parameter is specified. All responses are delayed by a configurable number of cycles, specified by the “access_latency_cycles” input parameter.
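In the Python configuration script these values are set as cache parameters. The snippet below is a sketch: apart from “access_latency_cycles” and “tag_copies”, which are named above, the remaining parameter names and values are assumptions to be checked against sstinfo.x.

```python
import sst

l1cache = sst.Component("l1cache", "memHierarchy.Cache")
l1cache.addParams({
    # Parameters named in the text:
    "access_latency_cycles": 4,        # responses are delayed by 4 cache cycles
    "tag_copies"           : 2,        # allow more than one request per cycle
    # Remaining parameters are illustrative; check sstinfo.x for exact names and defaults.
    "cache_frequency"      : "2GHz",
    "cache_size"           : "32KB",
    "associativity"        : 8,
    "coherence_protocol"   : "MESI",
    "L1"                   : 1,
})
```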
All requests and responses are sent through SST Links. This accurately models the contention of memory requests throughout the system. Similarly, all requests and responses have an actual ‘latency’ associated with them; there are no ‘functional’ calls within MemHierarchy. For instance, if an L1 cache has an access latency of 4 cycles, it takes 4 clock ticks (4 x 1/[Cache Frequency] ns, e.g. 2 ns at a 2 GHz cache frequency) to send the request. The request is then sent to the appropriate cache. If the receiving cache has more than one incoming event, it serializes them by placing them in a queue. Nevertheless, the actual cache pipeline is not cycle-accurate, as that would significantly reduce simulation performance.
Components that want to interface with MemHierarchy should use the MemHierarchyInterface class to send and receive read/write requests. Requests can contain flags to allow for uncached and atomic requests.
For inter-node cache coherence, the Merlin and Directory Controller components must be used. Merlin provides an interconnect router; topologies such as crossbar, mesh, torus, and more are supported. One or more Directory Controllers can be instantiated, each taking ownership of the main memory component attached to it.
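A sketch of the directory/memory side of such a configuration is shown below. The component names ("memHierarchy.DirectoryController", "memHierarchy.MemController", "merlin.hr_router") and especially the port names, parameter names, and values are assumptions that vary between SST versions; consult sstinfo.x before use.

```python
import sst

# Merlin router for the coherence network (parameters illustrative).
router = sst.Component("router", "merlin.hr_router")
router.addParams({
    "id"       : 0,
    "num_ports": 2,                    # port0 -> directory, port1 -> cache side (not shown)
    "topology" : "merlin.singlerouter",
    "link_bw"  : "16GB/s",
    "xbar_bw"  : "16GB/s",
    "flit_size": "72B",
    "input_buf_size" : "1KB",
    "output_buf_size": "1KB",
})

# Directory controller owning one region of memory (address range illustrative).
directory = sst.Component("directory0", "memHierarchy.DirectoryController")
directory.addParams({
    "coherence_protocol": "MESI",
    "addr_range_start"  : 0,
    "addr_range_end"    : 512 * 1024 * 1024 - 1,
})

# Main memory component owned by this directory controller.
memory = sst.Component("memory0", "memHierarchy.MemController")
memory.addParams({
    "clock"      : "1GHz",
    "mem_size"   : 512,                # MB; parameter name and units are an assumption
    "access_time": "80ns",
})

# Wire directory <-> network and directory <-> memory (port names are assumptions).
net_link = sst.Link("dir_net_link")
net_link.connect((directory, "network", "50ps"), (router, "port0", "50ps"))
mem_link = sst.Link("dir_mem_link")
mem_link.connect((directory, "memory", "50ps"), (memory, "direct_link", "50ps"))
```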
To see a full list of parameters, run sstinfo.x or look at the file “sst/elements/memHierarchy/libmemHierarchy.cc”.
In SST 4.0, MemHierarchy was designed from scratch to make it more scalable and flexible, and to enable new features such as the MESI protocol. Due to this redesign, MemHierarchy is not backward compatible with old configuration script files.
Here are some key aspects to be aware of in order to upgrade and/or create new configuration scripts:
Below is a list of all the required parameters in the new MemHierarchy: