MemHierarchy Element Library

Overview

MemHierarchy is a highly configurable, flexible, OO-based memory hierarchy simulator that models intra- and inter-node directory-based cache coherency with the MSI and MESI protocols (MOESI in progress). This hierarchical, directory-based coherency architecture is similar to those used in modern computing systems, such as Intel’s Core i7 and ARM’s cache coherency interconnects (the ARM CoreLink CCN family). MemHierarchy’s goal is to accurately simulate these modern architectures while remaining flexible and configurable enough to let users explore different techniques. To achieve higher accuracy, MemHierarchy uses actual interconnect models and actual cycle delays to simulate contention and latencies between the caches, directory controllers, and memory.

MemHierarchy allows for any number of cache levels (unlike Gem5’s Ruby). Coherency is maintained regardless of whether caches are private or shared, and regardless of the number of levels.

In contrast to functional models, MemHierarchy provides SST with a flexible infrastructure capable of accurately simulating a wide range of cache topologies and memory systems. SST is not restricted to 2- or 3-level caches; any number of intra-node caches is allowed, as well as inter-node directory controllers. The trade-off is that MemHierarchy is slower than a functional model.
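A multi-level hierarchy is built by instantiating one Cache component per level and connecting them with SST links. The sketch below shows a two-level setup; the component, parameter, and port names ("memHierarchy.Cache", "cache_frequency", "low_network_0", etc.) follow common memHierarchy configuration scripts but may differ between SST releases, so treat them as assumptions.

```python
import sst

# Hedged sketch of a two-level private hierarchy. Parameter and port
# names are assumptions based on typical memHierarchy scripts.
l1 = sst.Component("l1cache", "memHierarchy.Cache")
l1.addParams({
    "cache_frequency":       "2GHz",
    "cache_size":            "32KB",
    "associativity":         8,
    "access_latency_cycles": 4,
    "coherence_protocol":    "MESI",
    "L1":                    1,     # marks this as an L1 cache
})

l2 = sst.Component("l2cache", "memHierarchy.Cache")
l2.addParams({
    "cache_frequency":       "2GHz",
    "cache_size":            "256KB",
    "associativity":         8,
    "access_latency_cycles": 10,
    "coherence_protocol":    "MESI",
    "L1":                    0,
})

# Additional levels are added the same way; coherency is maintained
# regardless of the depth of the hierarchy.
link = sst.Link("l1_l2_link")
link.connect((l1, "low_network_0", "50ps"), (l2, "high_network_0", "50ps"))
```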

The SST 4.0 release provides a MemHierarchy component that was designed from scratch to make it more scalable, flexible, and reliable. Snooping coherency was dropped in favor of the more modern, more scalable, and more commonly used directory-based intra-node coherency.

Structure

MemHierarchy is structured in a ‘protocol-centric’ approach, as done in real hardware. Incoming requests are handled by the cache controller (cache.cc), which serves as the central unit of the memory hierarchy subsystem. Depending on the type of request and/or the state of the cache, the cache controller redirects requests to the other subcomponents of the system, such as the MSHR, the Cache Array, the Top Coherence Controller (Top CC), and the Bottom Coherence Controller (Bottom CC).

The Top CC is in charge of keeping and maintaining directory information for the lower-level, or child, caches. This is done on a per-cache-line basis and includes the state, the sharers, and an acknowledgment count. The Top CC is also in charge of sending invalidate requests when needed and providing responses to child caches. Each cache line is in one of two states: “V” or “Inv_A”. “V” indicates that, as far as the top controller is concerned, the cache line is in a stable state. “Inv_A” indicates that the Top CC has sent an invalidate request to a child cache and is waiting for a response.
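The per-line bookkeeping described above can be sketched as follows. This is an illustrative model, not the actual SST source: the class and method names are hypothetical, but the fields mirror the state, sharer set, and acknowledgment count the Top CC keeps.

```python
# Hypothetical sketch of the per-cache-line directory entry the Top CC
# maintains for its child caches. Names are illustrative only.
class TopCCEntry:
    def __init__(self):
        self.state = "V"        # "V" (stable) or "Inv_A" (invalidate pending)
        self.sharers = set()    # child caches currently holding the line
        self.ack_count = 0      # outstanding invalidate acknowledgments

    def send_invalidates(self):
        """Invalidate all child copies; wait in Inv_A until every ack arrives."""
        self.state = "Inv_A"
        self.ack_count = len(self.sharers)

    def receive_ack(self, child):
        self.sharers.discard(child)
        self.ack_count -= 1
        if self.ack_count == 0:
            self.state = "V"    # all children invalidated; line stable again

entry = TopCCEntry()
entry.sharers = {"l1_0", "l1_1"}
entry.send_invalidates()
assert entry.state == "Inv_A" and entry.ack_count == 2
entry.receive_ack("l1_0")
entry.receive_ack("l1_1")
assert entry.state == "V" and not entry.sharers
```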

The Bottom CC maintains the data state of every cache line in the cache. It determines when it is necessary to send upgrades or writebacks, or to simply forward requests to the higher-level, or parent, caches. Every cache line has one of the following states: I, IS, IM, S, SI, EI, SM, E, M, MI, MS. One-letter states denote a stable state (in the MESI protocol), while two-letter states indicate that the cache line is in a transition state, waiting for an action or response event. While in a transition state the cache line is ‘locked’ and other requests have to wait for the current request to complete.
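The locking behavior of the transition states can be sketched with a small state machine. This is an illustrative model only (the function names and the subset of transitions shown are assumptions, not SST code): a miss moves the line into a two-letter state, and any further request to that line must wait until the response returns it to a stable state.

```python
# Illustrative sketch of the Bottom CC state handling; not SST code.
STABLE     = {"I", "S", "E", "M"}
TRANSITION = {"IS", "IM", "SM", "SI", "EI", "MI", "MS"}

def on_request(state, cmd):
    """Return the next state, or None if the line is locked."""
    if state in TRANSITION:
        return None                      # locked: request must wait
    if state == "I" and cmd == "GetS":
        return "IS"                      # read miss: waiting for data
    if state == "I" and cmd == "GetX":
        return "IM"                      # write miss: waiting for data
    if state == "S" and cmd == "GetX":
        return "SM"                      # upgrade: waiting for exclusivity
    return state                         # hit in a sufficient state

def on_response(state):
    """Response arrived: transition state resolves to a stable state."""
    return {"IS": "S", "IM": "M", "SM": "M"}.get(state, state)

assert on_request("I", "GetS") == "IS"
assert on_request("IS", "GetX") is None   # locked during the transition
assert on_response("IS") == "S"
```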

The Miss Status Handling Register, or MSHR, is part of the cache controller; its main job is to keep track of all pending requests in the subsystem. Requests are always added to the MSHR, and are only removed once they have been fully handled. When MemHierarchy receives a response to a previously sent request, the cache controller checks all pending requests in the MSHR. If there is an MSHR hit, the pending request is processed.
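A minimal sketch of that lookup-and-replay behavior is shown below. The structure is an assumption, not the SST implementation: pending requests are queued per cache-line address, and a response to that address pops the queue so the waiting requests can be processed.

```python
from collections import defaultdict, deque

# Minimal MSHR sketch (illustrative, not the SST implementation):
# pending requests are tracked per cache-line address.
class MSHR:
    def __init__(self):
        self.pending = defaultdict(deque)

    def insert(self, addr, request):
        """Every outstanding request is registered here until handled."""
        self.pending[addr].append(request)

    def is_hit(self, addr):
        return addr in self.pending

    def remove(self, addr):
        """A response arrived: return the pending requests to process."""
        return self.pending.pop(addr, deque())

mshr = MSHR()
mshr.insert(0x1000, "GetS from core0")
mshr.insert(0x1000, "GetX from core1")   # second miss to same line waits
assert mshr.is_hit(0x1000)               # response triggers an MSHR hit
replay = mshr.remove(0x1000)
assert list(replay) == ["GetS from core0", "GetX from core1"]
```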

Timing

All requests are synchronized when they are received. By default, MemHierarchy handles one request per cycle, but multiple requests can be handled per cycle if the “tag_copies” input parameter is specified. All responses are delayed by a configurable number of cycles specified by the “access_latency_cycles” input parameter.

All requests and responses are sent through SST Links. This accurately models the contention of memory requests throughout the system. Similarly, all requests and responses have actual latency associated with them; there are no ‘functional’ calls within MemHierarchy. For instance, if an L1 cache has an access latency of 4 cycles, it takes 4 clock ticks (4 x 1/[cache frequency] ns) to send the request. The request is then sent to the appropriate cache. If the receiving cache has more than one incoming event, it serializes them by placing them in a queue. The cache pipeline itself, however, is not cycle-accurate, as that would greatly diminish simulation performance.
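The latency arithmetic in the example above works out as follows; the 2 GHz frequency is an illustrative value, not a MemHierarchy default.

```python
# Worked example of the timing formula above: a cache running at
# 2 GHz with access_latency_cycles = 4 delays each response by
# 4 * (1 / 2 GHz) = 2 ns. The frequency is illustrative only.
cache_frequency_hz = 2e9
access_latency_cycles = 4

clock_period_ns = 1e9 / cache_frequency_hz          # 0.5 ns per cycle
latency_ns = access_latency_cycles * clock_period_ns
assert latency_ns == 2.0                             # 4 cycles -> 2 ns
```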

MemHierarchy Internal Commands

  1. GetX.- Get a cache line in exclusive mode. The cache needs to invalidate other sharers/exclusive owners, if any. Data is written to the cache line once exclusivity is acquired.
  2. GetS.- Get a cache line in shared mode. If another cache holds the line in exclusive or modified state, it needs to be downgraded (to shared state) or invalidated.
  3. Inv.- Request to invalidate a particular cache line.
  4. InvX.- Request to downgrade a particular cache line to shared state.
  5. GetXResp.- Response to a GetX request.
  6. GetSResp.- Response to a GetS request.
  7. PutM.- Writeback request for a cache line in modified state. Modified data is written to the higher-level cache and the sender is removed as a ‘sharer’. Serves as an invalidate acknowledgment (InvAck) when the PutM is sent due to an invalidate request from a higher-level cache.
  8. PutS.- Writeback request for a cache line in shared state. This command lets the parent cache know that the child cache is no longer a ‘sharer’. Data does not need to be written by the higher-level cache.
  9. PutX.- Writeback request for a cache line in modified state. Unlike PutM, the sending (child) cache was only downgraded to the S state. Therefore, upon receiving this command, the higher-level cache still needs to keep track of the sender as a sharer.
  10. PutE.- Writeback request for a cache line in (E)xclusive state. The higher-level cache or memory does not need to actually write the data since the data is clean.
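The differing effects of the four Put* commands on the parent cache can be summarized in a small dispatch sketch. This is illustrative only (the function and field names are assumptions, not SST code), but it captures the distinctions listed above: PutM and PutX carry data while PutS and PutE do not, and only PutX leaves the sender registered as a sharer.

```python
# Illustrative sketch (not SST code) of how a parent cache reacts to
# the writeback (Put*) commands described above.
def handle_put(cmd, line, sender):
    if cmd == "PutM":
        line["dirty"] = True              # modified data written back
        line["sharers"].discard(sender)   # sender gives up the line
    elif cmd == "PutS":
        line["sharers"].discard(sender)   # no data transfer needed
    elif cmd == "PutX":
        line["dirty"] = True              # modified data written back,
        pass                              # but sender stays a sharer (S)
    elif cmd == "PutE":
        line["sharers"].discard(sender)   # clean data; nothing written
    return line

line = {"sharers": {"l1_0"}, "dirty": False}
line = handle_put("PutX", line, "l1_0")
assert line["dirty"] and "l1_0" in line["sharers"]   # sender kept as sharer
```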

MemHierarchy Interface

Components that want to interface with MemHierarchy should use the MemHierarchyInterface class to send and receive read/write requests. Requests can contain flags to allow for uncached and atomic requests.

Directory Controller

For inter-node cache coherency, the Merlin and Directory Controller components must be used. Merlin provides an interconnect router; topologies such as crossbar, mesh, torus, and more are supported. One or more Directory Controllers can be instantiated, each taking ownership of the main memory component attached to it.
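A hedged configuration sketch of a directory controller attached to a memory controller behind a Merlin router is shown below. All component, parameter, and port names here ("merlin.hr_router", "memHierarchy.DirectoryController", "addr_range_start", etc.) are assumptions drawn from typical SST scripts and may differ across releases; consult sstinfo.x for the authoritative list.

```python
import sst

# Hedged sketch: names and parameters are assumptions, not verified
# against a specific SST release.
router = sst.Component("router", "merlin.hr_router")
router.addParams({
    "id":        0,
    "num_ports": 2,
    "topology":  "merlin.singlerouter",  # crossbar/mesh/torus also possible
    "link_bw":   "1GB/s",
    "xbar_bw":   "1GB/s",
})

dirctrl = sst.Component("dirctrl", "memHierarchy.DirectoryController")
dirctrl.addParams({
    "network_bw":       "1GB/s",
    # Each directory controller owns one slice of the address space.
    "addr_range_start": 0,
    "addr_range_end":   512 * 1024 * 1024,
})

memory = sst.Component("memory", "memHierarchy.MemController")
memory.addParams({
    "access_time": "100ns",
    "mem_size":    512,   # MiB
})

# The directory controller takes ownership of the memory attached to it.
link_dir_mem = sst.Link("link_dir_mem")
link_dir_mem.connect((dirctrl, "memory", "50ps"),
                     (memory, "direct_link", "50ps"))
```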

Configuration Parameters

To see a full list of parameters, run sstinfo.x or look at the file “sst/elements/memHierarchy/libmemHierarchy.cc”.

MemHierarchy Migration Guide (SST 3.0 to 4.0)

In SST 4.0, MemHierarchy was designed from scratch to make it more scalable, more flexible, and able to accommodate new features, such as the MESI protocol. Due to this redesign, MemHierarchy is not backward-compatible with old configuration script files.

Here are some key aspects to be aware of in order to upgrade and/or create new configurations scripts:

Below is a list of all the required parameters in new MemHierarchy:

Features in progress (as of 04-4-2013)