McNiagara Element Library

This is a description of the function and use of the McNiagara stochastic processor component.

McNiagara Introduction

The McNiagara processor component is a single-core statistical model of the Sun Niagara 2 processor. A multicore implementation can be realized using the SST infrastructure. Because this is a statistical model, it is less accurate but much faster than a cycle-by-cycle model. This type of model is meant for fast bottleneck analysis when investigating processor issues, and as a driver for a larger simulation when looking at either system performance or the detailed performance of some other component in the system. Like cycle-accurate simulators, this model is based on an existing architecture, which can be modified either through a configuration file or on the command line to define a new architecture that implements the same ISA.

The McNiagara model is currently implemented only as a stand-alone model; the interfaces that will connect it to other SST components are still under development. It can nevertheless be executed within the SST framework as a model for processor-level performance analysis of an application.

Details

Model Input

When used as a performance evaluation tool, McNiagara either takes application input from an execution trace (traceF=1) or internally generates a statistical trace based on the application’s instruction mix (traceF=0). The model requires additional input that can be collected from the dynamic execution of the application on a Niagara 2 system. We use the Shade binary instrumentation tool and the cputrack performance counter interface to collect this information. A future release of SST will include these data collection tools.

The application data collected from performance counters and used as input to the model includes:

Application data collected using binary instrumentation and used as model input includes:

The latencies and sizes of various microarchitecture components (e.g., memory hierarchy latencies) are defined in the McNiagara.h file.
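
The stochastic mode described above (traceF=0) drives the model from an instruction mix rather than a recorded trace. The following Python sketch illustrates the general idea of drawing a synthetic instruction stream from such a mix. It is not McNiagara's implementation: the instruction class names, the probabilities, and the function generate_statistical_trace are hypothetical and chosen only for illustration.

```python
import random

# Hypothetical instruction mix: class name -> probability. These values are
# illustrative only and are not taken from any McNiagara input file.
INSTRUCTION_MIX = {
    "load": 0.25,
    "store": 0.10,
    "branch": 0.15,
    "int_alu": 0.45,
    "fp": 0.05,
}


def generate_statistical_trace(mix, length, seed):
    """Draw `length` instruction classes independently according to `mix`."""
    total = sum(mix.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("instruction mix probabilities must sum to 1")
    rng = random.Random(seed)  # a fixed seed makes the run reproducible
    classes = list(mix)
    weights = [mix[c] for c in classes]
    return rng.choices(classes, weights=weights, k=length)


trace = generate_statistical_trace(INSTRUCTION_MIX, 10000, seed=42)
```

Seeding the generator explicitly plays the same role as the model's random number seed option: two runs with the same seed produce the same synthetic stream, which makes results easier to compare.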

Model Output

The model outputs a CPI prediction in addition to many other statistics relating to the utilization and stall cycles associated with the different microarchitectural components modeled. The model also decomposes CPI into components according to the stall cycles attributed to the various microarchitectural components and to characteristics such as data dependences. When plotted in a stacked bar chart, these are called CPI stacks. CPI stacks can be used in bottleneck analysis to quickly pinpoint which component in the microarchitecture or which characteristic (e.g., data dependence) is causing significant performance degradation.
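
To make the CPI decomposition concrete, the Python sketch below builds a CPI stack from per-component stall cycles and picks out the largest stall contributor. The component names, cycle counts, and the function cpi_stack are hypothetical; they do not reflect McNiagara's actual output format or any measured data.

```python
def cpi_stack(instructions, base_cycles, stall_cycles):
    """Return (per-component CPI contributions, total CPI).

    Each contribution is that component's cycles divided by the number of
    instructions executed, so the contributions sum to the total CPI.
    """
    if instructions <= 0:
        raise ValueError("instruction count must be positive")
    stack = {"base": base_cycles / instructions}
    for component, cycles in stall_cycles.items():
        stack[component] = cycles / instructions
    return stack, sum(stack.values())


# Hypothetical stall cycles for a run of one million instructions.
stalls = {
    "l1_data_miss": 200_000,
    "l2_miss": 500_000,
    "branch": 100_000,
    "data_dependence": 300_000,
}
stack, total_cpi = cpi_stack(1_000_000, 1_000_000, stalls)

# The largest non-base contribution is the first candidate bottleneck.
bottleneck = max((c for c in stack if c != "base"), key=stack.get)
```

In this made-up example the L2 miss component contributes the most stall cycles per instruction, so it would be the first place to look when tuning the architecture parameters.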

Model Use

Invoking the model binary with --help shows all options associated with running the model. An example of the output is shown below.

./mcniagara --help
Starting up
McNiagara: Constructor

Usage: mcniagara [options]
	Options:
	--seed # set random number seed (default: based on time())
	--ihist filename  use named file for histogram input file
		(default: INPUT)
	--iprob filename  use named file for instruction probabilities
		(default: inst_prob.h)
	--perf  filename  use named file for performance counter data
		(default: perf_cnt.h)
	--trace filename  use named file for trace-driven simulation
		(default: perform stochastic simulation)
	--outf  filename  use named file for output results
		(default: print to stdout)

If a trace file is specified, the model will be driven by an instruction trace; otherwise, model execution is driven by instruction mix statistics. The data collection tools will generate the required data in the appropriate file names and formats.
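
As a small illustration of how the two modes are selected, the Python sketch below assembles an argument list from the options in the help text above. The helper build_mcniagara_command and the file names results.txt and app.trace are hypothetical; only the option names and the ./mcniagara binary name come from the help output.

```python
def build_mcniagara_command(binary="./mcniagara", seed=None, ihist=None,
                            iprob=None, perf=None, trace=None, outf=None):
    """Build an argument list using the options listed by --help.

    Options left as None are omitted, so the model falls back to its
    documented defaults for them.
    """
    args = [binary]
    options = [
        ("--seed", seed),
        ("--ihist", ihist),
        ("--iprob", iprob),
        ("--perf", perf),
        ("--trace", trace),
        ("--outf", outf),
    ]
    for flag, value in options:
        if value is not None:
            args += [flag, str(value)]
    return args


# Stochastic simulation: no --trace option, driven by instruction mix.
stochastic_cmd = build_mcniagara_command(seed=1234, outf="results.txt")

# Trace-driven simulation: the (hypothetical) trace file selects the mode.
trace_cmd = build_mcniagara_command(trace="app.trace")
```

Either list can be passed to subprocess.run from the Python standard library on a system where the model binary has been built.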

For more details on this model and the methodology on which it is based, see the reference below.

  1. W. Alkohlani, J. Cook, R. Srinivasan. Extending the Monte Carlo Processor Modeling Technique: Statistical Performance Models of the Niagara2 Processor, Proceedings of the 39th International Conference on Parallel Processing, 2010.