McOpteron Element Library

Description of the function and use of an Opteron stochastic processor model

McOpteron Introduction

The McOpteron processor component is a single-core statistical model of the AMD Magny-Cours (K10) architecture. A multicore implementation can be realized using the SST infrastructure. Because it is a statistical model, it is less accurate but much faster than a cycle-by-cycle model. It is meant for fast bottleneck analysis when investigating processor issues, and as a driver for a larger simulation when studying either system performance or the detailed performance of some other component in the system. Like cycle-accurate simulators, the model is based on an existing architecture, which can be modified through a configuration file or on the command line to define a new architecture that implements the same ISA.

The McOpteron model is currently implemented only as a stand-alone model; the interfaces that will connect it to other SST components are still under development. It can nevertheless be executed within the SST framework as a model for processor-level performance analysis of an application.

Details

Model Input

When used as a performance evaluation tool, McOpteron takes application input from an execution trace (traceF=1), or it internally generates a statistical trace based on the application’s instruction mix (traceF=0). The model requires additional input that can be collected from the dynamic execution of the application on an Opteron system. We use the PIN binary instrumentation tool and the PAPI performance counter interface to collect this information. A future release of SST will include these data collection tools.
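
As a rough illustration of what generating a statistical trace from an instruction mix can look like, the sketch below draws instruction types independently, weighted by their share of the mix. It is a minimal sketch and not McOpteron's actual generator: the instruction categories, their probabilities, and the function name generate_tokens are assumptions made for the example.

```python
import random
from collections import Counter

# Hypothetical instruction mix: instruction type -> fraction of dynamic
# instructions. Categories and values are illustrative only; they are not
# taken from McOpteron's input files.
INSTRUCTION_MIX = {
    "load": 0.30,
    "store": 0.10,
    "int_alu": 0.40,
    "fp": 0.10,
    "branch": 0.10,
}


def generate_tokens(mix, count, seed=None):
    """Draw `count` instruction types independently, weighted by the mix."""
    rng = random.Random(seed)  # private generator so runs are reproducible
    types = list(mix)
    weights = [mix[t] for t in types]
    return rng.choices(types, weights=weights, k=count)


if __name__ == "__main__":
    tokens = generate_tokens(INSTRUCTION_MIX, 10000, seed=1)
    observed = Counter(tokens)
    # Compare the sampled frequencies against the target mix.
    for name, target in INSTRUCTION_MIX.items():
        share = observed[name] / len(tokens)
        print(f"{name:8s} target={target:.2f} observed={share:.3f}")
```

With enough samples the observed frequencies approach the target mix, which is the property a mix-driven statistical trace relies on.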

The application data collected from performance counters and dynamic binary instrumentation is placed in various files by the collection tools and used as input to the model. A list and description of all of the input files expected by the model is as follows:

Model Output

The model outputs a CPI prediction in addition to many other statistics relating to the utilization and stall cycles associated with the different microarchitectural components modeled. The model also decomposes CPI into components according to the stall cycles attributed to various microarchitectural components and to characteristics such as data dependences. When plotted in a stacked bar chart, these decompositions are called CPI stacks. CPI stacks can be used in bottleneck analysis to quickly pinpoint which microarchitectural component or which characteristic (e.g., data dependence) is causing significant performance degradation.
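
The arithmetic behind a CPI stack can be sketched as follows. This is a minimal sketch under stated assumptions, not the model's own accounting: it assumes every stall cycle is attributed to exactly one component, so that the contributions sum to the total CPI, and the component names and cycle counts are hypothetical.

```python
def cpi_stack(instructions, base_cycles, stall_cycles):
    """Split CPI into a base term plus one term per stall source.

    instructions -- number of instructions executed
    base_cycles  -- cycles not attributed to any stall source
    stall_cycles -- dict mapping a stall source to its attributed cycles
    Returns (stack, total_cpi), where stack maps each term to its CPI share.
    """
    if instructions <= 0:
        raise ValueError("instructions must be positive")
    stack = {"base": base_cycles / instructions}
    for component, cycles in stall_cycles.items():
        stack[component] = cycles / instructions
    return stack, sum(stack.values())


def largest_stall(stack):
    """Return the stall source with the largest CPI share (ignoring 'base')."""
    stalls = {name: cpi for name, cpi in stack.items() if name != "base"}
    return max(stalls, key=stalls.get)


if __name__ == "__main__":
    # Hypothetical cycle counts for a run of 1000 instructions.
    stack, total = cpi_stack(
        instructions=1000,
        base_cycles=500,
        stall_cycles={"data_dependence": 300, "load_store_queue": 150, "fetch": 50},
    )
    for name, cpi in stack.items():
        print(f"{name:18s} {cpi:.2f}")
    print(f"total CPI          {total:.2f}")
    print("largest stall source:", largest_stall(stack))
```

In this made-up example the data-dependence term dominates the stall portion of the stack, which is the kind of observation a CPI stack is meant to surface quickly.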

Model Use

Invoking the model binary with --help will show all options associated with running the model. An example of the output is shown below.

/mcopteron --help

Usage: mcopteron [options]
Options:
  --debug #        print lots of debugging information (# in 1-3)
  --cycles #       set number of cycles to simulate (default: 100K?)
  --converge       run until CPI converges (not default)
  --deffile name   use 'name' as insn def file (default: opteron-insn.txt)
  --dcycle #       start debugging output at cycle # (rather than 0)
  --simix          print out static input instruction mix at beginning
  --imix           print out simulation instruction mix at end
  --mixfile name   use 'name' as insn mix input file (default: usedist_new.all)
  --appdir name    use 'name' as application file directory (default: .)
  --seed #         set random number seed
  --trace name     use 'name' as input instruction trace file
  --traceout       generate a token trace to stderr
  --sepsize        keep instruction types separate on operand size
  --newimix name   use 'name' as an i-mix-only file (new format)(default: instrMix.txt)
  --isizefile name use 'name' as instruction size distribution file(default: instrSizeDistr.txt)
  --fsizefile name use 'name' as fetch size distribution file(default: fetchSizeDistr.txt)
  *note: --isizefile and --fsizefile are exclusive options; only one of them should be used
  --transfile name use 'name' as instr transition probability file for Markov-based token generator
                   (if this is not used, instr probabilities from instruction mix will be used)
  --defaults       use default file names and options for --newimix, --mixfile, and --deffile
  --repeattrace    use the input trace over and over
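
The --transfile option refers to a Markov-based token generator. The general idea of such a generator, in which the next instruction type depends on the current one, can be sketched as below. The transition table, the instruction categories, and the function name markov_tokens are hypothetical; the format of the real transition probability file is not described here, so this is only an illustration of the technique.

```python
import random

# Hypothetical first-order transition table: P(next type | current type).
# Each row sums to 1. Values are illustrative, not McOpteron data.
TRANSITIONS = {
    "load":   {"load": 0.2, "alu": 0.6, "branch": 0.2},
    "alu":    {"load": 0.3, "alu": 0.5, "branch": 0.2},
    "branch": {"load": 0.5, "alu": 0.5, "branch": 0.0},
}


def markov_tokens(transitions, start, count, seed=None):
    """Generate `count` instruction types by walking the Markov chain."""
    rng = random.Random(seed)
    current = start
    tokens = []
    for _ in range(count):
        tokens.append(current)
        row = transitions[current]
        candidates = list(row)
        weights = [row[t] for t in candidates]
        # Pick the next type according to the current type's row.
        current = rng.choices(candidates, weights=weights, k=1)[0]
    return tokens


if __name__ == "__main__":
    print(" ".join(markov_tokens(TRANSITIONS, "load", 20, seed=7)))
```

Unlike independent sampling from an instruction mix, a chain like this can preserve ordering patterns; with the table above, for instance, a branch is never immediately followed by another branch.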

If a trace file is specified, the model will be driven by an instruction trace; otherwise, model execution is driven by instruction mix statistics. The data collection tools will write the required data to files with the appropriate names and formats.
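
For scripted runs, a command line can be assembled from the options listed in the help output. The helper below is a minimal sketch: it uses only flags that appear in that output, while the binary path ./mcopteron, the file name app.trace, and the function name build_command are assumptions for illustration. It only builds and prints the argument list and does not run the model.

```python
import shlex


def build_command(binary="./mcopteron", trace=None, appdir=None, cycles=None,
                  converge=False, seed=None, isizefile=None, fsizefile=None):
    """Build an argument list from a subset of the documented options."""
    # The help text states that only one of these two should be used.
    if isizefile is not None and fsizefile is not None:
        raise ValueError("--isizefile and --fsizefile are exclusive options")
    cmd = [binary]
    if appdir is not None:
        cmd += ["--appdir", appdir]
    if trace is not None:
        # With a trace file the model is trace-driven; without one it is
        # driven by instruction mix statistics.
        cmd += ["--trace", trace]
    if cycles is not None:
        cmd += ["--cycles", str(cycles)]
    if converge:
        cmd.append("--converge")
    if seed is not None:
        cmd += ["--seed", str(seed)]
    if isizefile is not None:
        cmd += ["--isizefile", isizefile]
    if fsizefile is not None:
        cmd += ["--fsizefile", fsizefile]
    return cmd


if __name__ == "__main__":
    # Trace-driven run (app.trace is a hypothetical file name).
    print(shlex.join(build_command(trace="app.trace", cycles=200000)))
    # Mix-driven run that continues until CPI converges.
    print(shlex.join(build_command(appdir=".", converge=True, seed=42)))
```

The resulting list could be passed to subprocess.run once the binary location and input files are known for a given installation.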

For more details on this model and the methodology on which it is based, see the reference below.

  1. J. Cook, J. Cook, W. Alkohlani. A Statistical Performance Model of the Opteron Processor, ACM SIGMETRICS Performance Evaluation Review - Special Issue on the 1st International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computing Systems (PMBS 10), Volume 38, Number 4, March 2011