Scheduler Element Library

Implements 5 models in 2 components: scheduling, allocation, and machine models are implemented in the schedComponent; and node and failure models are implemented in the nodeComponent.

Scheduler Introduction

The scheduler component simulates the placement and scheduling of jobs onto the simulated system. This component is based on an existing standalone HPC simulator {5, 9} which allows exploration of when jobs run (scheduling) and to which processors they are assigned (allocation) {10}. Porting this simulator into the SST framework provides realistic workloads for low-level simulations and will benefit research on scheduling and allocation with low-level details on how jobs in a system interact.

The simulator parses a trace derived from the Parallel Workloads Archive {1} for characteristics of the jobs to run, and manages a stream of job arrivals and completions. It uses objects from the abstract classes Scheduler, Allocator, and Machine to store most of the simulation state and to make decisions with derived classes implementing different versions of the necessary functionality (See figure below). Common schedulers and allocators have been implemented for use as comparison baselines. These include EASY {4} (EASYScheduler), a generalization of Conservative Backfilling {8} (StatefulScheduler), and a priority-based scheduler (PQScheduler). The latter uses a set of comparators to select between FCFS, Shortest Job First, Widest Job First, and similar priority-based schemes. Most of the allocators are meant for mesh systems, with classes derived from Allocator implementing allocators based on linear orderings {6, 3, 10} (LinearAllocator), center-based allocators {7, 2} (NearestAllocator), and buddy systems {6, 10} (MBSAllocator). In addition, to speed up simulations where scheduling is the focus and allocation is unnecessary, the system provides a “bag of processors” with no notion of locality (SimpleMachine) along with the appropriate allocator (SimpleAllocator).

ClassDiagram

To integrate with the SST, a new class was added to represent the scheduling components and to interface with the other SST components. This interface includes using the XML to initialize the component and receiving job arrival events as SST messages. Here we provide a sample input deck.

```

run-mode=both partitioner=self <component name="s" type="scheduler.schedComponent" rank=0> NASA-iPSC-1993-2.1-cln.sim . . . </component> <component name="n0" type="scheduler.nodeComponent" rank=0> 0 </component> <component name="n1" type="scheduler.nodeComponent" rank=0> 1 </component> <component name="n2" type="scheduler.nodeComponent" rank=0> 2 </component> . . . <component name="n127" type="scheduler.nodeComponent" rank=0> 127 </component>

111

Details

Starting jobs and learning of their completion now involves interacting with node components, one for each node in the system. The Allocator classes notify the interfacing class, which sends the appropriate nodes an SST message to begin processing. As each node completes its part of the job,it sends an SST event message back to the interfacing class, which gathers these until all nodes assigned to the job have completed, at which point the rest of the scheduler is told of these nodes’ availability. Returning the nodes to availability all together mimics the behavior of real systems and, fortunately, minimize the changes necessary to the existing scheduler.

Future work will focus on the node component. Currently, the node simulates job duration as a fixed static interval, neglecting intra- and inter-job interactions. We are now starting to add communication behavior to the nodes, which will use networking components such as IRIS or !PhoenixSim. Eventually, processor and memory modules such as BOB-Sim, !MacSim, and gem5 will be added. This will allow exploration of topology aware allocation algorithms where network interference effects can be better understood.

  1. D. Feitelson. The parallel workloads archive. http://www.cs.huji.ac.il/labs/parallel/workload/index.html.

  2. S. Krumke, M. Marathe, H. Noltemeier, V. Radhakrishnan, S. Ravi, and D. Rosenkrantz. Compact location problems. //Theoretical Computer Science//, 181(2):379-404, 1997.

  3. V. Leung, E. Arkin, M. Bender, D. Bunde, J. Johnston, A. Lal, J. Mitchell, C. Phillips, and S. Seiden. Processor allocation on Cplant: Achieving general processor locality using one-dimensional allocation strategies. In //Proc. 4th IEEE Intern. Conf. on Cluster Computing//, pages 296-304, 2002.

  4. D. Lifka. The ANL/IBM SP scheduling system. In //Proc. 1st Workshop Job Scheduling Strategies for Parallel Processing//, number 949 in LNCS, pages 295-303, 1995.

  5. A. Lindsay, M. Galloway-Carson, C. Johnson, D. Bunde, V. Leung. Backfilling with guarantees granted upon job submission. In //Proc.// 17th Intern. Euro-Par Conf. Parallel Processing//, number 6852 in LNCS, pages 142-153, 2011.

  6. V. Lo, K. Windisch, W. Liu, and B. Nitzberg. Non-contiguous processor allocation algorithms for mesh-connected multicomputers. //IEEE Trans. Parallel and Distributed Systems//, 8(7):712-726, 1997.

  7. J. Mache, V. Lo, and K. Windisch. Minimizing message-passing contention in fragmentation-free processor allocation. In //Proc. 10th IASTED Intern. Conf. Parallel and Distributed Computing and Systems//, pages 120-124, 1997.

  8. A. W. Mu’alem and D. G. Feitelson. Utilization, predictability, workloads, and user runtime estimates in scheduling the IBM SP2 with backfilling. //IEEE Trans. Parallel and Distributed Syst.//, 12(6):529-543, 2001.

  9. O. Thebe, D. Bunde, and V. Leung. Scheduling restartable jobs with short test runs. In //Proc. 14th Workshop Job Scheduling Strategies for Parallel Processing//, 2009.

  10. P. Walker, D. Bunde, and V. Leung. Faster high-quality processor allocation. In //Proc. 11th LCI Intern. Conf. High-Performanc Clustered Computing//, 2010.