|
|
|
|
|
|

SCD has developed and implemented a layer of software named BPS (Batch Priority Scheduler) to provide enhanced job scheduling capabilities that have not been available in vendor-released NQS systems. BPS provides a number of features that include priority-based pre-emptive job load management, time-based job queue scheduling, and the ability to schedule jobs in a round-robin fashion within the priority-ordered queues. Round-robin scheduling uses NCAR-defined projects, or proposals, to fairly distribute limited computational resources. BPS also gives SCD the ability to respond to special scheduling needs; for example, accommodating jobs with unusually large resource requirements that might not be able to run effectively in a normal job mix.
The Unified Queuing System project unifies the batch processing user interface across all of the computational platforms at NCAR with a common set of queues and job submission rules for all users. This project uses the NQE batch subsystem, front-ended by the SCD-developed BPS system, to meet the unique, demanding, and evolving computational requirements of the NCAR user community.
An example of this new queue structure, that for the Cray J90se systems (ouray and chipeta), is provided in the following table. On these systems, users submit jobs to one of the four production job submission queues, NQS then categorizes the job into one of eight subqueues based on priority and requested memory. Thus a policy of, for example, giving preference to large-memory jobs during nighttime hours can be easily implemented through BPS and NQS.
In addition to the four basic job queues, the Unified Queuing System also recognizes the need for occasional special projects. All systems have a "spec" (special) queue and an "nd" (near-dedicated) queue. The ability of a user to submit work to these queues is enabled and disabled based on SCD policy decisions and special, refereed requirements. Limits for these special queues are relatively unrestrictive, thus allowing a single application access to the full resources of a given system.
Queue structure for the Cray J90se systems (ouray and chipeta)
User Job Memory
Submission NQS Limit
Queue Subqueue (MW) CPU Limit (sec)
spec** spec_ 1000 864000 (10 hours * 24 CPUs)
nd** nd_ 1000 864000 (10 hours * 24 CPUs)
prem pr_ 64 864000 (10 hours * 24 CPUs)
pr_Mx 275 864000 (10 hours * 24 CPUs)
reg rg_ 64 864000 (10 hours * 24 CPUs)
rg_Mx 275 864000 (10 hours * 24 CPUs)
econ ec_ 64 864000 (10 hours * 24 CPUs)
ec_Mx 275 864000 (10 hours * 24 CPUs)
sb sb_ 64 864000 (10 hours * 24 CPUs)
sb_Mx 275 864000 (10 hours * 24 CPUs)
-----------------------------------------------------------------
** Available for special projects
-----------------------------------------------------------------
Due to the major differences in how the DSMs need to be scheduled compared with the PVPs, major algorithms in BPS had to be rewritten. SCD developed a new queue structure that better suits the DSM systems, and a new scheduler called Batch Dedicated Scheduler (BDS) was implemented.
DSM queue structures have been set up to allocate groups of processors to jobs, as shown in the following table, and job scheduling algorithms in BDS have been written to throttle the number of jobs so that the total number of executable threads does not exceed the total number of physical processors on the machine. This latter is referred to as reducing or eliminating processor "oversubscription." Testing and evaluation of existing DSM systems has demonstrated a remarkable degradation in individual application performance when processors are oversubscribed.
Queue structure for the Silicon Graphics Cray Origin2000 (ute)
Processor
User Job count Memory
Submission NQS Limit Limit
Queue Subqueue (#) (MW) CPU Limit (sec)
spec** spec_ 128 none none
share_16 shar_16_ 16 none 346,000 (6 hours * 16 CPUs)
ded_16 ded_16_ 16 none 346,000 (6 hours * 16 CPUs)
ded_32 ded_32_ 32 none 691,200 (6 hours * 32 CPUs)
ded_64 ded_64_ 64 none 1,382,400 (6 hours * 64 CPUs)
res_64** res_64_ 64 none 1,382,400 (6 hours * 64 CPUs)
-----------------------------------------------------------------------
** Available for special projects
-----------------------------------------------------------------------
BDS was implemented to reduce processor oversubscription while attempting to keep the system highly utilized and maximize system throughput. Because of competing requirements, BDS has been enhanced to allow a flexible, priority-based process for selecting which jobs to run. In addition, it has maintained the round-robin-by-proposal algorithm for selecting those jobs from the input queue.
At the end of FY1998, BDS was running on the Silicon Graphics Cray Origin2000 (ute) in production, and within the limited capabilities provided by the host Irix operating system, providing a mechanism of controlling the job load on ute to allow jobs near-dedicated access to system resources and enforcing a 'fair-sharing' of the system among the six projects given allocations.
|
|
|
|
|
|