Last update: 05/30/2007
This section provides information about bluevista queues and charging. Charges depend on the queue in which your job runs. To select a queue, it is important to consider the charging formula.
To make informed decisions about queues and charging, you need a basic understanding of bluevista's system architecture. It is worthwhile to review the bluevista main page if you are not already familiar with the system.
Charges for jobs run on bluevista are assessed in General Accounting Units (GAUs).
Charging formula
This formula specifies how your computing account is charged for running jobs on bluevista:
GAUs charged = wallclock hours used * number of nodes used * number of processors in that node * computer factor * queue charging factor
The "computer factor" is a multiplier that equalizes the way GAUs are consumed on different computing platforms. Faster computers have higher computer factors. The computer factor for bluevista is 0.87.
The "queue charging factor" is a multiplier that reflects the priority given to jobs in a queue: higher-priority jobs are charged more.
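As a worked illustration of the dedicated-node formula above, here is a minimal Python sketch. The function and parameter names are hypothetical and chosen only for this example. The computer factor (0.87) comes from this page, and the value of 8 processors per node is inferred from the statement that 70 nodes equal 560 processors.

```python
# Values taken from this page; all names here are illustrative only.
BLUEVISTA_COMPUTER_FACTOR = 0.87
PROCESSORS_PER_NODE = 8  # inferred: 70 nodes = 560 processors


def gaus_charged(wallclock_hours, nodes, queue_factor,
                 processors_per_node=PROCESSORS_PER_NODE,
                 computer_factor=BLUEVISTA_COMPUTER_FACTOR):
    """Dedicated-node charge: hours * nodes * processors per node * factors."""
    return (wallclock_hours * nodes * processors_per_node
            * computer_factor * queue_factor)


# A 2-hour job on 4 nodes with a queue charging factor of 1.0:
# 2 * 4 * 8 * 0.87 * 1.0 = 55.68 GAUs
print(round(gaus_charged(2, 4, 1.0), 2))
```

Because whole nodes are reserved, the charge scales with all processors in each node, whether or not the job uses every one of them.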
Note: Users running jobs in the "share" queue are not required to reserve all the processors in a node, so the charging formula for that queue is:
GAUs charged = CPU hours used * computer factor * class charging factor
Note: The charging formula described in the CISL Portal gives different names to these variables, and it does not make a distinction between dedicated-node charging and shared-node charging. The following table helps prevent confusion caused by the terminology used in the CISL Portal:
| Charging formula | CISL Portal terminology |
| --- | --- |
| GAUs charged | GAU calculation |
| wallclock hours used | wallclock hours |
| number of nodes used * number of processors in that node | CPUs reserved |
| computer factor | system multiplier |
| queue charging factor | queue multiplier |

Note: CPUs (processors) must be reserved in even multiples of the processors in a node unless your job runs in the share queue (see share queue formula above).
When you understand the different terminology used for the portal, you can see that both charging formulas are equivalent.
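The share-queue formula and the portal terminology can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and it assumes that "CPU hours used" is supplied directly as a single number and that "CPUs reserved" equals nodes times processors per node for dedicated-node jobs.

```python
COMPUTER_FACTOR = 0.87  # bluevista computer factor, from this page


def share_queue_gaus(cpu_hours, queue_factor, computer_factor=COMPUTER_FACTOR):
    """Share-queue charge: CPU hours * computer factor * charging factor."""
    return cpu_hours * computer_factor * queue_factor


def portal_gaus(wallclock_hours, cpus_reserved, system_multiplier,
                queue_multiplier):
    """The same charge, written with the CISL Portal variable names."""
    return (wallclock_hours * cpus_reserved
            * system_multiplier * queue_multiplier)


# Share queue: 3 CPU hours at a charging factor of 1.0.
print(round(share_queue_gaus(3, 1.0), 2))

# Dedicated nodes in portal terms: 2 hours, 4 nodes * 8 processors = 32 CPUs.
print(round(portal_gaus(2, 4 * 8, COMPUTER_FACTOR, 1.0), 2))
```

The second call gives the same result as multiplying hours, nodes, processors per node, computer factor, and queue charging factor separately, which is the equivalence the table describes.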
Exceeding allocation threshold limits*
Jobs from NCAR divisions or CSL proposal groups that have exceeded either the 30-day or 90-day usage limits* will be placed in the hold queue and run at a priority below jobs in the economy queue. Affected jobs will be charged at 1/3 the rate they would have been charged if they had been run in a regular queue ("rg").
Jobs from NCAR divisions or CSL proposal groups that have exceeded both the 30-day and 90-day usage limits* will be rejected, and users will receive an email suggesting that they submit their jobs to a standby queue. Note that standby queue time limits are three hours, so users may need to change their job's time limit before resubmitting to a standby queue.
* The bluevista user guide provides details about how jobs are scheduled for execution when an allocation threshold limit has been exceeded. See the section Allocation thresholds for projects influence job scheduling.
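The threshold rules above can be summarized as a small decision function. This Python sketch only restates the text; it is not the scheduler's actual logic. The function names and return labels are hypothetical, and it reads "either" as exactly one limit exceeded, since exceeding both is described separately.

```python
def job_disposition(exceeded_30_day, exceeded_90_day):
    """Classify a job by which usage limits its division or group exceeded."""
    if exceeded_30_day and exceeded_90_day:
        return "rejected"  # user is advised to resubmit to a standby queue
    if exceeded_30_day or exceeded_90_day:
        return "hold"      # runs below economy priority, charged at 1/3 rate
    return "normal"


def hold_queue_charge(regular_queue_charge):
    """Hold-queue jobs are charged 1/3 of the regular-queue ("rg") charge."""
    return regular_queue_charge / 3


print(job_disposition(True, False))
print(job_disposition(True, True))
print(round(hold_queue_charge(55.68), 2))
```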
The queue names and uses are:
- The special queue is used for large, long-running jobs that must be approved by the CISL Director. To request use of this queue, send email to cal@ucar.edu
- The premium queue is for jobs that need to run before jobs in the lower queues. Premium jobs are limited to 70 nodes on bluevista, i.e. 560 processors. The fewer nodes a job requests and the shorter its wallclock limit, the faster its turnaround. CISL Customer Support can provide guidance on how to decrease the number of nodes your job uses. They may be reached at 303-497-1278 or via a work request.
- The regular queue is for medium-priority jobs. The same node limit (70) as premium applies, along with the same caveat.
- The economy queue is for low-priority jobs. The node limit is 70, along with the same caveat as for premium jobs.
- The standby queue is for jobs that can be run only when idle nodes are available. This is a very low-priority queue that is designed to make otherwise unused cycles available to users. Therefore, turnaround time in the standby queue may be extremely long, especially for large-node jobs (maximum 70) with long wallclock times. Caution: If use of the standby queue increases to the point where it interferes with normal work, the standby jobs will be suspended and will remain in a "hold" state until unused cycles become available.
- The share queue is for jobs that make light demands on CPU and on other resources such as memory. This queue runs on two nodes only, the same ones as the debug queue (see below). It has a three-hour time limit, with a maximum of 16 processors (2 nodes). If you know your jobs will keep the share queue fairly busy, we recommend that you run them outside the peak-demand hours of 08:00-17:00 MT weekdays. At all times, the number of jobs submitted to the share queue by the same user should be limited to 25. Examples of jobs appropriate to run in the share queue include:
  - Short interactive or debugging jobs;
  - Pre-staging data to (from) the MSS before (after) a simulation run;
  - Post-staging data from the MSS after a simulation run;
  - General script or application development or checkout;
  - Jobs requiring only one processor; and
  - General simulation runs that do not require large amounts of memory, I/O, or CPU time.
- The debug queue is provided for interactive debugging (for example TotalView). It runs on two nodes only, the same as the share queue. It has a 30-minute time limit, with a maximum of 16 processors (2 nodes).
- The hold queue is an automated queue for jobs from divisions or groups that have exceeded their allocation threshold limit. Jobs in the hold queue have priority only over jobs in the standby queue.
The queue structure for bluevista is:
| Queue | CPUs | Maximum wallclock hours | Memory | Queue charging factor | Availability | Node limit |
| --- | --- | --- | --- | --- | --- | --- |
| special | 560 | Unlimited | 16 GB | 1.0 | By special permission | 70 |
| premium | 560 | 6 hrs. | 16 GB | 1.5 | Any time | 70 |
| regular | 560 | 6 hrs. | 16 GB | 1.0 | Any time | 70 |
| economy | 560 | 6 hrs. | 16 GB | 0.5 | Any time | 70 |
| standby | 560 | 2 hrs. | 16 GB | 0.1 | Any time | 70 |
| share* | 16 | 3 hrs. | 16 GB | 1.0 | Any time | 2 |
| debug* | 16 | 0.5 hrs. | 16 GB | 1.0 | Any time | 2 |
| hold** | 560 | 6 hrs. | 16 GB | 0.33 | Automated | 70 |

* The share and debug queues are on two nodes distinct from the login nodes.
** Jobs in the hold queue are automatically moved to lower priority when allocation threshold limits are exceeded.
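To see how the queue charging factors in the table affect cost, the following Python sketch prices the same dedicated-node job in several queues. The factors are copied from the table. The dictionary and function names are hypothetical, and 8 processors per node is inferred from 70 nodes equaling 560 processors.

```python
# Queue charging factors copied from the table above.
QUEUE_CHARGING_FACTORS = {
    "special": 1.0, "premium": 1.5, "regular": 1.0, "economy": 0.5,
    "standby": 0.1, "share": 1.0, "debug": 1.0, "hold": 0.33,
}
COMPUTER_FACTOR = 0.87      # bluevista computer factor, from this page
PROCESSORS_PER_NODE = 8     # inferred: 70 nodes = 560 processors


def cost_by_queue(wallclock_hours, nodes,
                  queues=("premium", "regular", "economy", "standby")):
    """Return the GAU charge for one dedicated-node job in each named queue."""
    base = wallclock_hours * nodes * PROCESSORS_PER_NODE * COMPUTER_FACTOR
    return {q: round(base * QUEUE_CHARGING_FACTORS[q], 2) for q in queues}


# Compare the cost of a 2-hour, 4-node job across queues.
for queue, gaus in cost_by_queue(2, 4).items():
    print(f"{queue:8s} {gaus:8.2f} GAUs")
```

A comparison like this makes the trade-off explicit: lower-priority queues cost fewer GAUs for the same work but may take longer to start.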
Jobs with large node counts relative to the queue node limit, or with wallclock specification close to the queue wallclock limit, may remain pending in the queue a long time (days) before starting.
Jobs that need more than 512 processors cannot run until after 6:00 pm.
More batch processors are usually available after hours and on weekends, i.e. outside of peak-demand hours.
You can get faster job turnaround by specifying fewer nodes and shorter wallclock limits.
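Before submitting, a job request can be compared against the queue limits. The Python sketch below is a hypothetical pre-submission check, not a bluevista tool. The limits are copied from the queue structure table (including the standby value of 2 hours shown there), and 8 processors per node is inferred from 70 nodes equaling 560 processors.

```python
# queue: (node limit, maximum wallclock hours); None means unlimited.
# Values copied from the queue structure table on this page.
QUEUE_LIMITS = {
    "special": (70, None), "premium": (70, 6), "regular": (70, 6),
    "economy": (70, 6), "standby": (70, 2), "share": (2, 3),
    "debug": (2, 0.5), "hold": (70, 6),
}
PROCESSORS_PER_NODE = 8  # inferred: 70 nodes = 560 processors


def check_request(queue, nodes, wallclock_hours):
    """Return a list of notes about a request; an empty list means no issues."""
    node_limit, max_hours = QUEUE_LIMITS[queue]
    notes = []
    if nodes > node_limit:
        notes.append(f"{nodes} nodes exceeds the {queue} limit of {node_limit}")
    if max_hours is not None and wallclock_hours > max_hours:
        notes.append(f"{wallclock_hours} hours exceeds the {queue} limit "
                     f"of {max_hours}")
    if nodes * PROCESSORS_PER_NODE > 512:
        notes.append("more than 512 processors: cannot run until after 6:00 pm")
    return notes


print(check_request("regular", 4, 2))
print(check_request("debug", 3, 1))
```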
To obtain queue information, type:
bqueues -l
while logged on to bluevista.
To check the status of all users' jobs on bluevista, type:
bjobs -u all
while logged on to bluevista, or to see a summary of all running jobs, type:
lsfq
To see your GPFS disk quota, type:
/usr/local/bin/spquota or
/usr/local/bin/spinfo
while logged on to bluevista. Note that disk space is oversubscribed to maximize the amount that is used. Because of this, not all users can use all of their disk space simultaneously.
If you have questions about this document, please contact CISL Customer Support. You can also reach us by telephone 24 hours a day, seven days a week at 303-497-1278. Additional contact methods: consult1@ucar.edu and during business hours in NCAR Mesa Lab Suite 39.
© Copyright 2006-2007. University Corporation for Atmospheric Research (UCAR). All Rights Reserved.
Address of this page: http://www.scd.ucar.edu/computers/bluevista/queue.charge.html