Silicon Graphics Cray Origin2000 upgrade to the Climate Simulation
Laboratory
In December 1997, after a lengthy evaluation of domestic supercomputer
technology by SCD and select representatives from other NCAR divisions, it
was decided to invest the FY1998 Climate Simulation Laboratory (CSL)
computational budget in a 128-processor Silicon Graphics Cray
Origin2000. The system was
delivered to SCD on May 18, 1998, and over the following weeks it was
installed, configured, and prepared for acceptance. The acceptance test period was
completed on June 24, 1998, and the system, named ute, was made available to
six CSL projects at that time.
The six CSL projects given allocations on the Silicon Graphics Cray
Origin2000 (ute) and their
relative allocations are shown in the following diagram. The bulk of the
allocation (86%) was given to two climate-modeling efforts within the
Climate and Global Dynamics Division of NCAR.
The Silicon Graphics Cray Origin2000 (ute) added in late June 1998 to the
CSL has the following specifications:
- 128 superscalar CMOS 250 MHz MIPS RISC R10000 microprocessors. Each
processor has a theoretical peak performance of 500 million floating-point
operations per second (MFLOPS), so the system has a theoretical peak
performance of 64 billion floating-point operations per second
(GFLOPS). SCD expects applications representative of the CSL workload to
achieve 5-10% of this peak performance.
- 16 GB of physically distributed global shared memory
- 450 GB of high-speed, fibre-attached local disk storage
- Four HiPPI-800 channels; three are connected to the NCAR Mass Storage
System, one is available for a high-speed TCP/IP networking connection
- Two FDDI network connections
- Sixteen 100BaseT Ethernet connections
- Irix 6.5 operating system and product set
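The peak-performance figures in the first specification above follow from simple arithmetic. The short Python sketch below reproduces that calculation and the sustained range implied by SCD's 5-10% expectation. The function and constant names are illustrative only and are not part of any SCD tool; the only inputs are the figures quoted in the specification.

```python
# Figures taken from the specification list above.
PROCESSORS = 128
PEAK_MFLOPS_PER_PROCESSOR = 500


def system_peak_gflops(processors, mflops_per_processor):
    """Aggregate theoretical peak in GFLOPS (1 GFLOPS = 1000 MFLOPS)."""
    return processors * mflops_per_processor / 1000.0


def sustained_range_gflops(peak_gflops, low_fraction=0.05, high_fraction=0.10):
    """Sustained range implied by a low/high fraction of theoretical peak."""
    return peak_gflops * low_fraction, peak_gflops * high_fraction


peak = system_peak_gflops(PROCESSORS, PEAK_MFLOPS_PER_PROCESSOR)
low, high = sustained_range_gflops(peak)

print(f"Theoretical peak: {peak:.1f} GFLOPS")
print(f"Expected sustained range: {low:.1f} to {high:.1f} GFLOPS")
```

Under these figures, the 5-10% expectation corresponds to roughly 3.2 to 6.4 GFLOPS of sustained performance across the full system.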
SCD worked hand-in-hand with Silicon Graphics Cray during the
assembly, installation, configuration, and testing of the
Origin2000. SCD had also tested various subsystems
prior to the system's shipment to NCAR. Many aspects of the system
software and its configuration for operation in the CSL environment at NCAR
were evaluated and addressed, including:
- Raw HiPPI driver functionality and the connection with the NCAR Mass
Storage System were established.
- Network Queuing Environment (NQE) was installed and configured on the
Origin2000 running the Irix operating system.
- Extensive evaluation of Irix's ability to checkpoint and restart
shared-memory and MPI-based applications was done.
- Extensive work was done to implement Irix accounting and charging.
- Extensive tuning of filesystem performance, kernel scheduling, and
process scheduling was done.
- It was determined that Irix support for resource management and
enforcement of user limits is lacking.
- A subset of the NCAR Batch Priority Scheduler (BPS) was developed for
the distributed shared memory (DSM) systems: the Batch Dedicated Scheduler
(BDS).
- The scrubber, User
Master File (UMF), and system administration and monitoring tools were
ported to Irix and the Origin2000.
- Operational procedures (shutdown, restart, process signaling, etc.)
were established.
At the end of FY1998, the Silicon Graphics Cray Origin2000 (ute)
is being used extensively.
System utilization climbed from approximately 50% in July to more than 80%
by the end of the fiscal year. This level of utilization is exceptional
and unexpected for a DSM architecture. (Other centers, such as NCSA, have
observed, and expressed concern, that such architectures cannot be expected
to exceed about two-thirds utilization.) Although utilization is good, the
Origin2000 and Irix environments lack many features and capabilities that
have been available for years in the Cray/UNICOS environments.
Because of this, SCD expended significant resources during FY1998, and will
continue to do so in the coming years, working with DSM vendors and their
software development organizations to establish more stability, versatility,
and administrative and resource controls in their software product offerings.
With these efforts, DSM systems will be more readily integrated into the
service-oriented, production supercomputing environment provided by SCD to
the NCAR user community and the CSL.