Significant accomplishment highlights

SCD's two most significant accomplishments in FY2001 were 1) the procurement of ARCS, the Advanced Research Computing System, which doubled the capacity of our largest computer, and 2) the balanced, substantive progress made by all of SCD's sections toward fulfilling the SCD mission.

The NCAR Advanced Research Computing System procurement

The successful ARCS procurement is SCD's top achievement for FY2001. The ARCS system will provide a phased introduction of new computational, storage, and communications technologies through the life of the contract. This will allow NCAR's Scientific Computing Division to maintain a stable, state-of-the-art production facility for the next three to five years.

ARCS - The NCAR Advanced Research Computing System

The initial delivery augments the existing blackforest system by more than doubling its computational capacity, from 0.9 to 2.0 peak TFLOPS, and provides a five-fold increase in disk storage capacity. A second delivery, in September 2002, will introduce IBM's next-generation processor (POWER4), node (Regatta), and switch (Colony) technologies, adding almost 5 peak TFLOPS, upgraded switch communications, and 21 TB of new disk storage. In the fall of 2003, the Colony switch will be replaced with IBM's next-generation Federation switch technology, which provides much lower latency and higher bandwidth than does the Colony switch.

If NCAR chooses to exercise the two-year contract extension option, in the fall of 2004 the system will be upgraded with an additional 4 peak TFLOPS and 32 TB of new disk storage.

IBM and SCD have agreed to work together to improve the user environment and user support services that will be provided to NCAR and CSL. This agreement covers many aspects of the ARCS, including on-site IBM applications specialists, training in advanced programming, performance analysis and tuning techniques, and a more efficient process for reporting, escalating, and resolving compiler and tools problems.

Additionally, the agreement with IBM will provide the opportunity for NCAR to participate in IBM's "Blue Light" HPC project. Blue Light is an exploratory effort of IBM's Exploratory Server Systems department at IBM Research to develop future PetaFLOPs supercomputer systems. NCAR's collaboration with IBM in Blue Light holds the promise of significant and revolutionary advancements in climate, weather, and Earth systems models, and will provide IBM with valuable input on hardware and software design.

Balanced, substantive progress made by all of SCD's sections toward fulfilling the SCD mission

SCD's internal organization is shaped by its mission. SCD's structure makes each of its management units primarily responsible for one line item in SCD's mission. While some line items in the mission are shared, the purpose of each management unit is to focus on and execute one specific part of the mission. This organization places accountability and credit for mission-critical projects on specific people in the division.

Our second highlight for FY2001 is the balanced progress we achieved in each fundamental area of our mission, which is to provide:

  a. High-performance computing and expertise needed for the development and execution of large, long-running numerical simulations
  b. A data archiving and management system that is balanced in performance and capacity relative to computational resources
  c. High-speed network and data communication capabilities that are balanced with respect to computational facilities, storage facilities, and the requirements of a national and international community
  d. Research datasets and expertise needed by atmospheric and related sciences
  e. A computing environment and support services that emphasize user productivity and cost-effectiveness
  f. Education and training in computing and related technologies with an emphasis on under-represented groups
  g. Transfer of appropriate NCAR technology to the private sector in collaboration with the UCAR Foundation

These are the stories of our most significant achievements in each of these areas, organized by the way SCD's management units relate to the line items in our mission:

High performance computing

Primary responsibility for high performance computing is managed by the Supercomputing Systems Group (SSG) to support part (a) of our mission. SSG FY2001 highlights include:

NCAR batch scheduler enhancements
SSG developed and maintains the batch scheduling software that sits on top of the resource management software provided by the vendors. The resource management software controls allocation of resources within each of the supercomputers. The batch scheduling software implements the NCAR business logic for how best to allocate the resources. Many enhancements were made to the scheduler this year, including enhanced error checking with feedback and externalization of the control logic.

When a scientist submits a job to the supercomputers, a number of directives can be specified in the job script. These directives are often confusing or conflicting, which results in the job running improperly or not running at all. The scheduler now provides enhanced filtering for incomplete, incorrect, or conflicting directives. When a submitted job has one of these problems, it is rejected and an email is sent to the scientist explaining the problem and recommending possible solutions.
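
As a rough illustration of this kind of filtering, the following Python sketch checks a job's directives for missing, malformed, or conflicting values and builds the text of a rejection notice. The directive names (`queue`, `wallclock_minutes`, `nodes`, `tasks_per_node`, `total_tasks`) and the rules are hypothetical and chosen only for illustration; they are not the directives or logic of the NCAR scheduler.

```python
def check_directives(directives):
    """Return a list of problems; an empty list means the job is accepted.

    All directive names and rules here are hypothetical examples.
    """
    problems = []

    # Incomplete: a required directive is missing.
    for name in ("queue", "wallclock_minutes", "nodes"):
        if name not in directives:
            problems.append(f"missing directive '{name}': add it to the job script")

    # Incorrect: a numeric directive is present but not a positive integer.
    for name in ("wallclock_minutes", "nodes", "tasks_per_node", "total_tasks"):
        value = directives.get(name)
        if value is not None and (not isinstance(value, int) or value <= 0):
            problems.append(f"'{name}' must be a positive integer, got {value!r}")

    # Conflicting: the task count disagrees with nodes * tasks_per_node.
    keys = ("nodes", "tasks_per_node", "total_tasks")
    if all(isinstance(directives.get(k), int) for k in keys):
        expected = directives["nodes"] * directives["tasks_per_node"]
        if directives["total_tasks"] != expected:
            problems.append(
                f"'total_tasks' is {directives['total_tasks']} but nodes * "
                f"tasks_per_node is {expected}: make the two agree"
            )

    return problems


def rejection_message(user, problems):
    """Build the body of the notice sent to the submitter of a rejected job."""
    lines = [f"Dear {user},", "", "Your job was rejected for the following reasons:"]
    lines += [f"  - {problem}" for problem in problems]
    return "\n".join(lines)
```

A submission wrapper would call `check_directives` first and send `rejection_message` by email only when the returned list is non-empty, so a job that passes every check continues to the queue unchanged.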

The externalization of the control logic involved parameterizing things like maximum wallclock time for the queues, privilege granting for special queues, and special project authorizations. These items have been removed from user code and centralized into a configuration file. The business office now owns and controls this file, removing this system administration burden from users.
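
A minimal sketch of what such an externalized policy might look like is shown below, assuming an INI-style file read with Python's standard `configparser` module. The file layout, queue names, option names, and project numbers are all invented for illustration; the source does not describe the format of the actual configuration file.

```python
import configparser

# Hypothetical policy file: every name and value below is illustrative only.
EXAMPLE_CONFIG = """
[queue:regular]
max_wallclock_minutes = 360
restricted = no

[queue:special]
max_wallclock_minutes = 1440
restricted = yes
authorized_projects = 12345678, 87654321
"""


def load_policy(text):
    """Parse the policy text into a dict keyed by queue name."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    policy = {}
    for section in parser.sections():
        if not section.startswith("queue:"):
            continue
        name = section.split(":", 1)[1]
        projects = parser.get(section, "authorized_projects", fallback="")
        policy[name] = {
            "max_wallclock_minutes": parser.getint(section, "max_wallclock_minutes"),
            "restricted": parser.getboolean(section, "restricted", fallback=False),
            "authorized_projects": {p.strip() for p in projects.split(",") if p.strip()},
        }
    return policy


def is_allowed(policy, queue, project, wallclock_minutes):
    """Apply the externalized limits to one job request."""
    limits = policy.get(queue)
    if limits is None:
        return False  # unknown queue
    if wallclock_minutes > limits["max_wallclock_minutes"]:
        return False  # exceeds the queue's wallclock limit
    if limits["restricted"] and project not in limits["authorized_projects"]:
        return False  # special queue, project not authorized
    return True
```

With the limits held in data rather than code, changing a wallclock limit or authorizing a project for a special queue becomes an edit to the file instead of a change to the scheduler or to job scripts.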

New end-user tools
This year, SSG developed a series of tools that enable scientists, consultants, and systems staff to easily obtain job status information from the supercomputers. These tools provide a global view of the job status from each of the supercomputers. They include things like the status of running jobs, queued jobs, completed jobs, load average of the system, outstanding requests for data from the Mass Storage System, and information about the inner logic of the scheduling algorithm. The output from these tools is in the form of a summary report, and the job status information is also being posted (every five minutes) to the SCD website. A follow-on activity that is currently underway is the conversion of this information into a graphical, point-and-click format that will enable a more timely delivery of this information from within a web browser.

Computational science research and development

Primary responsibility for computational science and computational research and development is managed by the Computational Science Section (CSS) to support part (a) of our mission. CSS FY2001 highlights include:

Terascale spectral element dynamical core for atmospheric General Circulation Models
Climate modeling is a grand challenge problem where scientific progress is measured not in terms of the largest problem that can be solved but by the highest achievable integration rate. Loft, Thomas, and Dennis have developed a scalable spectral element atmospheric model that achieves a high percentage of peak on microprocessors. A semi-implicit time-stepping scheme accelerates the integration rate relative to an explicit model by a factor of two.

SI SEAM performance comparison

The MPI implementation outperforms hybrid MPI/OpenMP on the IBM SP. Simulation rates have been measured for the standard shallow water equation benchmarks using up to O(10^5) horizontal degrees of freedom. A sustained 370 GFLOPS was achieved on the IBM system at NERSC. This work is a finalist for the Gordon Bell Award, which is announced each year at the ACM Supercomputing Conference.

Earth System Modeling Framework
Over the last few years, the need for software infrastructure for Earth system modeling has grown increasingly apparent. Models and the computational platforms that they run on are becoming extremely complex, leading to excessive time and resources dedicated to solving computational rather than scientific problems. In September 2000, the NASA High Performance Computing and Communications (HPCC) Earth and Space Science (ESS) Project released a Cooperative Agreement Notice (CAN) entitled "Increasing Interoperability and Performance of Grand Challenge Applications in the Earth, Space, Life, and Microgravity Sciences." The NASA CAN calls for the development of an "Earth System Modeling Framework (ESMF)." In response to this NASA announcement, a collaboration led by Cecelia DeLuca submitted a coordinated set of three proposals to develop an Earth System Modeling Framework. The ESMF will allow diverse scientific groups to leverage common software to solve routine computational problems, and it will provide an interface specification so groups working at different institutions and in different disciplines can generate interoperable software components. NASA has selected the proposals from this collaboration for funding, and there will be a three-year effort initiated in FY2002.

Data archiving and management system

Primary responsibility for NCAR's Mass Storage System is managed by the Mass Storage System Group (MSSG) to support part (b) of our mission. MSSG FY2001 highlights include:

SD-3 (Redwood) tape migration
Between June and September 2001, the Mass Storage Systems Group (MSSG) undertook and completed the migration of approximately 75 TB of data from SD-3 (Redwood) media to newer media types. Although it held great promise when introduced in the mid-1990s, Redwood is now considered an "end-of-life" technology, and the vendor (StorageTek) has consequently imposed significant increases in maintenance costs for Redwood tape drives. The migration of this 75 TB of MSS data allowed the MSS group to decommission 10 of its Redwood drives. (The remaining two drives are used only to read secondary copies of MSS files on those rare occasions when the primary copy, stored on non-Redwood media, cannot be read.) The decommissioning of these 10 Redwood drives represents a potential savings of approximately $16,000 per month in maintenance costs. In addition, as part of this project, key users were asked to remove as many Redwood-resident MSS files as they could; this resulted in the removal of an estimated 9 TB of unneeded data from the MSS.

9940 media deployment
As part of its ongoing commitment to make the best use of new tape technology in the Mass Storage System, MSSG also deployed 9940 media in FY2001. 9940 is a high-capacity media that uses the same recording technology as 9840 media, which has been in production in the MSS since 1999 and has proven itself to be extremely reliable. The new media has a capacity of 60 GB per cartridge. Higher capacities using the same 60-GB cartridges are planned for the next 12-18 months. In the first 16 weeks of production, over 150 TB of data were stored on 9940 media.

University of Illinois NCDM collaboration
Together with Lawrence Buja of CGD, MSSG set up and hosted a "Data Space" server cluster for the National Center for Data Mining (NCDM). NCDM is part of the Laboratory for Advanced Computing at the University of Illinois at Chicago (UIC). The server cluster consists of three Linux systems providing access to climate model data produced by NCAR via the NCDM's Data Space transfer protocol. The system was showcased at SC2000, demonstrating real-time data access from the Dallas show floor to the server housed in the SCD computer room.

Network engineering and telecommunications

Primary responsibility for developing and maintaining UCAR's networking infrastructure is managed by the Network Engineering and Telecommunications Section (NETS) to support part (c) of our mission. NETS FY2001 highlights include:

Front Range GigaPOP (FRGP)
The Front Range GigaPOP (FRGP) is a consortium of universities, nonprofit corporations, and government agencies that cooperate in a regional network aggregation point to share the costs of Wide Area Networking (WAN) services. The current FRGP partners are the Boulder Point of Presence (BPOP), Colorado State University (CSU), CU-Boulder, CU-Denver, CU-HSC, CU-CS, CSM, DU, the University of Wyoming, and Fort Lewis College. Additional partners, including the State of Colorado and the University of Northern Colorado, are likely to join soon. Similar gigapops operate throughout the U.S. Sharing services through a gigapop has several advantages: costs for WAN services are reduced for each partner, expertise can be shared among partners, a higher level of service can be purchased than individual institutions could afford, a consortium has more buying power, and there are great economies of scale.

Front Range Gigapop members

NCAR/UCAR has provided the engineering and NOC support for the FRGP, with the service costs incurred by NCAR/UCAR being shared by all members. NETS believes that the greater service and bandwidth obtained through the FRGP are important enough for NCAR/UCAR to participate and provide the engineering and NOC services. FRGP has agreed that NETS has the most qualified engineering and NOC staff to provide the very best engineering and NOC services for the FRGP.

This is a critical service for the UCAR/NCAR staff as well as all the other partners, and it has proved to be an extremely successful technical project and an excellent collaboration with the Colorado research community. The FRGP provides NCAR/UCAR's primary WAN connectivity including Abilene connectivity. For more information, see http://www.frgp.net/

Web100
The Web100 project is an initiative proposed by NCAR, PSC, and NCSA to fix some well-known problems with the Unix (and other) operating systems that are currently inhibiting effective utilization of national high-performance networks such as vBNS and Abilene. One of the biggest problems is the current need to manually calculate the bandwidth-delay product in order to specify a TCP window size that is large enough to avoid prematurely halting data transmission between TCP acknowledgment packets.

This issue generally isn't important for LANs, but it is important for high-performance WANs. It is difficult to determine the "bandwidth" part of the product, and right now the only effective way to obtain this is to have knowledge of the network topology, which usually means consulting with a network engineer. Furthermore, most applications don't provide a means for the user to specify this information even if it were available. The Web100 Project is seeking to solve this problem and some other related ones and has received funding from the NSF for a three-year research proposal.
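
The manual calculation in question is the standard bandwidth-delay product: the path bandwidth multiplied by the round-trip time gives the amount of data that must be in flight to keep the path full, and therefore the minimum TCP window. The Python sketch below works through that arithmetic. The function names and the sample figures are illustrative assumptions, not measurements from vBNS, Abilene, or the Web100 project.

```python
def bandwidth_delay_product_bytes(bandwidth_bits_per_s, round_trip_time_s):
    """Minimum TCP window, in bytes, needed to keep a path fully utilized."""
    return bandwidth_bits_per_s * round_trip_time_s / 8


def max_throughput_bits_per_s(window_bytes, round_trip_time_s):
    """Upper bound on throughput when at most one window is sent per round trip."""
    return window_bytes * 8 / round_trip_time_s


# Hypothetical path: 100 Mb/s bottleneck bandwidth, 50 ms round-trip time.
needed_window = bandwidth_delay_product_bytes(100e6, 0.050)

# Throughput ceiling on the same path with a 64 KB (65,536-byte) window.
ceiling = max_throughput_bits_per_s(65536, 0.050)
```

For this hypothetical path the required window is about 625,000 bytes, while a 64 KB window limits throughput to roughly 10.5 Mb/s regardless of how much bandwidth is available. On a LAN the round-trip time is small enough that a modest window already exceeds the product, which is why the issue mainly affects WANs.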

For more information on Web100, see http://www.web100.org/

Research data stewardship

Primary responsibility for collecting, correcting, and distributing valuable research datasets is managed by the Data Support Section (DSS) to support part (d) of our mission. DSS FY2001 highlights include:

Upgraded data and data presentation on the Research Data web server
We implemented a large-scale upgrade and improvement to the information interface for the SCD Research Data Archive website.

The new interface covers all aspects of the Data Support Section web presence. It is not only superior for the data users, but it is also easy and efficient for DSS staff to use and maintain. This information system has more than 2,500 html-formatted pages that are updated automatically. The data available continue to grow as data and metadata are routinely added by DSS staff. The new DSS web interface significantly improves researchers' ability to access and use NCAR's Research Data Archive.

Services page from new DSS website

New additions to COADS and other ocean datasets
In 2001, a major milestone for COADS was achieved. New early data sources had been recovered through data archaeology efforts around the world. These sources and other new digital sources have been added to the collection for the period prior to 1950. This update and updates from previous years now form the complete replacement and extension for Release 1 COADS (1985). The new archive now covers 1784-1997. The COADS project, a collaborative effort between NOAA/CDC, NOAA/NCDC, and NCAR/SCD, is the world's dataset for describing conditions at the surface of the ocean (air temperature, wind direction and velocity, water temperature, etc.). COADS is a critical resource for studies of climate trends and global weather interactions, as well as for data reanalysis projects.

Visualization and enabling technologies

Primary responsibility for helping researchers visualize, interact with, and understand complex geophysical data is managed by the Visualization and Enabling Technologies Section (VETS) to support part (e) of our mission. VETS FY2001 highlights include:

NCAR's new Visualization Lab
Terascale visualization, collaboration, and the AccessGrid
Internet and web technologies coupled with high-bandwidth networks have served as the substrate for wonderful new opportunities in scientific endeavor and collaboration. While the desktop is still the day-to-day environment of choice for the individual, group meetings are more important than ever. From research organizations to businesses to universities, there is an enhanced focus on sophisticated, technology-mediated meeting spaces that facilitate information flow and enable virtual encounters.

SCD has recently completed the development of its new Visualization Lab, a physical facility that blends visual supercomputing, virtual reality, large-screen tiled display, and advanced collaboration technology. Backed by an array of large-scale computational and storage resources, the lab facilitates group exploration of terascale scientific data.

New Visualization Lab

Building upon the AccessGrid, a human-scale group-to-group collaboration environment, it also opens up opportunities for group participation in presentations, symposia, and workshops as well as collaborative research.

The community data portal
Sustainable strategies for enabling both providers and consumers of earth system data
Scientific data are at the heart of most of our research activities, and we need to share these data among ourselves and with a geographically distributed community.

Working with divisions and programs across UCAR and NCAR, SCD has initiated a forward-looking pilot project called the Community Data Portal (CDP). The CDP is targeted directly at elevating our organization's collective ability to function as a data provider with a coherent web-based presence.

CDP user interface

During FY2001, several pilot sub-projects were undertaken, including the ACACIA ARCAS system (ACACIA), Reanalysis-2 data (SCD/DSS), CCM diagnostic tools (CGD), TIME-GCM data (HAO), vegetation/ecosystem data (VEMAP), and distributed climate data analysis (COLA). Our efforts here have been extremely well received and have now grown into the role of a formal NCAR Strategic Initiative.

The Earth System Grid
In 1999, SCD joined with several DOE labs in a DOE-sponsored research project called The Earth System Grid (ESG). This effort was aimed at developing Grid-based technologies that facilitated management and high-speed access to large-scale distributed climate model data. During FY2001, we put some of the ESG technology into production operation for the PCM project, and it now supports a sustained transfer of data from NCAR to NERSC at data rates much higher than previously possible. We joined again this year with several collaborators to submit a new proposal for The Earth System Grid II. Working with Argonne National Laboratory, Lawrence Livermore National Laboratory, the University of Southern California, Oak Ridge National Laboratory, and Lawrence Berkeley National Laboratory, we successfully secured a new research contract to develop and deploy an operational ESG in support of terascale/petascale climate research. The project is a significant opportunity to advance research and computation, and it has already drawn substantial interest that extends into the international community.

Community data analysis and visualization software
Prior to this year, SCD had distributed NCAR Graphics and NCL on a cost-recovery basis. In FY2000, SCD management made the decision to move to an Open Source distribution model for NCAR Graphics and a "free availability" model for NCL (with Open Source planned for the future), and we began the process of implementing the decision. We implemented the new distribution models this year, and since October 2000, there have been roughly 9,000 downloads of NCAR Graphics and 1,000 downloads of NCL. While it represented a significant loss of revenue for SCD, this move was extremely popular with our community and has brought many new users on board. We also continued to work with the new Weather Research and Forecasting (WRF) model team to extend NCL's usefulness for the WRF community. New features included enhanced support for the increasingly popular HDF-5 format and completion of a significant portion of the development cycle for a new high-resolution map database for NCL. In a similar vein, we began the process of integrating our enhanced version of the popular Vis5D software into a new Open Source framework for community sharing, development, and usage. Complementing all of this was a substantial amount of experimental work with the scripting language Python and the development of new visualization software for educational uses under the auspices of our Visual Geophysical Exploration Environment (VGEE).

A new architecture for terascale data access, analysis, and visualization
Late in the year, SCD management convened a team tasked with re-examining the resources we provide to deal with data. This included web-based data access, future efforts in data portals, research projects in distributed data, data visualization, and post-computation processing and analysis.

The strategic planning process that ensued led to the definition of a new architecture that integrates a number of functions and systems, and moves various testbed efforts (SANs, MSS Proxy) into a new production phase. The new architecture is built on the concept of large shared data objects (1 TB or more) and speaks directly to user productivity, efficient use of computational and storage resources, and the support of new efforts in data analysis and visualization.

New visualization architecture

This new effort positions SCD as a leader in providing a powerful, balanced, and--most of all--productive environment. Acquisition of hardware and software began this year, while integration and deployment will happen in FY2002.

Assistance and support for NCAR's research community

Primary responsibility for helping researchers efficiently produce valid simulations on NCAR supercomputers is managed by the User Support Section (USS) to support part (e) of our mission. USS also supports researchers and SCD staff by providing supercomputer usage statistics and web publication services, and by supporting onsite servers, workstations, and application software at the NCAR Mesa Lab. USS FY2001 highlights include:

Extensive model code conversions facilitated by the SCD Consultants
Many researchers using NCAR supercomputers run models that were designed for parallel vector processing systems such as the Cray Research X-MP, Y-MP, C90, and J90. Because the last two Cray systems at NCAR are being decommissioned in FY2002, the code that simulates atmospheric and related physics and chemistry on these vector systems has to be converted to run on one of the newer symmetric multiprocessor architectures.

The most promising of these platforms is the IBM SP system. In the past five years, the SCD Consulting Office guided the programmers and scientists on more than 19 major community models in converting from the old architecture to the new. This radical difference in architectures requires a huge investment in programming time and effort, and with a staff of five software engineers, the Technical Consulting Group has successfully handled a shift from straightforward usage questions to more complex development and design questions in the past year.

In addition, the consultants developed extensive user documentation and organized numerous training classes and workshops to help researchers work productively on the IBM SP systems blackforest and babyblue. As of September 2001, 65% of researchers using SCD computers were performing more than 90% of their computational work on the IBM SP (blackforest) or the SGI Origin 2000 (ute) rather than the Cray computers.

Enabling infrastructure

Primary responsibility for maintaining and operating NCAR's supercomputing environment is managed by the Operations and Infrastructure Support Section (OIS) to support part (e) of our mission. OIS FY2001 highlights include:

Infrastructure upgrades to support ARCS
In November 2000, OIS began to identify and upgrade those portions of the computing center infrastructure that would need to be augmented to support the new ARCS equipment. This specification of the needed infrastructure equipment was a moving target, as the possible machine configurations were in constant flux. After significant analysis, SCD identified the upper boundary of possible equipment to be delivered. OIS then determined that the existing air conditioning was sufficient, but the power distribution was not. OIS then began specifying, procuring, and installing all the necessary equipment so the facility could be ready for the ARCS delivery.

After nine months of constant work and some overtime, the power distribution system passed the startup inspection and went online the first of October, just in time for the delivery of the first wave of ARCS equipment. The project was estimated to cost slightly more than $600,000, and it came in slightly over $500,000.

Power distribution equipment

These upgrades position the electrical distribution system to support the next three years of the ARCS contract, and they form the cornerstone of a solid infrastructure for SCD to continue providing reliable, production-oriented services and equipment as tools for science.

SCD portal genesis
The Applications Group within OIS has started on an ambitious multi-year project to architect, implement, and deploy an application portal within SCD. This portal will provide web-based access to SCD's suite of resources and services. In addition, the portal is designed to be extensible and customizable so researchers can better manage the flow of information needed for their area of research. The project, while in its early stages, accomplished a great deal in the past year. The portal has moved from strictly a vision to early proof-of-concept systems. Several decisions were made, including identifying key technologies and the development approach. Currently, a preliminary job-submission utility is working, as well as an interface to some portions of the Mass Storage System. In the coming year, these early web-enabled services will be provided to researchers to solicit feedback and evolve the portal.

