Significant accomplishment highlights
SCD's two most significant accomplishments in FY2001 were 1) the
procurement of the Advanced Research Computing System (ARCS), which more than
doubled the capacity of our largest computer, and 2) the balanced, substantive
progress made by all of SCD's sections toward fulfilling the SCD mission.
The successful ARCS procurement is SCD's top achievement for FY2001.
The ARCS system will provide a phased introduction of new computational,
storage, and communications technologies through the life of the contract.
This will allow NCAR's Scientific Computing Division to maintain a stable,
state-of-the-art production facility for the next three to five years.

The initial delivery augments the existing blackforest system by more
than doubling its computational capacity, from 0.9 to 2.0 peak TFLOPS,
and provides a five-fold increase in disk storage capacity. A second
delivery, in September 2002, will introduce IBM's next-generation
processor (POWER4), node (Regatta), and switch (Colony) technologies,
adding almost 5 peak TFLOPS, upgraded switch communications, and 21 TB of new
disk storage. In the fall of 2003, the Colony switch will be replaced
with IBM's next-generation Federation switch technology, which provides
much lower latency and higher bandwidth than does the Colony switch.
If NCAR chooses to exercise the two-year contract extension option,
in the fall of 2004 the system will be upgraded with an additional
4 peak TFLOPS and 32 TB of new disk storage.
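The delivery schedule above can be tallied with a short sketch. It assumes that each delivery adds to the hardware already installed and that nothing is retired along the way, which the text does not state; "almost 5" is rounded to 5.0, and the phase labels are illustrative.

```python
# Peak-capacity figures quoted in the text. Assumption: every delivery is
# additive and no earlier hardware is retired.
PHASES = [
    ("blackforest before ARCS", 0.9),
    ("initial ARCS delivery", 2.0 - 0.9),
    ("September 2002 delivery", 5.0),         # "almost 5" in the text
    ("fall 2004 extension (optional)", 4.0),
]


def cumulative_peak(phases):
    """Return (label, running total in peak TFLOPS) for each phase."""
    total = 0.0
    result = []
    for label, added in phases:
        total += added
        result.append((label, round(total, 1)))
    return result


for label, total in cumulative_peak(PHASES):
    print(f"{label:32s} {total:5.1f} peak TFLOPS")
```

Under these assumptions the system would approach 11 peak TFLOPS if the contract extension is exercised.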
IBM and SCD have agreed to work together to improve the user
environment and user support services that will be provided to NCAR and
CSL. This agreement covers many aspects of the ARCS, including on-site
IBM applications specialists, training in advanced programming,
performance analysis and tuning techniques, and a more efficient process
for reporting, escalating, and resolving compiler and tools problems.
Additionally, the agreement with IBM will provide the opportunity for NCAR
to participate in IBM's "Blue Light" HPC project. Blue Light is an exploratory
effort of IBM's Exploratory Server Systems department at IBM Research to develop
future PetaFLOPs supercomputer systems. NCAR's collaboration with IBM in Blue
Light holds the promise of significant and revolutionary advancements in
climate, weather, and Earth systems models, and will provide IBM with
valuable input on hardware and software design.
SCD's internal organization is shaped by its mission. SCD's structure makes
each of its management units primarily responsible for one line item in SCD's
mission. While some line items in the mission are shared, the purpose of each
management unit is to focus on and execute one specific part of the mission.
This organization places accountability and credit for mission-critical
projects on specific people in the division.
Our second highlight for FY2001 is the balanced progress we achieved in
each fundamental area of our mission, which is to provide:
- High-performance computing and expertise needed for the development and
execution of large, long-running numerical simulations
- A data archiving and management system that is balanced in performance and
capacity relative to computational resources
- High-speed network and data communication capabilities that are balanced
with respect to computational facilities, storage facilities, and the
requirements of a national and international community
- Research datasets and expertise needed by atmospheric and related
sciences
- A computing environment and support services that emphasize user
productivity and cost-effectiveness
- Education and training in computing and related technologies with an
emphasis on under-represented groups
- Transfer of appropriate NCAR technology to the private sector in
collaboration with the UCAR Foundation
These are the stories of our most significant achievements in each of these
areas, organized by the way SCD's management units relate to the line items
in our mission:
Primary responsibility for high performance computing is managed by the
Supercomputing Systems Group (SSG) to support part (a) of our mission. SSG
FY2001 highlights include:
NCAR batch scheduler enhancements
SSG developed and maintains the batch scheduling software that sits on top
of the resource management software provided by the vendors. The resource
management software controls allocation of resources within each of the supercomputers.
The batch scheduling software implements the NCAR business logic for how best
to allocate the resources. Many enhancements were made to the scheduler
this year, including enhanced error checking with feedback and externalization
of the control logic.
When a scientist submits a job to the supercomputers, there are a
number of directives that can be specified in the job script. These
directives are often confusing or conflicting, which results in
the job running improperly or not at all. Changes have been
made to the scheduler to provide enhanced filtering for incomplete,
incorrect, or conflicting directives. When a job is submitted that has
one of these problems, it is rejected and an email is sent to the
scientist explaining the problem and recommending possible solutions.
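The filtering described above might look something like the following minimal sketch. It is not the NCAR scheduler's code: the directive names (`queue`, `wallclock_hours`, `tasks`, `nodes`, `tasks_per_node`), the six-hour limit, and the message wording are all hypothetical, chosen only to show one check for each of the three problem classes (incomplete, incorrect, conflicting).

```python
def check_directives(directives, max_wallclock_hours=6.0):
    """Return a list of problems found in a job's directives (empty = accept)."""
    problems = []

    # Incomplete: a required directive is missing.
    for key in ("queue", "wallclock_hours", "tasks"):
        if key not in directives:
            problems.append(f"'{key}' is missing; add it to the job script.")

    # Incorrect: a value is outside the permitted range.
    hours = directives.get("wallclock_hours")
    if hours is not None and not 0 < hours <= max_wallclock_hours:
        problems.append(
            f"wallclock_hours={hours} is outside (0, {max_wallclock_hours}]; "
            "request less time or use a longer queue."
        )

    # Conflicting: two directives disagree with each other.
    tasks = directives.get("tasks")
    nodes = directives.get("nodes")
    per_node = directives.get("tasks_per_node")
    if None not in (tasks, nodes, per_node) and tasks != nodes * per_node:
        problems.append(
            f"tasks={tasks} conflicts with nodes*tasks_per_node="
            f"{nodes * per_node}; make the two agree."
        )
    return problems


def rejection_email(user, problems):
    """Format the message that would be mailed to the submitting scientist."""
    lines = [f"To: {user}", "Subject: job rejected by scheduler", ""]
    lines += [f"- {p}" for p in problems]
    return "\n".join(lines)
```

A job is accepted when `check_directives` returns an empty list; otherwise each entry pairs the problem with a suggested fix, mirroring the email described above.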
The externalization of the control logic involved parameterizing
things like maximum wallclock time for the queues, privilege granting
for special queues, and special project authorizations. These items
have been removed from user code and centralized into a configuration
file. The business office now owns and controls this file, removing this
system administration burden from users.
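As a rough illustration of what externalized control logic can look like, the sketch below reads queue limits and project authorizations from an INI-style file instead of hard-coding them. The file layout, section names, and project numbers are invented for the example; the actual format of the NCAR configuration file is not described in this report.

```python
import configparser

# Hypothetical policy file; the real file's format is not documented here.
EXAMPLE_CONFIG = """
[queue.regular]
max_wallclock_hours = 6

[queue.special]
max_wallclock_hours = 24
allowed_projects = 12340001, 12340002
"""


def load_policy(text):
    """Parse queue policy from INI text into a plain dictionary."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    policy = {}
    for section in parser.sections():
        if not section.startswith("queue."):
            continue
        name = section[len("queue."):]
        raw = parser.get(section, "allowed_projects", fallback="")
        policy[name] = {
            "max_wallclock_hours": parser.getint(section, "max_wallclock_hours"),
            # An empty set means the queue is open to every project.
            "allowed_projects": {p.strip() for p in raw.split(",") if p.strip()},
        }
    return policy


def may_submit(policy, queue, project, hours):
    """Apply the externalized limits to one submission request."""
    rules = policy.get(queue)
    if rules is None or hours > rules["max_wallclock_hours"]:
        return False
    allowed = rules["allowed_projects"]
    return not allowed or project in allowed
```

With this arrangement, changing a wallclock limit or authorizing a project means editing the file, not the scheduler.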
New end-user tools
This year, SSG developed a series of tools that enable scientists,
consultants, and systems staff to easily obtain job status information
from the supercomputers. These tools provide a global view of the job status
from each of the supercomputers. They include things like the status of running
jobs, queued jobs, completed jobs, load average of the system, outstanding
requests for data from the Mass Storage System, and information about the
inner logic of the scheduling algorithm. The output from these tools is in
the form of a summary report, and the job status information is also being
posted (every five minutes) to the SCD website. A follow-on activity that
is currently underway is the conversion of this information into a graphical,
point-and-click format that will enable a more timely delivery of this
information from within a web browser.
Primary responsibility for computational science and computational research
and development is managed by the Computational Science Section (CSS) to
support part (a) of our mission. CSS FY2001 highlights include:
Terascale spectral element dynamical core for atmospheric
General Circulation Models
Climate modeling is a grand challenge problem where scientific progress is
measured not in terms of the largest problem that can be solved but by the
highest achievable integration rate. Loft, Thomas, and Dennis have developed
a scalable spectral element atmospheric model that achieves a high percentage
of peak on microprocessors. A semi-implicit time-stepping scheme accelerates
the integration rate relative to an explicit model by a factor of two.

The MPI implementation outperforms hybrid MPI/OpenMP on the IBM SP.
Simulation rates have been measured for the standard shallow water equation
benchmarks using up to O(10^5) horizontal degrees of freedom.
A sustained 370 GFLOPS was achieved on the IBM system at NERSC. This work is a
finalist for the Gordon Bell Award, which is announced each year at the ACM
Supercomputing Conference.
Earth System Modeling Framework
Over the last few years, the need for software infrastructure for Earth system
modeling has grown increasingly apparent. Models and the computational platforms
that they run on are becoming extremely complex, leading to excessive time
and resources dedicated to solving computational rather than scientific problems.
In September 2000, the NASA High Performance Computing and Communications
(HPCC) Earth and Space Science (ESS) Project released a Cooperative Agreement
Notice (CAN) entitled "Increasing Interoperability and Performance of Grand
Challenge Applications in the Earth, Space, Life, and Microgravity Sciences."
The NASA CAN calls for the development of an "Earth System Modeling Framework
(ESMF)." In response to this NASA announcement, a collaboration led by Cecelia
DeLuca submitted a coordinated set of three proposals to develop an Earth
System Modeling Framework. The ESMF will allow diverse scientific groups to
leverage common software to solve routine computational problems, and it will
provide an interface specification so groups working at different institutions
and in different disciplines can generate interoperable software components.
NASA has selected the proposals from this collaboration for funding, and there
will be a three-year effort initiated in FY2002.
Primary responsibility for NCAR's Mass Storage System is managed by the
Mass Storage System Group (MSSG) to support part (b) of our mission. MSSG
FY2001 highlights include:
SD-3 (Redwood) tape migration
Between June and September 2001, the Mass Storage Systems Group
(MSSG) undertook and completed the migration of approximately 75
TB of data from SD-3 (Redwood) media to newer media types. Although
it held great promise when introduced in the mid-1990s, Redwood is
now considered an "end-of-life" technology, and the vendor
(StorageTek) has consequently imposed significant increases in
maintenance costs for Redwood tape drives. The migration of this 75 TB
of MSS data allowed the MSS group to decommission 10 of its Redwood
drives. (The remaining two drives are used only to read secondary copies
of MSS files on those rare occasions when the primary copy--stored on
non-Redwood media--cannot be read.) The decommissioning of these 10
Redwood drives represents a potential savings of approximately $16,000
per month in maintenance costs. In addition, as part of this project,
key users were lobbied to remove as many Redwood-resident MSS files as
they could: this resulted in the removal of an estimated 9 TB of
unneeded data from the MSS.
9940 media deployment
As part of its ongoing commitment to make the best use of new tape
technology in the Mass Storage System, MSSG also deployed 9940 media in
FY2001. 9940 is a high-capacity media type that uses the same recording
technology as 9840 media, which has been in production in the MSS
since 1999 and has proven itself to be extremely reliable. The new
media has a capacity of 60 GB per cartridge. Higher capacities
using the same 60-GB cartridges are planned for the next 12-18
months. In the first 16 weeks of production, over 150 TB of data
were stored on 9940 media.
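To give a sense of scale for these figures, the following back-of-the-envelope sketch estimates the minimum number of cartridges and the average weekly ingest. It assumes decimal units (1 TB = 1,000 GB) and completely full cartridges, so the real cartridge count would be higher.

```python
CARTRIDGE_GB = 60   # capacity per 9940 cartridge, from the text
STORED_TB = 150     # "over 150 TB", so results are lower bounds
WEEKS = 16


def min_cartridges(stored_tb, cartridge_gb):
    """Smallest number of completely full cartridges that hold stored_tb."""
    stored_gb = stored_tb * 1000           # decimal units assumed
    return -(-stored_gb // cartridge_gb)   # ceiling division on integers


def weekly_rate_tb(stored_tb, weeks):
    """Average data written per week, in TB."""
    return stored_tb / weeks


print(min_cartridges(STORED_TB, CARTRIDGE_GB), "cartridges at minimum")
print(weekly_rate_tb(STORED_TB, WEEKS), "TB per week on average")
```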
University of Illinois NCDM collaboration
Together with Lawrence Buja of CGD, MSSG set up and hosted a "Data
Space" server cluster for the National Center for Data Mining (NCDM).
NCDM is part of the Laboratory for Advanced Computing at the University
of Illinois at Chicago (UIC). The server cluster consists of three
Linux systems providing access to climate model data produced by NCAR
via the NCDM's Data Space transfer protocol. The system was showcased
at SC2000, demonstrating real-time data access from the Dallas show
floor to the server housed in the SCD computer room.
Primary responsibility for developing and maintaining UCAR's networking
infrastructure is managed by the Network Engineering and Telecommunications
Section (NETS) to support part (c) of our mission. NETS FY2001 highlights
include:
Front Range GigaPOP (FRGP)
The Front Range GigaPOP (FRGP) is a consortium of universities, nonprofit
corporations, and government agencies that cooperate in a regional
network aggregation point to share the costs of Wide Area
Networking (WAN) services. The current FRGP partners are the Boulder Point
of Presence (BPOP), Colorado State University (CSU), CU-Boulder, CU-Denver,
CU-HSC, CU-CS, CSM, DU, the University of Wyoming, and Fort Lewis College.
Additional partners, including the State of Colorado and the University
of Northern Colorado, are likely to join soon. Similar gigapops operate
throughout the U.S. Sharing services through a gigapop has several advantages:
costs for WAN services are reduced for each partner, expertise can be shared
among partners, a higher level of service can be purchased than individual
institutions could afford, and a consortium has more buying power and greater
economies of scale.

NCAR/UCAR has provided the engineering and NOC support for the FRGP, with
the service costs incurred by NCAR/UCAR being shared by all members. NETS
believes that the greater service and bandwidth obtained through the FRGP
are important enough for NCAR/UCAR to participate and provide the engineering
and NOC services. The FRGP partners have agreed that NETS has the engineering
and NOC staff best qualified to provide these services.
This is a critical service for the UCAR/NCAR staff as well as all the other
partners, and it has proved to be an extremely successful technical project
and an excellent collaboration with the Colorado research community. The
FRGP provides NCAR/UCAR's primary WAN connectivity including Abilene
connectivity. For more information, see
http://www.frgp.net/
Web100
The Web100 project is an initiative proposed
by NCAR, PSC, and NCSA to fix some well-known problems with Unix (and
other) operating systems that currently inhibit effective utilization
of national high-performance networks such as vBNS and Abilene. One of the
biggest problems is the need to manually calculate the bandwidth-delay
product in order to specify a TCP window size large enough that the sender
does not stall while waiting for TCP acknowledgment packets.
This issue generally isn't important for LANs, but it is important for high-performance
WANs. It is difficult to determine the "bandwidth" part of the product, and
right now the only effective way to obtain this is to have knowledge of the
network topology, which usually means consulting with a network engineer.
Furthermore, most applications don't provide a means for the user to specify
this information even if it were available. The Web100 Project is seeking to
solve this problem and some other related ones and has received funding from
the NSF for a three-year research proposal.
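The manual calculation in question is simple once the bandwidth and round-trip time are known. The sketch below uses illustrative numbers (a 100 Mbit/s path with an 80 ms round trip) that are not taken from this report; the function names are likewise only for illustration.

```python
def bandwidth_delay_product_bytes(bandwidth_bits_per_s, rtt_seconds):
    """Bytes that must be in flight to keep the path full."""
    return bandwidth_bits_per_s * rtt_seconds / 8


def max_throughput_bits_per_s(window_bytes, rtt_seconds):
    """Upper bound on TCP throughput for a fixed window: one window per RTT."""
    return window_bytes * 8 / rtt_seconds


# Illustrative values only: 100 Mbit/s bottleneck, 80 ms round-trip time.
bandwidth = 100e6
rtt = 0.080

window_needed = bandwidth_delay_product_bytes(bandwidth, rtt)
print(f"window needed: {window_needed / 1e6:.2f} MB")

# Without TCP window scaling, the advertised window cannot exceed 65,535 bytes.
capped = max_throughput_bits_per_s(65535, rtt)
print(f"throughput with a 65,535-byte window: {capped / 1e6:.2f} Mbit/s")
```

For these example numbers the required window is about 1 MB, while an unscaled 65,535-byte window limits the connection to under 7 Mbit/s no matter how fast the link is. The round-trip time is easy to measure; the difficulty lies in knowing the bandwidth term without consulting a network engineer.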
For more information on Web100, see http://www.web100.org/
Primary responsibility for collecting, correcting, and distributing valuable
research datasets is managed by the Data Support Section (DSS) to support
part (d) of our mission. DSS FY2001 highlights include:
Upgraded data and data presentation on the Research Data
web server
We implemented a large-scale upgrade and improvement to the information
interface for the SCD Research Data Archive website.
The new interface covers all
aspects of the Data Support Section web presence. It is not only superior
for the data users, but it is also easy and efficient for DSS staff to use
and maintain. This information system has more than 2,500 HTML-formatted
pages that are updated automatically. The data available continue to grow
as data and metadata are routinely added by DSS staff. The new DSS web
interface significantly improves researchers' ability to access and use
NCAR's Research Data Archive.

New additions to COADS and other ocean datasets
In 2001, a major milestone for COADS was achieved. New early data sources
were recovered through data archaeology efforts around the world. These
sources and other new digital sources have been added to the collection
for the period prior to 1950. This update and updates from previous years
now form the complete replacement and extension for Release 1 COADS (1985).
The new archive now covers 1784-1997. The COADS project, a collaborative
effort between NOAA/CDC, NOAA/NCDC, and NCAR/SCD, is the world's
dataset for describing conditions at the surface of the ocean (air temperature,
wind direction and velocity, water temperature, etc.). COADS is a critical
resource for studies of climate trends and global weather interactions, as
well as for data reanalysis projects.
Primary responsibility for helping researchers visualize, interact with,
and understand complex geophysical data is managed by the Visualization
and Enabling Technologies Section (VETS) to support part (e) of our mission.
VETS FY2001 highlights include:
NCAR's new Visualization Lab
Terascale visualization, collaboration, and the AccessGrid
Internet and web technologies coupled with high-bandwidth networks have
served as the substrate for wonderful new opportunities in scientific endeavor
and collaboration. While the desktop is still the day-to-day environment
of choice for the individual, group meetings are more important than ever.
From research organizations to businesses to universities, there is an enhanced
focus on sophisticated, technology-mediated meeting spaces that facilitate
information flow and enable virtual encounters.
SCD has recently completed
the development of its new Visualization Lab, a physical facility that blends
visual supercomputing, virtual reality, large-screen tiled display, and
advanced collaboration technology. Backed by an array of large-scale
computational and storage resources, the lab facilitates group exploration
of terascale scientific data.
Building upon the AccessGrid, a human-scale group-to-group
collaboration environment, it also opens up opportunities for group
participation in presentations, symposia, and workshops as well as
collaborative research.
The community data portal
Sustainable strategies for enabling both providers and consumers of earth
system data
Scientific data are at the heart of most of our research activities, and
we need to share these data among ourselves and with a geographically distributed
community.
Working with divisions and programs
across UCAR and NCAR, SCD has initiated
a forward-looking pilot project called the Community Data Portal (CDP).
The CDP is targeted directly at elevating our organization's collective
ability to function as a data provider with a coherent web-based presence.
During FY2001, several pilot sub-projects were undertaken, including the
ACACIA ARCAS system (ACACIA), Reanalysis-2 data (SCD/DSS), CCM diagnostic
tools (CGD), TIME-GCM data (HAO), vegetation/ecosystem data (VEMAP), and
distributed climate data analysis (COLA). Our efforts here have been extremely
well received and have now grown into the role of a formal NCAR Strategic
Initiative.
The Earth System Grid
In 1999, SCD joined with several DOE labs in a DOE-sponsored research
project called The Earth System Grid (ESG). This effort was aimed at
developing Grid-based technologies that facilitated management and
high-speed access to large-scale distributed climate model data. During
FY2001, we put some of the ESG technology into production operation for
the PCM project, and it now supports a sustained transfer of data from
NCAR to NERSC at data rates much higher than previously possible. We
joined again this year with several collaborators to submit a new
proposal for The Earth System Grid II. Working with Argonne National
Laboratory, Lawrence Livermore National Laboratory, the University of
Southern California, Oak Ridge National Laboratory, and Lawrence
Berkeley National Laboratory, we successfully secured a new research
contract to develop and deploy an operational ESG in support of
terascale/petascale climate research. The project is a significant
opportunity to advance research and computation, and it has already
drawn substantial interest that extends into the international
community.
Community data analysis and visualization
software
Prior to this year, SCD distributed NCAR Graphics and NCL on a
cost-recovery basis. In FY2000, SCD management made the decision to
move to an Open Source distribution model for NCAR Graphics and a
"free availability" model for NCL (with Open Source planned for the
future), and we began the process of implementing the decision. We
put the new distribution models into effect this year, and since
October 2000, there have been roughly 9,000 downloads of NCAR Graphics
and 1,000 downloads of NCL. While it represented a significant loss of
revenue for SCD, this move was extremely popular with our community
and has brought many new users on board. We also continued to work
with the new Weather Research and Forecasting (WRF) model team to extend
NCL's usefulness for the WRF community. New features included enhanced
support for the increasingly popular HDF-5 format and completion of a
significant portion of the development cycle for a new high-resolution
map database for NCL. In a similar vein, we began the process of
integrating our enhanced version of the popular Vis5D software into a
new Open Source framework for community sharing, development, and usage.
Complementing all of this was a substantial amount of experimental work
with the scripting language Python and the development of new
visualization software for educational uses under the auspices of our
Visual Geophysical Exploration Environment (VGEE).
A new architecture for terascale data access,
analysis, and visualization
Late in the year, SCD management convened a team tasked with re-examining
the resources we provide to deal with data. This included web-based data
access, future efforts in data portals, research projects in distributed
data, data visualization, and post-computation processing and analysis.
The strategic planning process that ensued led to the definition of a
new architecture that integrates a number of functions and systems, and
moves various testbed efforts (SANs, MSS Proxy) into a new production phase.
The new architecture is built on the concept of large shared data objects
(1 TB or more) and speaks directly to user productivity, efficient use of
computational and storage resources, and the support of new efforts in data
analysis and visualization.
This new effort positions SCD as a leader in providing a powerful, balanced,
and--most of all--productive environment. Acquisition of hardware and software
began this year, while integration and deployment will happen in FY2002.
Primary responsibility for helping researchers efficiently produce valid
simulations on NCAR supercomputers is managed by the User Support Section
(USS) to support part (e) of our mission. USS also supports researchers and
SCD staff by providing supercomputer usage statistics and web publication
services, and by supporting onsite servers, workstations, and application
software at the NCAR Mesa Lab. USS FY2001 highlights include:
Extensive model code conversions facilitated by the
SCD Consultants
Many researchers using NCAR supercomputers run models that were
designed for parallel vector processing systems such as the
Cray Research X-MP, Y-MP, C90, and J90 systems. As the last two Cray
systems at NCAR are being decommissioned in FY2002, the code that
simulates atmospheric and related physics and chemistry on these vector
systems has to be converted to run on one of the newer symmetric
multiprocessor architectures.
The most promising of these platforms is the IBM SP system. In the
past five years, the SCD Consulting Office guided the programmers and
scientists on more than 19 major community models in converting from
the old architecture to the new. This radical difference in
architectures requires a huge investment in programming time and effort,
and with a staff of five software engineers, the Technical Consulting
Group has successfully handled a shift from straightforward usage
questions to more complex development and design questions in the past
year.
In addition, the consultants developed extensive user documentation and
organized numerous training classes and workshops to help researchers
work productively on the IBM SP systems blackforest and babyblue. As of
September 2001, 65% of researchers using SCD computers were performing
more than 90% of their computational work on the IBM SP (blackforest)
or the SGI Origin 2000 (ute) rather than the Cray computers.
Primary responsibility for maintaining and operating NCAR's supercomputing
environment is managed by the Operations and Infrastructure Support Section
(OIS) to support part (e) of our mission. OIS FY2001 highlights include:
Infrastructure upgrades to support ARCS
In November 2000, OIS began to identify and upgrade those portions
of the computing center infrastructure that would need to be augmented to
support the new ARCS equipment. This specification of the needed infrastructure
equipment was a moving target, as the possible machine configurations were
in constant flux. After significant analysis, SCD identified the upper boundary
of possible equipment to be delivered. OIS then determined that the existing
air conditioning was sufficient, but the power distribution was not. OIS
then began specifying, procuring, and installing all the necessary equipment
so the facility could be ready for the ARCS delivery.
After nine months of constant work and some overtime, the power distribution
system passed the startup inspection and went online on the first of October,
just in time for the delivery of the first wave of ARCS equipment. The project
was estimated to cost slightly more than $600,000, and it came in slightly
over $500,000.
These upgrades position the electrical distribution system to support the
next three years of the ARCS contract, and they form the cornerstone of a
solid infrastructure for SCD to continue providing reliable,
production-oriented services and equipment as tools for science.
SCD portal genesis
The Applications Group within OIS has started on an ambitious multi-year
project to architect, implement, and deploy an application portal within
SCD. This portal will provide web-based access to SCD's suite of resources
and services. In addition, the portal is designed to be extensible and
customizable so researchers can better manage the flow of information
needed for their area of research. The project, while in its early
stages, accomplished a great deal in the past year. The portal has moved
from strictly a vision to early proof-of-concept systems. Several
decisions were made, including identifying key technologies and the
development approach. Currently, a preliminary job-submission
utility is working, as well as an interface to some portions of the
Mass Storage System. In the coming year, these early web-enabled
services will be provided to researchers to solicit feedback and
evolve the portal.