1998 ASR Home
Back
SCD ASR Index
Next
SCD Home

NCAR's Mass Storage System

The NCAR Mass Storage System (MSS) is a central, large-scale data archive that stores data used and generated by climate models and other programs executed on NCAR's supercomputers and compute servers. At the end of FY1998, the NCAR MSS managed more than 5,000,000 files containing over 150 terabytes (TB) of stored data, and the net growth rate of data in the MSS was approximately 5 TB per month. On average, 74,000 cartridges are mounted per month, approximately half by operators and half by the StorageTek Powderhorn Automated Cartridge Subsystems (ACSs), which use robotics to mount and dismount cartridges. On a daily basis, the MSS handles approximately 13,000 requests, resulting in the movement of over 500 GB of data. During FY1998, data transfers to and from the MSS exceeded 194 TB.
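As a rough consistency check on the figures above, the following sketch derives the mean file size and the average sustained transfer rate implied by the quoted totals. It assumes decimal units (1 TB = 1,000,000 MB and 1 GB = 1,000 MB) and treats the "more than" figures as exact; the function names are illustrative only.

```python
# Back-of-the-envelope figures derived from the FY1998 totals quoted above.
# Assumption: decimal units (1 TB = 1,000,000 MB; 1 GB = 1,000 MB).

MB_PER_TB = 1_000_000
MB_PER_GB = 1_000
SECONDS_PER_DAY = 24 * 60 * 60


def mean_file_size_mb(total_tb: float, total_files: int) -> float:
    """Average file size in MB for an archive holding total_tb terabytes."""
    return total_tb * MB_PER_TB / total_files


def daily_rate_mb_per_sec(gb_per_day: float) -> float:
    """Average sustained rate in MB/sec when gb_per_day is moved every day."""
    return gb_per_day * MB_PER_GB / SECONDS_PER_DAY


if __name__ == "__main__":
    print(f"mean file size: {mean_file_size_mb(150, 5_000_000):.0f} MB")
    print(f"average daily rate: {daily_rate_mb_per_sec(500):.1f} MB/sec")
```

Under these assumptions the mean file size is about 30 MB, and 500 GB per day corresponds to an average of just under 6 MB/sec around the clock.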

While some of the data stored on the NCAR MSS originate from field experiments and observations, the bulk of the data is generated by global climate-simulation models and other earth-science models that run on supercomputers, and SCD faces an increasing demand to archive data from ever-faster supercomputers. Essentially, the faster the supercomputer, the more data there are to be archived. Even greater demands for archiving data will result from the growing use of coupled atmospheric/oceanic simulation models.

MSS growth

NCAR Mass Storage System growth during FY1998 continued at projected rates with the introduction of the 128-processor Silicon Graphics Cray Origin2000 system into the Climate Simulation Laboratory and a second Silicon Graphics Cray J90se and a 64-processor HP SPP-2000 into the Community computing environment. The addition of this new supercomputing power effectively doubled the average number of GFLOPS being delivered to user applications. Projecting this growth forward, it is clear that new storage paradigms and user education will be required; without them, the growth will become untenable within just three to five years.

The following table compares year-end statistics for FY1996, FY1997, and FY1998 with projected statistics for year-end FY1999 and FY2005. The FY2005 estimates assume a flat budget for supercomputing, historical data storage trends at NCAR, and Moore's Law growth in computer performance per unit cost. Even with the most optimistic vendor projections for storage densities and costs, these estimates indicate that the NCAR MSS would require between one and two dozen ACSs and that the annual MSS budget would exceed that for supercomputers.

MSS growth statistics and expectations
Statistic                                 eFY1996      eFY1997      eFY1998      eFY1999 [4]  eFY2005 [4]
Total storage (TB)                        82           110          150          256          5,700
Total files (x 10^6)                      2.9          3.9          5.1          8.5          190
Net growth (TB per month) at eFY          1.5          3.0          5.0          7.5          220
Data read/written (TB per month)          8            16 [1]       20           30           500
Data migrated internally (TB per month)   8            16           20           30           500
Manual tape mounts (number per month)     45,000       60,000       37,000       18,000       1,000 [5]
Robotic tape mounts (number per month)    40,000       50,000       37,000       55,000       900,000 [5]
Offline cartridge count                   145,000      165,000 [2]  169,000 [3]  169,000      85,000 [6]
GFLOPS on NCAR computing floor            ~5           ~10          ~20          ~36          ~1,000
  1. 16 TB per month = 5 MB/sec
  2. All on IBM 3490 cartridge media
  3. Mixture of 166,700 3490 cartridge media and 2,300 SD-3 cartridge media
  4. Projected assuming a flat computational budget through 2005
  5. Assumes one copy of all data is under robotic mount control
  6. Assumes size of existing offline archive will decrease as media densities increase and existing data is oozed to those higher density media
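The projections in the table can be summarized as compound growth rates. The following sketch uses only the eFY1998 and eFY2005 values for total storage and GFLOPS, and assumes smooth exponential growth over the seven intervening years; the function names are hypothetical.

```python
import math


def annual_growth_factor(start: float, end: float, years: float) -> float:
    """Constant yearly multiplier that turns start into end after years."""
    return (end / start) ** (1.0 / years)


def doubling_time_years(start: float, end: float, years: float) -> float:
    """Years needed to double under the same constant growth rate."""
    return years * math.log(2) / math.log(end / start)


if __name__ == "__main__":
    # (eFY1998 value, eFY2005 value) taken from the table above.
    rows = {
        "Total storage (TB)": (150, 5_700),
        "GFLOPS on NCAR computing floor": (20, 1_000),
    }
    for label, (fy1998, fy2005) in rows.items():
        factor = annual_growth_factor(fy1998, fy2005, 7)
        doubling = doubling_time_years(fy1998, fy2005, 7)
        print(f"{label}: x{factor:.2f} per year, doubling every {doubling:.1f} years")
```

Under the smooth-growth assumption, the projected storage total grows by roughly a factor of 1.7 per year, which corresponds to a doubling time of about 1.3 years.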

MSS upgrades

In FY1998, SCD made the following improvements to the NCAR MSS:

First, working with the Distributed Computing Services project team, the MSS group co-developed a data transfer interface between the MSS and workstation/server machines. In FY1997 a beta version of the msrcp command was introduced. msrcp is similar to the UNIX rcp command, except that the target or source for msrcp is an MSS file. Like its UNIX counterpart, msrcp supports wildcards and recursive subdirectory descent. In FY1998, msrcp became a production interface.

Second, additional ESCON channels were added to the Mass Storage Control Processor (MSCP) and additional ports were added to the ESCON director switch. These channels and ports will be used to support additional ESCON-attached tape drives to increase MSS performance and to evaluate emerging tape technologies.

Third, serial High Performance Parallel Interfaces (HiPPI) were integrated into the MSS High Performance Data Fabric (HPDF). Serial HiPPI is an optical-fibre-based connection used to connect compute servers to the MSS HPDF. Serial HiPPI has many advantages over parallel copper HiPPI, including lower deployment costs and smaller, more reliable connectors and cables.

Fourth, the ability to automatically create multiple copies of MSS files was implemented to address tape transport and media reliability issues encountered in late FY1997 and early FY1998. The NCAR MSS now has the capability of creating from 1 to 32 copies of every file written into the MSS. Currently, the MSS will create two copies of selected MSS files to maintain file error and loss rates much better than one in 100 million.
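To see why a second copy helps, consider a simplified model in which each copy of a file is lost independently and with the same probability. Both the independence assumption and the per-copy loss probability used below are illustrative assumptions, not measured NCAR figures, and the function names are hypothetical.

```python
def loss_probability(per_copy_loss: float, copies: int) -> float:
    """Probability that every copy of a file is lost, assuming independent failures."""
    if not 1 <= copies <= 32:  # the MSS supports 1 to 32 copies per file
        raise ValueError("copies must be between 1 and 32")
    return per_copy_loss ** copies


def copies_needed(per_copy_loss: float, target: float) -> int:
    """Smallest copy count whose combined loss probability is below target."""
    for copies in range(1, 33):
        if loss_probability(per_copy_loss, copies) < target:
            return copies
    raise ValueError("target not reachable with 32 copies")


if __name__ == "__main__":
    hypothetical_loss = 1e-5  # assumed per-copy loss probability (not an NCAR figure)
    target = 1e-8             # "one in 100 million"
    print(copies_needed(hypothetical_loss, target))
```

In this model a per-copy loss rate that misses the one-in-100-million target by several orders of magnitude can still meet it with two copies, because the probabilities multiply. Correlated failures, such as a faulty drive damaging both copies, would weaken that result.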

Finally, the NCAR MSS Group completed a production MSS-IV Data Migration server. This server supports the internal migration of MSS data in the storage hierarchy and the oozing of existing archive data to new storage technologies. A Silicon Graphics Cray Origin2000 four-processor system was purchased to support the new server code and will be deployed for production in early FY1999.

MSS access methods

The NCAR MSS provides direct storage-device access via a High-Performance Data Fabric (HPDF). The data fabric consists of host computer High Performance Parallel Interface (HiPPI) channel interfaces, non-blocking HiPPI switches capable of supporting multiple bi-directional 100 MB/sec data transfers, and protocol converters that connect the HiPPI data fabric to the IBM-style device control units. The data fabric provides data paths directly between the MSS storage devices and the client compute servers. To utilize this data fabric, SCD has written a file-transport interface that lets users copy files between their host systems and the MSS. The data fabric provides the fastest access of the two methods and can support 12 independent file-transfer operations between the storage devices and the compute servers, with 6 transfers sustaining 3 MB/sec each and 6 transfers sustaining 10 MB/sec each, for an aggregate total of 78 MB/sec.
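The aggregate figure quoted above follows directly from the per-transfer rates. A minimal sketch of the arithmetic, with a hypothetical helper name:

```python
def aggregate_rate(streams: list[tuple[int, float]]) -> float:
    """Sum of count * per-transfer rate (MB/sec) over groups of transfers."""
    return sum(count * rate for count, rate in streams)


if __name__ == "__main__":
    # (number of concurrent transfers, sustained MB/sec per transfer)
    hpdf_streams = [(6, 3.0), (6, 10.0)]
    print(aggregate_rate(hpdf_streams))  # 6*3 + 6*10 = 78.0
```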

Optical-fibre-based serial HiPPI was introduced into the HPDF in FY1998. Serial HiPPI has many advantages over parallel copper HiPPI implementations, including lower deployment cost and smaller, more reliable connectors and cables. Older parallel copper HiPPI interfaces will be phased out as the host machines supporting those interfaces are replaced or retired.

HiPPI technology continues to be deployed only in a niche market. It has not shown signs of spreading into the commodity marketplace; as a result, the cost of HiPPI technology has remained high and the number of HiPPI vendors is dwindling. The lack of availability and support of HiPPI technology is becoming a critical issue for the continued operation of the MSS. Replacement technologies are on the horizon, but they are neither widely available nor functional enough to replace HiPPI immediately. Promising replacement technologies are Fibre Channel and Network Attached Storage Devices. Fibre-Channel-attached RAID units are available today at extremely attractive costs. During FY1998 the MSS Group evaluated Fibre Channel RAID technology, which will be deployed in FY1999 to supplement the disk capacity of the DataPark. Over the next few years, the number and types of available Fibre-Channel-attached devices are expected to grow and to include tape storage. Once tape devices can be Fibre Channel attached, SCD intends to evaluate replacing our HiPPI fabric with Fibre Channel.

Network Attached Storage Devices (NASD) are another emerging technology that SCD is tracking closely. Today a handful of vendors supply Network File System (NFS)-based NASD products. Some vendors are developing "local-disk"-attached NASD products using Fibre Channel and HiPPI connections. SCD's current strategy is to deploy a Fibre Channel infrastructure and add NASD to it at a later time. The end result will be the decommissioning of our HiPPI fabric and ESCON and BMX storage devices, and the wholesale replacement of those older technologies with new, vendor-supported (and, ideally, standards-based) technologies.

MSS storage hierarchy

The NCAR MSS currently uses two levels of storage: online and offline. The most frequently accessed data are kept on the fastest storage media, the online storage devices: 180 GB of IBM 3390 Model 3 disks and two StorageTek Powderhorn ACSs. One ACS uses 3490E tapes and the other uses StorageTek SD-3 (Redwood) tapes, providing a total online capacity of approximately 300 TB. Lower in the storage hierarchy is a 3490E offline cartridge tape library holding 166,700 cartridges that can be staged with one of the 16 external IBM 3490E manually mounted cartridge drives. StorageTek SD-3 drives have been added to the offline storage level to provide secondary copies of the Powderhorn-resident SD-3 files and to migrate the existing data holdings.

"Migration" refers to the massive task of transferring tens of terabytes of data from old media to modern media before the equipment that uses the old media becomes obsolete. This task by itself is straightforward; however, this data migration must be handled as a background task while the processing and storage components of the system remain fully dedicated to supplying prompt, 24-hour-per-day service to users. When the migration is complete, the total capacity of the offline archive (assuming no reduction in the offline archive's available floor space in the SCD machine room) will exceed 1 petabyte. The migration was started near the end of FY1998 and is expected to take 2 to 3 years to complete.
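The scale of the task can be gauged with simple arithmetic. The sketch below assumes that all 166,700 cartridges in the 3490E offline library are read once, at a constant pace, every day of the year, and that the number of offline cartridge slots stays at the eFY1998 count of 169,000. These are simplifying assumptions, and the function names are hypothetical.

```python
def cartridges_per_day(total_cartridges: int, years: float) -> float:
    """Average cartridges to process per day to finish within the given years."""
    return total_cartridges / (years * 365)


def min_gb_per_cartridge(target_gb: float, slots: int) -> float:
    """Average cartridge capacity (GB) needed for slots to hold target_gb."""
    return target_gb / slots


if __name__ == "__main__":
    for years in (2, 3):
        pace = cartridges_per_day(166_700, years)
        print(f"{years} years: about {pace:.0f} cartridges per day")
    # 1 petabyte expressed as 1,000,000 GB (decimal units assumed).
    print(f"{min_gb_per_cartridge(1_000_000, 169_000):.1f} GB per cartridge")
```

Under these assumptions the migration must process roughly 150 to 230 cartridges per day in the background, and an average of about 6 GB per cartridge would be enough to pass 1 petabyte in the same number of slots.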

Expansion of the MSS storage hierarchy is planned over the next five years through the introduction of new tape technologies and new ACSs, and through the integration of a front-end file server with its own hierarchical storage management (HSM) system to offload active and temporary data. The MSS Archive will become a back-end store for the file server, accessed only by the front-end HSM. A single global name space will be provided for all data managed by SCD. Evaluation of HSM solutions began in FY1998 and will continue in FY1999.

MSS import/export capability

Another important capability of the NCAR MSS is the ability to import and export data to and from external portable media. Importing data involves copying data from portable media to the MSS data archive, while exporting data involves copying data from the MSS data archive to portable media. Import/export allows users to bring data to NCAR with them, as well as take data away. Import also allows data from field experiments to be copied to the NCAR MSS archive.

Options to exchange data with smaller satellite storage systems are being investigated. Using this technique, data generated at NCAR could be transferred to remote sites for further analysis. The NCAR SCD storage model would thus be geographically distributed, rather than centrally located and administered.

In addition to 3480 and 3490E cartridge tapes and 9-track round tapes, the NCAR MSS also offers import/export to single- and double-density Exabyte cartridge tapes. The deployment of an MSS-IV Import/Export server in FY1999 will make it possible to support many more device types, such as CD-ROM, DAT, and newer Exabyte media.

MSS-IV: The next-generation NCAR Mass Storage System

Supercomputer vendors are making great strides in their ability to supply computing cycles: Moore's Law can be interpreted to state that the number of floating-point operations (FLOP) per dollar doubles every 18 months. These gains are dramatically increasing the demands placed on the NCAR MSS. MSS capacity and performance must stay in balance with the computational resources available to the NCAR user community. The current NCAR MSS, MSS-III, has scaled successfully with the increasing demands, but it is bound by the capacity of the largest IBM mainframe available. The need to scale past the limits of MSS-III in a financially achievable manner must be addressed; hence the development of MSS-IV.
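The 18-month doubling quoted above compounds quickly. A minimal sketch of the arithmetic, assuming a flat budget so that performance per dollar translates directly into delivered performance; the function name is hypothetical.

```python
def growth_factor(years: float, doubling_period_years: float = 1.5) -> float:
    """Multiplier after the given years when the quantity doubles every period."""
    return 2.0 ** (years / doubling_period_years)


if __name__ == "__main__":
    for years in (1.5, 3, 7):
        print(f"after {years} years: x{growth_factor(years):.1f}")
```

Under this reading, performance per dollar grows about 25-fold over seven years, so storage capacity and bandwidth would need to grow at a comparable pace to stay in balance.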

MSS-IV is the design of the next-generation NCAR Mass Storage System. Based on the proven MSS-III design architecture, MSS-IV is meant to:

  1. Extend the capabilities and capacity of the current MSS-III system
  2. Inter-operate with the current production MSS-III system
  3. Be incrementally implementable and implemented
  4. Allow for a gradual phase-out of MSS-III

MSS-III can be decomposed into a set of functional components. These components are based on the IEEE Storage Reference Model Version 2. While adhering to this model, MSS-IV is being designed as a distributed system. An initial design requirement of MSS-IV was to eliminate the dependence on MVS mainframes and move toward a more heterogeneous, vendor-independent implementation.

Initially, the platform of choice will be smaller UNIX computers, but MSS-IV is not limited to these, and it will be deployable on MVS mainframes. MSS-IV will be implemented in a distributed computing environment where the functional components will be matched with an appropriate compute platform. That is, a data mover function will be deployed on a platform that is configured for bulk data transfer efficiency, while a database function will be placed on a platform that is configured for transaction processing efficiency.

MSS-IV extends the capabilities of the current MSS-III system. Certain device connections that are impossible on an MVS mainframe, such as SCSI-attached devices, can be achieved in MSS-IV. MSS-IV allows the integration of vendor-supplied software as new or improved functional components become available. This can be as specific as a device interface or as extensive as a fully functional archive system.

In addition to advanced capabilities, the MSS-IV design extends the capacity of the system well beyond that of MSS-III. MSS-III must be deployed on a single machine. MSS-IV is a distributed design that can be deployed across multiple machines. Hence the total system capacity of MSS-IV can exceed what can be achieved on a single machine. In addition, the capacity of MSS-IV can be extended simply by replicating one or more of its functional components. For example, the total data migration function could be deployed on multiple machines that yield a higher aggregate data transfer capacity than a single machine could achieve.
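The replication argument can be made concrete with a toy model. The sketch below assumes that each replica of a functional component contributes a fixed transfer rate and that replicas do not contend for shared resources. The class, the component name, and the rates are all hypothetical and are not taken from the MSS-IV design.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    """A functional component deployed on one or more machines."""
    name: str
    rate_mb_per_sec: float  # assumed sustained rate of a single replica
    replicas: int = 1

    def aggregate_rate(self) -> float:
        """Combined rate, assuming replicas scale linearly without contention."""
        return self.rate_mb_per_sec * self.replicas


def replicas_needed(per_replica_rate: float, target_rate: float) -> int:
    """Smallest number of replicas whose combined rate reaches target_rate."""
    return math.ceil(target_rate / per_replica_rate)


if __name__ == "__main__":
    single = Component("data migration", rate_mb_per_sec=10.0)
    tripled = Component("data migration", rate_mb_per_sec=10.0, replicas=3)
    print(single.aggregate_rate(), tripled.aggregate_rate())
    print(replicas_needed(per_replica_rate=10.0, target_rate=25.0))
```

In practice shared tape drives, channels, or metadata services would limit linear scaling, so the model gives an upper bound rather than a prediction.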

The design of MSS-IV eliminates the need to build a complete MSS-IV system before it can be deployed. Therefore, MSS-IV will be deployed incrementally, requiring each MSS-IV component to interoperate with other components from both MSS-III and MSS-IV. This design allows for a user-transparent migration from MSS-III to MSS-IV in an orderly, incremental manner.

FY1998 saw the deployment of the first production MSS-IV server, a data migration server, along with the underlying infrastructure upon which other MSS-IV servers will be built. An import/export Exabyte server is scheduled for deployment in FY1999. A metadata server will be designed in FY1999 and a storage server the following year.
