
SCD has established a prototype of such a facility, and calls it the NCAR DataPark. This prototype was deployed during FY1997 and enhanced during FY1998.
SCD's objectives for the DataPark include:
During FY1998, the Mass Storage System Group of SCD tested Fibre Channel switches and RAID using components of the DataPark system. The tests showed that a switched Fibre Channel fabric can deliver high performance for storage access from computer systems (90 MB/sec for large sequential direct I/O reads on a single Fibre Channel connection, and 82 MB/sec for writes), and that striping across multiple Fibre Channel connections scales well. These results gave us the confidence to purchase a Fibre Channel switch and a 1.1-TB Fibre Channel RAID system for deployment on the prototype DataPark in FY1999. The majority of this disk space will be used for DataPark user applications. The remaining space will be used to test true shared filesystems across heterogeneous computer systems, with the goal of providing higher-performance access to shared data than NFS or DFS can provide.
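As a rough illustration of what these figures imply, the following Python sketch estimates the aggregate rate of several striped connections and the time needed to read the full 1.1-TB RAID. It assumes perfectly linear scaling, which the tests did not quantify (they showed only that striping "scales well"). The `efficiency` parameter and the function names are hypothetical and used only for this example.

```python
# Rough estimate only: assumes striping scales linearly, which the report
# describes qualitatively ("scales well") but does not quantify.
READ_MB_PER_SEC = 90.0    # single-connection large sequential read rate (from the text)
WRITE_MB_PER_SEC = 82.0   # single-connection write rate (from the text)
RAID_CAPACITY_MB = 1.1e6  # 1.1 TB, taking 1 TB = 10**6 MB (assumption)


def aggregate_rate(per_channel_mb_s, channels, efficiency=1.0):
    """Estimated aggregate rate for `channels` striped connections.

    `efficiency` is a hypothetical scaling factor (1.0 = perfectly linear).
    """
    if channels < 1:
        raise ValueError("channels must be at least 1")
    return per_channel_mb_s * channels * efficiency


def hours_to_transfer(total_mb, rate_mb_s):
    """Time in hours to move `total_mb` at a sustained `rate_mb_s`."""
    return total_mb / rate_mb_s / 3600.0


if __name__ == "__main__":
    for n in (1, 2, 4):
        rate = aggregate_rate(READ_MB_PER_SEC, n)
        hours = hours_to_transfer(RAID_CAPACITY_MB, rate)
        print(f"{n} connection(s): {rate:.0f} MB/sec, {hours:.2f} h to read 1.1 TB")
```

Under these assumptions, a single connection would need a little over three hours to read the entire RAID sequentially. Real striped performance would depend on the measured scaling efficiency.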
Also during FY1998, we investigated several commercial Hierarchical Storage Management (HSM) products for possible use in the DataPark (to address objective 4 above). Current vendor offerings have no standard for metadata interchange, and often have no defined method for interfacing to data stored in other archive formats. These capabilities would give us a way both to access data in our own MSS archive and to bulk-import or bulk-export data to or from our computing center. We intend to track developments and new products during FY1999, and possibly to test some of them.
In FY1998, SCD also deployed the new Distributed Computing Services (DCS) data movement interface (msrcp) to the MSS. This interface was designed to work efficiently on the DataPark, with data movement taking place on the node with the best connection to the (local) data. This allows the user to access MSS data on any DataPark system without needing to know the optimal system on which to issue the msrcp command.
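The idea of moving data on the node with the best connection can be sketched as follows. This is only an illustration of the selection principle, not the msrcp implementation. The `Node` type, the host names, and the bandwidth figures are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str               # hypothetical host name
    link_mb_per_sec: float  # hypothetical bandwidth between this node and the data


def pick_transfer_node(nodes):
    """Return the candidate node with the fastest connection to the data."""
    if not nodes:
        raise ValueError("no candidate nodes")
    return max(nodes, key=lambda node: node.link_mb_per_sec)


if __name__ == "__main__":
    # Illustrative values only; not measurements from the DataPark.
    candidates = [
        Node("node-a", 10.0),
        Node("node-b", 90.0),
        Node("node-c", 40.0),
    ]
    best = pick_transfer_node(candidates)
    print(f"transfer would run on {best.name}")
```

The user issues the same command from any system, and the choice of where the transfer actually runs is made on the user's behalf.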
Ultimately, SCD believes that a system like the one depicted in the conceptual diagram below is required to provide both the computational and the data access capabilities needed to analyze the results of the next generation of climate and mesoscale simulation models. SCD hopes that during FY1999 and FY2000, resources will be available to replace the existing DataPark hardware and software subsystems and to augment the data storage capacity, so that the system more closely resembles the diagram below than the diagram above.
The future NCAR DataPark system, as depicted above, is intended to have high-speed network and data channel connections to the supercomputers at NCAR. It would also have high-speed network and data channel connections to the NCAR Visualization Laboratory, the NCAR Mass Storage System, and external NSF networks (such as the vBNS). In addition, the roughly one terabyte (TB) of online, directly attached disk storage would be augmented by a Hierarchical Storage Management (HSM) system providing immediate access to an additional 5-12 TB of storage. The HSM system would provide a "high-speed data cache" where model output can be stored as it is generated on the supercomputers, then accessed for analysis and reduction before results are staged to the NCAR Mass Storage System. We hope this will relieve the growing demand on the NCAR MSS for storing model output data prior to analysis.