SCD has established a Machine Dependencies Committee, which has reviewed the machine dependencies at the Mesa Lab and targeted possible single points of failure. The two most important services requiring continuous availability are DNS and user authentication, since all other systems in the NCAR environment depend on them. To ensure continuous availability of DNS and authentication services, the Machine Dependencies Committee recommended a high availability configuration to host these services (code-named the "Phoenix Project"). The Phoenix Project was placed into operation during FY97. The licensing service, previously provided by an obsolete Sun server, was moved to the Phoenix Project during the summer. The licensing service provides licenses for commercial software applications such as FrameMaker and the Sun compilers.
The SCD Executive Committee and the Machine Dependencies Committee identified other single points of failure. After DNS, the most critical service is file service via the Network File System (NFS), on which a majority of systems in the Mesa Lab machine room depend. File service was migrated from the old Auspex 5500 to an SGI Challenge XL, which uses RAID storage devices exclusively to ensure data reliability and continued user access to large volumes of data. To provide high availability for file service, a second Challenge XL server was purchased with funding from the Director's Reserve fund, CGD, and SCD. The backup file server's primary function will be to provide a large storage space for data processing; its secondary function will be to act as a hot spare for the primary file server.
The Foothills Lab also uses DNS service, and its users depend heavily on the DNS server being in continuous operation. The Foothills Lab high availability system should be online by the end of calendar year 1997. SCD has taken other measures to increase the availability of its services, such as adding RAID storage to numerous servers, installing dual power supplies in servers, and mirroring system disks.
The Distributed Systems Group intends to expand the use of high availability configurations as funding becomes available, by having MIGS and IRJE back each other up in case of a system failure while still performing their primary services. Other candidates for a high availability configuration are the internal and external WWW servers. The new gateway security server, which will monitor network activity for security violations, will also be set up in a high availability configuration.
The Office Systems Group will also set up its Wintel servers in a high availability configuration to ensure uninterrupted service to Wintel clients. The servers run Microsoft Windows NT Server, which supports high availability configurations in production environments. Unlike UNIX high availability software, the Microsoft failover software has been widely deployed throughout the computer industry and is extremely reliable.
| NCAR | UCAR | NSF | NCAR FY97 ASR |