SCD FY97 Annual Scientific Report

Operational procedures and infrastructure

The Operations and Information Support (OIS) Section monitors SCD's computers and UCAR's networks 24 hours a day, 365 days a year; maintains the necessary environmental and hardware infrastructure; manages user accounts and software licenses; and provides a wide range of information to users and SCD staff. These functions are carried out by three groups within the section: the Computer Production Group (CPG), the Infrastructure Support Group (ISG), and the Database Services Group (DBSG).

Monitoring and initial problem identification

The Computer Production Group (CPG) monitored and maintained SCD's computing resources 24 hours a day, 365 days a year. The many system attributes monitored included system utilization, load average, job queues, status of network connections, Mass Storage System (MSS) accessibility and data transfer, memory errors, filesystems, and disk errors. Prompt identification of software, hardware, and network problems on the more than 30 machines in the machine room and across hundreds of network connections led to timely resolutions and minimized interruptions to service.

Media mounts

Mass Storage System cartridge mounts continued to increase in FY97, with more than 389,000 robotic mounts. Staff mounted more than 590,000 archive tapes in total, and daily manual mounts across the three shifts reached as many as 3,217.

Security report checks

Checked hundreds of UNIX security reports for signs of electronic intrusion. CPG reported all activity that could compromise security to the Distributed Systems Group.

Procedures on the web

Transferred more than 900 pages of technical documentation into FrameMaker and made these documents available on the web at http://www.scd.ucar.edu/cpg/Documents. A web version of the documentation minimizes duplicated effort, because other SCD documentation can reference the information through links rather than copying it. Search engines can also be used to locate information quickly.

Machine dependencies diagram

Maintained the machine dependencies diagram which depicts the sequential order for rebooting SCD computer equipment. Updates were made each time equipment was removed or installed and when system services were moved from one machine to another.
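A diagram like this encodes which machines must be running before others can come up, and a safe reboot sequence is a topological order of that dependency graph. The Python below is a minimal sketch of that idea using Kahn's algorithm; it is not SCD's actual tool, and the machine names and dependencies in the usage example are hypothetical.

```python
from collections import deque


def reboot_order(depends_on: dict[str, list[str]]) -> list[str]:
    """Return a boot order in which every machine follows the machines it depends on.

    depends_on maps a machine to the machines that must be up before it
    (a topological sort via Kahn's algorithm). Raises ValueError on a cycle.
    """
    # Collect every machine mentioned, either as a key or as a dependency.
    machines = set(depends_on)
    for deps in depends_on.values():
        machines.update(deps)

    # Count each machine's unmet prerequisites and record who waits on whom.
    remaining = {m: 0 for m in machines}
    dependents: dict[str, list[str]] = {m: [] for m in machines}
    for machine, deps in depends_on.items():
        for dep in deps:
            remaining[machine] += 1
            dependents[dep].append(machine)

    # Start with machines that need nothing else; sort for a deterministic order.
    ready = deque(sorted(m for m in machines if remaining[m] == 0))
    order: list[str] = []
    while ready:
        machine = ready.popleft()
        order.append(machine)
        # Bringing this machine up satisfies one prerequisite of each dependent.
        for waiting in sorted(dependents[machine]):
            remaining[waiting] -= 1
            if remaining[waiting] == 0:
                ready.append(waiting)

    # Any machine never reached is part of a dependency cycle.
    if len(order) != len(machines):
        raise ValueError("dependency cycle: no valid reboot order")
    return order


if __name__ == "__main__":
    # Hypothetical dependencies, for illustration only.
    example = {
        "nfs_server": ["name_server"],
        "mss_gateway": ["nfs_server", "name_server"],
        "compute_host": ["mss_gateway", "nfs_server"],
    }
    print(reboot_order(example))
    # prints ['name_server', 'nfs_server', 'mss_gateway', 'compute_host']
```

Raising an error on a cycle is deliberate: a circular dependency means no reboot sequence can satisfy the diagram, which is exactly the kind of inconsistency an update after moving a service should catch.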

Daily event log

Produced a detailed daily report of system and network activity, problems, and steps toward resolution. This online record gave SCD an efficient way to determine when and why system and network malfunctions occurred, as well as when equipment and systems were returned to production.

Chemical inventory and instruction

Coordinated efforts with a representative of Health Environment and Safety Services to inventory all chemicals located in areas staffed by CPG. Material Safety Data Sheets (MSDSs), along with instructions for interpreting them, were provided for each chemical.

Equipment installations and removals

Fiscal year 1997 saw many changes to the computer room. Some changes were highly visible; others were less so but were critical to the day-to-day operation of the facility. The year saw the removal of the Cray Y-MP8I and the installation of a Cray C916 (antero). The Infrastructure Support Group (ISG) played a major role in the planning, supervision, and implementation of this installation, which required significant modifications to the electrical and mechanical equipment on a very short schedule. A Cray J90se (ouray) and an HP Exemplar SPP2000 (sioux) were also added. Numerous servers and smaller systems were removed, including the SUN 690s and the IBM RS6000 cluster. Finally, the venerable Cray Y-MP8D (shavano) was decommissioned after years of service. Its removal provided the opportunity to remove abandoned plumbing and perform substantial clean-up under the computer room floor.

Maintenance contracts

With the merger of SGI and Cray, as well as numerous changes in the Cray equipment on the floor, much of FY97 was spent negotiating a new SGI/Cray maintenance contract. This contract will be finalized early in FY98.

Floor space

The continued growth of the Mass Storage System put considerable pressure on the available floor space. After some research, ISG located a number of used racks for 3490 tapes, saving the division money. These additional racks increased the capacity of the tape archive by more than 15,000 tapes. With the conversion to SD3 tapes, the pressure on space should ease in FY98.

Infrastructure

ISG also investigated several options to increase SCD computer reliability and service to the user community. The group investigated the costs of a backup electrical generator to power a portion of the room in the event of a prolonged power outage. Prompted by the failure of an Uninterruptible Power Supply (UPS) unit, the group also investigated new UPS units and recommended replacing the older units, because new units could increase capacity and reduce operating costs. ISG proposed that the UPS and generator projects be added to the building refurbishment project and will continue to compare the cost of new units with the cost of maintaining the old ones.

Trouble Ticket Committee

The Trouble Ticket Committee was formed in the spring of 1996 to investigate the possibility of implementing an SCD-wide trouble ticket system. The committee investigated several software packages and decided that the Remedy package offered the desired functionality.

The Trouble Ticket system will give the different SCD sections a more uniform method of resolving user problems and requests. The system will also provide a knowledge base of solutions to known problems, increasing efficiency for the division and its users.

Late in FY97, a smaller subset of the original committee was charged with finalizing the implementation details, supervising development, and implementing the system. To date, the committee has worked on the functional requirements and the ticket workflow needed to meet different sections' needs, and it aims to complete this phase early in FY98. The functional requirements and workflow will serve as the blueprint for developing the system. The committee also plans to deploy the first version in the winter of 1998. More information about the committee's work is published on the Trouble Ticket website.
