Operational procedures and infrastructure
The Operations and Information Support (OIS) Section monitors SCD's
computers and UCAR's networks 24 hours a day, 365 days a year, maintains the
necessary environmental and hardware infrastructure, manages user accounts
and software licenses, and provides an array of information to users
and SCD staff. These functions are provided by three groups within the
section: the Computer Production Group (CPG), the Infrastructure Support
Group (ISG), and the Database Services Group (DBSG).
The Computer Production Group (CPG) monitored and maintained SCD's
computing resources 24 hours a day, 365 days a year. The many
system attributes monitored included system utilization, load
average, job queues, status of network connections, Mass Storage
System (MSS) accessibility and data transfer, memory errors,
filesystems, and disk errors. Prompt identification of software,
hardware, and network problems on the more than 30 machines in the
machine room and across the hundreds of network connections led to
timely resolutions and minimized interruptions to service.
Mass Storage System cartridge mounts continued to increase in FY97,
with over 389,000 robotic mounts. Archive tapes mounted by the staff
totaled more than 590,000, with daily manual mounts across three
shifts reaching as many as 3,217.
Checked hundreds of UNIX security reports to detect electronic
intrusion. CPG reported all activity that could compromise security to
the Distributed Systems Group.
Transferred more than 900 pages of technical documentation into
FrameMaker and made these documents available via the web:
http://www.scd.ucar.edu/cpg/Documents.
Providing a web version of the documentation minimizes duplication
of effort because the information can be referenced from other SCD
documentation via links, and search engines can be used to locate
information quickly.
Maintained the machine dependencies diagram, which depicts the
sequential order for rebooting SCD computer
equipment. Updates were made each time equipment was removed or
installed and when system services were moved from one machine
to another.
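
A reboot order of the kind the diagram depicts can be derived from a
table of dependencies with a topological sort. The following is only a
minimal sketch under stated assumptions, not the procedure CPG used:
the machine names and the dependencies among them are hypothetical
placeholders, and the `reboot_order` helper is introduced here purely
for illustration.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical dependency table (not taken from the actual diagram):
# each machine maps to the set of machines that must be up before it.
DEPENDS_ON = {
    "name-server": set(),
    "file-server": {"name-server"},
    "mass-storage-gateway": {"name-server", "file-server"},
    "compute-host": {"file-server", "mass-storage-gateway"},
}


def reboot_order(depends_on):
    """Return machines ordered so each follows everything it depends on.

    Raises graphlib.CycleError if the dependencies are circular.
    """
    return list(TopologicalSorter(depends_on).static_order())


if __name__ == "__main__":
    for step, machine in enumerate(reboot_order(DEPENDS_ON), start=1):
        print(step, machine)
```

Keeping the dependencies in a table like this means that removing a
machine or moving a service to another host only requires editing the
affected entries; the order is then recomputed rather than redrawn.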
Produced a detailed, daily report of system and network activity,
problems, and steps toward resolution. This online documentation
provided SCD with an efficient means of determining the time and
cause of system and network malfunctions as well as the time
equipment and systems were returned to production.
Coordinated efforts with a representative of Health Environment and
Safety Services to inventory all chemicals located in areas staffed
by CPG. Material Safety Data Sheets (MSDS) and instructions for
deciphering them were provided for each chemical.
Fiscal year 1997 saw many changes to the computer room. Several of the
changes were highly visible; others were not, but were critical to the
day-to-day operation of the facility. The year witnessed the removal of
the Cray Y-MP8I and the installation of a Cray C916 (antero). The
Infrastructure Support Group (ISG) played a major role in the planning,
supervision, and implementation of this installation. Significant
modifications to the electrical and mechanical equipment were made
on a very short time schedule. The year also witnessed the addition
of a Cray J90se (ouray) as well as an HP Exemplar SPP2000 (sioux).
Numerous servers and smaller systems were removed, including the
SUN 690s and the IBM RS6000 cluster. Lastly, the venerable Cray
Y-MP8D (shavano) was decommissioned after years of service. The
removal of shavano provided the opportunity to remove abandoned
plumbing and perform substantial clean-up under the computer room floor.
With the merger of SGI and Cray, as well as numerous changes in the
Cray equipment on the floor, a large part of FY97 was spent negotiating
a new SGI/Cray maintenance contract. This contract will be
finalized early in FY98.
The continued growth of the Mass Storage System put considerable
pressure on the available floor space. With some research, ISG
was able to find a number of used racks for 3490 tapes, saving
costs for the division. The additional racks increased the available
space in the tape archive by over 15,000 tapes. With the conversion
to SD3 tapes, the pressure on space should be reduced in FY98.
ISG also investigated several options to increase SCD computer
reliability and service to the user community. The group
investigated the costs associated with a backup electrical
generator to provide power to a portion of the room in the
event of a prolonged power outage. Prompted by the failure
of an Uninterruptible Power Supply (UPS) unit, new UPS units
were also investigated. The group recommended that the older
units be replaced because new units could increase capacity
and reduce operating costs. The group proposed that the UPS and
generator projects be added to the building refurbishment
project. ISG will continue to monitor the cost of new units
against the cost of maintaining the old ones.
The Trouble Ticket Committee was formed in the spring of 1996 to
investigate the possibility of implementing an SCD-wide trouble
ticket system. The committee investigated several software
packages and decided that the Remedy package offered the desired
functionality.
The Trouble Ticket system will provide a more homogeneous method for
different SCD sections to provide solutions for user problems and
requests. The system will provide a knowledge base and a collective
set of solutions to known problems, thereby increasing efficiency
for the division and its users.
Late in FY97 a subset of the original committee was charged with
finalizing the implementation details, supervising the development, and
implementing the system. To date the committee has been working on the
details of the functional requirements and the ticket workflow to meet
different sections' needs. The goal of the committee is to complete this
phase in early FY98. The functional requirements and workflow will serve
as the blueprint for the development of the system. The committee also
plans to deploy the first version in the winter of 1998. More information
about the committee's work is published at the Trouble Ticket website.