Maintenance Schedules and Notifications to HPC/HTC Users

Scheduled Maintenance for SGI Altix UV, SGI Altix ICE 8400, and IBM iDataPlex HTC

SGI Altix UV, SGI Altix ICE 8400, and IBM iDataPlex HTC scheduled maintenance windows occur, as necessary, during the following period:

  • 8 AM to 5 PM on the last Wednesday of every month, for partial cluster maintenance
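Because the window is defined relative to the calendar, users planning long jobs may want to compute its date in advance. The following is a minimal Python sketch, not a UITS tool: the function names `last_wednesday` and `maintenance_window` are hypothetical, and it assumes the window runs exactly 8 AM to 5 PM local time. A computed window only marks when maintenance may occur; whether it actually happens is announced separately.

```python
import calendar
from datetime import date, datetime, time


def last_wednesday(year: int, month: int) -> date:
    """Return the date of the last Wednesday in the given month."""
    # monthrange() returns (weekday of the 1st, number of days in the month)
    last_day = calendar.monthrange(year, month)[1]
    d = date(year, month, last_day)
    # weekday(): Monday == 0, so calendar.WEDNESDAY == 2
    offset = (d.weekday() - calendar.WEDNESDAY) % 7
    return date(year, month, last_day - offset)


def maintenance_window(year: int, month: int) -> tuple[datetime, datetime]:
    """Return the (start, end) of the assumed 8 AM to 5 PM window on that day."""
    day = last_wednesday(year, month)
    return datetime.combine(day, time(8, 0)), datetime.combine(day, time(17, 0))
```

For example, `last_wednesday(2025, 1)` returns `date(2025, 1, 29)`.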

These maintenance windows are periods during which UITS may choose to drain the queues of running jobs and suspend access to the clusters for HPC/HTC maintenance.

Maintenance windows recur monthly, and interruptions are kept as brief as possible. Before performing maintenance during any of these windows, UITS will notify users via the HPC-Announce list at least 10 days in advance. The notification will describe the nature and extent (partial or full) of the interruption to HPC/HTC services.

Batch Queues Maintenance

Before a scheduled downtime, the batch queues will also be modified to hold any job that requests more wallclock time than remains before the shutdown, unless the job request specifies that the job uses checkpoint/restart.
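The hold rule amounts to a single comparison between a job's requested wallclock time and the time left before the shutdown. The Python sketch below models it under stated assumptions: the `JobRequest` type, its fields, and `should_hold` are hypothetical names for illustration, not the configuration of the actual batch system, which applies this rule through its own mechanisms. It treats a job whose request exactly fits the remaining time as not held, following the wording "more wallclock time than remains".

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class JobRequest:
    """Hypothetical, simplified view of a batch job request."""
    name: str
    walltime: timedelta               # requested wallclock limit
    checkpoint_restart: bool = False  # job declares checkpoint/restart support


def should_hold(job: JobRequest, now: datetime, shutdown: datetime) -> bool:
    """Return True if the job would be held until after the downtime.

    A job is held when it requests more wallclock time than remains before
    the shutdown, unless it declares checkpoint/restart support.
    """
    if job.checkpoint_restart:
        return False
    # Past the shutdown time, no time remains at all.
    remaining = max(shutdown - now, timedelta(0))
    return job.walltime > remaining
```

For example, 24 hours before the shutdown, a job requesting 12 hours would be eligible to start, while a job requesting 48 hours would be held unless it declares checkpoint/restart.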

Reasons for Scheduled Maintenance Include:

  • Environmental maintenance (cooling, power)
  • Installation of security patches
  • Hardware or firmware upgrades
  • Software patches and upgrades
  • Software and component installations
  • Reconfigurations
  • Server reboots
  • Availability and fail-over testing

Emergency Maintenance

Unavoidable (emergency) downtime may occur at almost any time for any of the reasons above. Such events are rare, and great effort is made to avoid them. When emergency maintenance is needed, however, the UITS unit responsible for the affected item will give users as much notice as possible and work to resolve the fault as quickly as possible.

Notifications and Communications to HPC/HTC Users

UITS will follow these practices for notifying HPC/HTC users of all software or hardware maintenance, hardware installations, planned outages, and unplanned outages:

  • Notification of any planned outage required for routine maintenance will be provided to the University of Arizona user community at least 10 days before the event.
  • Notification of emergency outages will be made available immediately to the University of Arizona user community.
  • UITS will use the following notification platforms during any planned or emergency outage:
    • Alert Status of University IT Services
    • HPC Announce List – Research Computing’s standard announcement list

