Resolved: Service Outage: CSE SPA Server down

When

Monday, November 30, 2020

Units Affected

School of Physics and Astronomy

Outage Description

At 10:30 a.m. on November 30, 2020, CSE-IT became aware that the server in SPA was down. The outage was caused by a cooling failure in the PAN data center, which brought down the computation server and the Hadoop cluster along with some other servers. Services and data have been restored. No action should be necessary on the client side to restore service. If you are still experiencing problems with these services, please contact CSE-IT immediately at 612-625-0876 or csehelp@umn.edu.

Outage Updates

  • 11/30/2020, 4:00 p.m. Major Incident Resolved. On Sunday, November 29th, the PAN data center had a cooling failure. This brought down the computation server and the Hadoop cluster along with some other servers. Services and data have been restored. No action should be necessary on the client side to restore service. If you are still experiencing the symptoms described in this notice, please contact CSE-IT immediately at 612-625-0876 or csehelp@umn.edu.
  • 11/30/2020, 2:30 p.m. A switch has been found to have faulty ports. The degradation of the Hadoop cluster has been partially remedied, but more work remains. We will continue working to restore the remaining isolated service and network outages, and we will update this notice once the services have been restored.
  • 11/30/2020, 12:30 p.m. Service has been restored except for a small number of home directories, the old physics website, and the Hadoop cluster. We are working to restore these remaining services as soon as possible and will update this notice at 1:30 p.m.
  • 11/30/2020, 11:30 a.m. Yesterday, a cooling issue occurred in the SPA data center that impacted several servers. Affected services include the old physics website, some home directories, and some research storage. Other file storage facilities may also be affected depending on location. We are working to restore service as soon as possible and will update this notice at 12:45 p.m.
  • 11/30/2020, 10:30 a.m. CSE-IT became aware that the server in SPA is down. Symptoms include the old website being down and non-aligned home directories, some research storage, and the Inverness file server being unavailable. We are investigating the root cause of the outage and will update this notice at 11:30 a.m.
