The Australian Taxation Office on Thursday released a report on its investigation into the computer system outages it suffered between December and February.
ATO commissioner Chris Jordan had already revealed the main technical causes at a budget estimates session last month. The full report, however, showed that 77 component failure events "similar to those experienced during the December outage" were recorded in system logs between May 2016 and the first meltdown on December 11, 2016.
“In addition, at least 159 alerts were recorded in storage area network device monitoring and management logs,” the report stated.
The tax office wrote that technology supplier Hewlett Packard Enterprise (HPE) took "some actions" in response to the events, but the 3PAR storage area network (SAN) continued to raise alerts.
“We were not made fully aware of the significance of the continuing trend of alerts, nor the broader systems impacts that would result from the failure of the 3PAR SAN,” the report read.
Astoundingly, the second major outage, on February 2 this year, was caused by hardware that was physically dislodged during remediation work on the December incident. The nightmare started all over again.
“Unfortunately, during one cable replacement exercise, we were informed that data cards attached to the SAN had been dislodged. This caused the 3PAR SAN to act in a similar way to that noted during the December outage,” stated the report.
“This included unsuccessful steps to automatically remediate, followed by a system shut-down to preserve data integrity. HPE communicated this priority 1 incident to us immediately.”
Jordan revealed last month that HPE had paid an undisclosed amount of compensation to the tax office.
“The settlement recoups key costs incurred by the ATO, and provides additional and higher grade IT equipment giving the ATO a world-class storage network,” he said at the time.
The report reiterated Jordan's statement last month that the storage system had been configured for performance and economy rather than resilience. It also noted that the ATO's internal technical staff lacked sufficient visibility into the HPE systems.
Despite the report's release, the root cause of the hardware failures remains unknown.
“This root cause examination cannot be completed until the SAN is physically removed and taken back [to the USA] for forensic testing. This process may not be completed until late 2017,” stated the ATO.