This is what went wrong with Telstra’s mobile network


Telstra’s mobile network had four major outages during February and March. In a bid to stem the damage to the brand, the telco offered two ‘free data’ Sundays to compensate customers.

When you’re a company that not only advertises its premium service, but charges for it too, customers expect answers when it goes down.

Speaking today at the CommsDay Summit, Telstra’s chief operations officer, Kate McKenzie, outlined what went wrong, adding that the telco is making every effort to make sure there won’t be a repeat.

Importantly, McKenzie wanted to emphasise that none of the problems were a system-wide failure.

The telco is now taking steps to carry out a network-wide review with technology partners Ericsson and Cisco.

“We have already progressed short to medium term actions to improve resilience and robustness in the network,” McKenzie said.

“Changes have been implemented to increase the capacity and path diversity of critical signalling channels, and a temporary layer of traffic management protection has been added to minimise the impact of events like that we saw in March and February.”

At the time of the first major outage on 9 February, McKenzie attributed the problem to human error, which she repeated today.

“With evidence of increasing degradation of the health of the node and potential service risk, a decision was taken to isolate the node from the network – a standard operating procedure for such an event,” McKenzie said.

“Due to processes not being followed properly, the subsequent node restart initiated incorrectly.

“This meant that 15% of all mobile devices connected through this node needed to re-register when establishing a new voice call or data session.”

And the result? A large number of customers couldn’t make calls or use the internet. She said that voice services were prioritised, with those services back about two hours after the problem was first spotted and data returning 90 minutes after that.

While Telstra was criticised at the time for throwing a single employee under the bus, McKenzie said that the employee responsible is still working for Telstra.

On 1 March, a second outage caused widespread issues across the telco’s pre-paid network, limiting customers’ ability to make calls and access data services. McKenzie did not mention that issue or its causes.

The next big outage occurred on 17 March, when around 50% of Telstra customers were hit from 6pm, with voice calls and data access offline. This issue was caused by an international cable fault that disconnected many device IPs from the network. At first it hit customers roaming internationally, before domestic customers were struck.

The last outage, just five days later, meant voice calls for Telstra mobile and NBN voice customers were down for several hours, mostly in Victoria and Tasmania. McKenzie said this affected only 3% of customers and was caused by a card failure in one of its Victorian media gateways.

The telco is hoping the findings of its review will help prevent further major outages via new configurations with additional redundancies.