ANZ is using machine learning to improve the accuracy of data forecasting

  • ANZ applied the “random forest” machine learning technique to try to beat the current benchmark for retail sales forecasts.
  • The results showed a slight improvement on the consensus forecasts compiled by Bloomberg.
  • But the analysts said the relative lack of macroeconomic data in Australia negates some of the benefits of machine learning.

ANZ Bank has turned to machine learning to improve existing forecasting techniques.

Economists Jack Chambers and David Plank applied the technique to monthly retail sales data, and compared its forecast error with that of the consensus surveys compiled by Bloomberg.

The machine learning process used by the pair is called “random forest”. Think of a standard decision tree model, which maps out decisions or actions and their possible consequences.

It follows that the “forest” is made up of multiple decision trees, whose results are calculated and averaged to find relationships with retail sales. In ANZ’s words:

The starting point for a random forest is a decision tree. The tree consists of a series of questions that help determine whether sales rose or not. The first question might be: did consumer confidence rise in the month. The two possible answers (yes/no) provide the first two branches of the tree. The next question might be: did petrol prices rise. The answers provide two branches off each of the existing branches. And this process repeats for each subsequent question.
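The two questions in ANZ's description can be written out as a small tree. This is only an illustrative sketch: the function name and the outcome at the end of each branch are invented here, not taken from ANZ's model.

```python
def tree_says_sales_rose(confidence_rose: bool, petrol_rose: bool) -> bool:
    """Toy two-question decision tree. Leaf outcomes are hypothetical."""
    if confidence_rose:          # first question: did consumer confidence rise?
        if petrol_rose:          # second question: did petrol prices rise?
            return True          # assumed outcome for this branch
        return True              # assumed outcome for this branch
    if petrol_rose:              # same second question on the other branch
        return False             # assumed outcome for this branch
    return True                  # assumed outcome for this branch


# Confidence fell and petrol got dearer: this toy tree says sales did not rise.
print(tree_says_sales_rose(confidence_rose=False, petrol_rose=True))
```

A tree fitted to real data would learn which questions to ask, and what each branch predicts, from the historical record rather than having them written in by hand.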

So the forest is built from separate decision trees, and by tying them back to one another the model can gather more evidence on the core question: were actual retail sales in line with forecasts?

“The leap from a single decision tree to a random forest is similar to the concept of the wisdom of crowds,” the pair said.

“Each person brings their own knowledge to answering a question, so aggregating their answers increases the total amount of input information.”
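The “wisdom of crowds” idea can be sketched as averaging the answers of several small trees. Everything here is an assumption made for illustration: the three trees, the inputs they look at and the growth figures they return are invented. A real random forest also grows each tree on a random sample of the data and a random subset of the variables, which this sketch leaves out.

```python
from statistics import mean

# Each hypothetical "tree" looks at one input and guesses monthly sales growth (%).
def tree_a(month):
    return 0.4 if month["confidence_rose"] else -0.1

def tree_b(month):
    return 0.0 if month["petrol_rose"] else 0.3

def tree_c(month):
    return 0.5 if month["employment_rose"] else -0.2

def forest_predict(trees, month):
    """Average the individual guesses, as a forest averages its trees."""
    return mean(tree(month) for tree in trees)

# Made-up month: confidence up, petrol up, employment flat or down.
month = {"confidence_rose": True, "petrol_rose": True, "employment_rose": False}
print(forest_predict([tree_a, tree_b, tree_c], month))
```

No single tree sees all three inputs, but the averaged forecast reflects all of them.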

Plank and Chambers used 49 different data variables over a period of 13 months, which resulted in 649 potential random variables.

The research found that the two variables which turned out to be most important for retail sales were both tied to the labour market.

One was the change in total employment four months ago, and the other the change in the NAB business survey’s indicator for employment nine months ago.
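A variable “from four months ago” is simply the same series shifted back in time. The helper below is a hypothetical sketch of that shift, and the employment figures are invented placeholders rather than real data.

```python
def lagged(series, lag):
    """Shift a series so position t holds the value from t - lag.

    Months with no earlier value available are filled with None.
    """
    return ([None] * lag + list(series))[:len(series)]


# Invented monthly changes in total employment (thousands of people).
employment_change = [12, -3, 8, 15, 4, -6]

# What the model would see for each month as "employment change four months ago".
print(lagged(employment_change, 4))  # [None, None, None, None, 12, -3]
```

The same shift with a lag of nine would produce the NAB employment indicator as the model would have seen it.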

So did their testing beat the benchmark?

“The mean absolute error for the model’s forecast of monthly retail sales growth was 0.31%, slightly less than the Bloomberg consensus of 0.36%,” they said. Success.
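Mean absolute error is the average size of the forecast misses, ignoring whether each miss was too high or too low. A minimal sketch of the calculation follows. The monthly figures are made up for illustration and are not the ANZ or Bloomberg data.

```python
def mean_absolute_error(actual, forecast):
    """Average of the absolute gaps between outcomes and forecasts."""
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must be the same length")
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


# Invented monthly retail sales growth (%) and matching forecasts.
actual = [0.2, 0.5, -0.1]
forecast = [0.4, 0.3, 0.1]

# Each forecast misses by 0.2 percentage points, so the error is about 0.2.
print(mean_absolute_error(actual, forecast))
```

A lower figure means the forecasts sat closer to the actual results on average.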

However, they noted the improvement was only marginal, perhaps a suggestion that Bloomberg’s consensus forecast is already providing some of the “wisdom of crowds” found in the random forest technique.

And they said their model was limited by the lack of macroeconomic data provided in Australia compared to other countries.

In addition, they said the purely data-driven approach of random forest machine learning, which has no underlying theory behind it, makes it hard to properly determine causal links.

Despite that, their use of machine learning reflects ongoing efforts by the pair to gain insights via new technology.
