A former Google data scientist used the number of online porn searches to predict the unemployment rate weeks before official stats were released

It takes the Bureau of Labor Statistics three weeks to collate its survey data on unemployment and release that information to the public.

Three weeks might seem like a long time, but hey, it’s the way things work. Even Princeton economist Alan Krueger tried and failed to speed up the process when he served as President Obama’s chairman of the Council of Economic Advisers in 2011.

In his new book, “Everybody Lies,” Seth Stephens-Davidowitz presents this problem as a sort of case study in how the definition of “data” is constantly being reimagined. Sure, survey responses are useful, but what if there were some other, albeit unofficial, way to track people’s employment status?

To make a long story short, there is another way. It’s the rate of Google searches for pornography.

Let’s back up to explain how this works. Stephens-Davidowitz writes that a former Google engineer created a system that used people’s flu-related searches to create a picture of the current flu rate, well before the Centers for Disease Control and Prevention released official data.

While Stephens-Davidowitz was a data scientist at Google, he and Google’s chief economist, Hal Varian, used a similar process to paint a picture of the national economic landscape.

By that time, Google engineers had also created a service called Google Correlate, which allows researchers to see what Google searches most correlate with the dataset they’re studying. For example, Stephens-Davidowitz and Varian found that when housing prices are rising, Americans tend to search for things like “80/20 mortgage.”
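The idea of ranking search terms by how closely they track a dataset can be sketched in a few lines. This is only an illustration under stated assumptions: it uses a plain Pearson correlation, which may not be how Google Correlate works internally, and every number and term name below is made up rather than real search or housing data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat series carries no correlation signal
    return cov / sqrt(var_x * var_y)


def rank_terms(target, term_series):
    """Return (term, r) pairs sorted from most to least correlated with target."""
    scored = [(term, pearson(target, series)) for term, series in term_series.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Made-up numbers for illustration only; not real search or housing data.
house_prices = [100, 104, 109, 115, 122, 130]
search_volume = {
    "hypothetical term A": [10, 12, 13, 16, 18, 21],  # rises with prices
    "hypothetical term B": [20, 19, 21, 20, 19, 21],  # roughly flat
    "hypothetical term C": [30, 27, 25, 22, 18, 15],  # falls as prices rise
}

for term, r in rank_terms(house_prices, search_volume):
    print(f"{term}: r = {r:+.2f}")
```

The term at the top of the ranking is the one whose search volume moves most closely with the target series. A high correlation on its own says nothing about why the two move together.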

Stephens-Davidowitz wanted to know if Google Correlate could also help predict the unemployment rate, so he entered the United States unemployment rate from 2004 through 2011 into the system.

The search term most closely linked to unemployment? “Slutload.” (It’s a pornographic website.)

Stephens-Davidowitz explains: “This may seem strange at first blush, but unemployed people presumably have a lot of time on their hands. Many are stuck at home, alone and bored.” In fact, another one of the highly correlated searches was “Spider Solitaire.”

Stephens-Davidowitz is quick to note that these searches aren’t necessarily the best way to predict the unemployment rate — but they can certainly be part of the prediction model.

The broader takeaway here is that traditional means of collecting data — i.e. surveys — aren’t necessarily the most efficient or accurate. For one thing, as the title of Stephens-Davidowitz’s book suggests, people aren’t always truthful when they respond to those survey questions.

As Stephens-Davidowitz explains later in the book, that’s especially true when you’re looking for data on, say, the rate of homosexuality. About five per cent of pornography searches by men are for same-sex pornography — which is about twice the percentage of American men who indicate on Facebook that they’re interested in men.

Stephens-Davidowitz writes: “Frequently, the value of Big Data is not its size; it’s that it can offer you new kinds of information to study — information that had never previously been collected.”
