In his new book, mathematician and OKCupid founder Christian Rudder explores how data correlations can reveal deeply personal characteristics with a scary degree of accuracy. And it seems Facebook’s like button has particularly strong predictive power.
The book, “Dataclysm: Who We Are When We Think No One’s Looking,” cites a 2012 study from the UK which found that using only Facebook likes, researchers could accurately predict race, sexuality, intelligence, and political affiliation.
In their article in Proceedings of the National Academy of Sciences of the United States of America from April 2013, the researchers wrote:
[T]he best predictors of high intelligence include “Thunderstorms,” “The Colbert Report,” “Science,” and “Curly Fries,” whereas low intelligence was indicated by “Sephora,” “I Love Being A Mum,” “Harley Davidson,” and “Lady Antebellum.” Good predictors of male homosexuality included “No H8 Campaign,” “Mac Cosmetics,” and “Wicked The Musical,” whereas strong predictors of male heterosexuality included “Wu-Tang Clan,” “Shaq,” and “Being Confused After Waking Up From Naps.”
The likes could also indicate whether a person’s parents had divorced while that person was young. Researchers reported that children of divorced parents are more likely to be preoccupied with relationships on Facebook, liking pages such as “If I’m with you then I’m with you I don’t want anybody else.”
“You know science is headed to undiscovered country when someone can hear your parents fighting in the click-click-click of a mouse,” Rudder writes.
The researchers seemed surprised to find that in some cases, fewer than 5% of correlated likes had any apparent direct relevance to the characteristic in question. For example, while liking Barack Obama was obviously indicative of the person being a Democrat, the page was also notably popular among Christians, African Americans, and gay men and women.
Correlation Is Not Causation
To be clear, the research doesn’t suggest that liking curly fries makes you smarter or that straight people are naturally more confused after waking up from naps; it only indicates a correlation between a given like and a given trait.
To see how accurate these predictions are, take a look at the chart below from the original report. The number represents the likelihood out of 1 that the prediction is correct.
Rudder notes that this could have significant privacy implications for job applicants: employers could assess someone’s IQ with reasonable accuracy, without a test or any consent, based purely on that person’s Facebook likes.
Rudder also notes that other personal information would likely yield similarly telling data; examples include mobile phone metadata, credit card purchase data, email keywords, and location data.
Most people lack a comprehensive understanding of how much personal information such data collection can yield, according to Rudder.
“Because so much happens with so little public notice, the lay understanding of data is inevitably many steps behind the reality,” he writes.