Researchers from Cornell University have worked out how to track Twitter users’ locations — even when they have location services disabled.
A paper from Ryan Compton, David Jurgens and David Allen explains a new method for pinpointing the location of Twitter users to within around 6km based on who they interact with. Using the method, the researchers say, they’re able to “geotag over 80% of public tweets.”
Starting from the small subset of Twitter users who do provide GPS data or unambiguous location information, the algorithm can “assign a location to a user based on the location of their friends,” building up a geotagged map of users across the social network.
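To make the idea concrete, here is a minimal Python sketch of assigning a location to a user from the locations of their friends. It is an illustration only, not the researchers' actual algorithm: the function name `propagate_locations`, the per-coordinate median, the fixed number of rounds and the sample users are all assumptions made for this example.

```python
from statistics import median


def propagate_locations(known, mentions, rounds=3):
    """Spread known locations across a mention network.

    known    -- dict of user -> (lat, lon) for users with GPS or profile data
    mentions -- dict of user -> set of users they interact with
    rounds   -- how many passes to make over the network (assumed value)
    """
    located = dict(known)
    for _ in range(rounds):
        updates = {}
        for user, friends in mentions.items():
            if user in located:
                continue  # never overwrite a location we already have
            coords = [located[f] for f in friends if f in located]
            if coords:
                # Per-coordinate median: a crude stand-in for "where most
                # friends are". It ignores longitude wrap-around at +/-180.
                updates[user] = (
                    median(lat for lat, _ in coords),
                    median(lon for _, lon in coords),
                )
        if not updates:
            break  # nothing new was located, so stop early
        located.update(updates)
    return located


# Hypothetical data: three located users, two with no location data.
known = {
    "alice": (40.7, -74.0),
    "bob": (40.8, -73.9),
    "carol": (40.6, -74.1),
}
mentions = {
    "dave": {"alice", "bob", "carol"},
    "erin": {"dave"},
}
result = propagate_locations(known, mentions)
```

In this toy network, "dave" is placed in the first pass because three of his contacts have known locations, and "erin" is placed in the second pass because her only contact is "dave". That second step is what lets a small seed of located users cover a much larger share of the network.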
The method isn’t perfect. It’s based on the principle that the “vast majority of Twitter users @mention with geographically close users,” so it cannot generate accurate results for people with more global networks of connections. But the researchers say their method was “accurate to city-resolution” for 89.7% of test users.
The researchers point to an array of sociological and scientific benefits of being able to track Twitter users who don’t provide location data. These range from “understanding regional flu trends, linguistic patterns, election forecasting, [and] social unrest,” to helping plan “disaster response.”
There are also obvious commercial benefits to the research. It’s a boon to advertisers: Local businesses will be able to target ads far more effectively at people identified as being in the area.
But there’s a more sinister side to the findings too. They show how it’s possible to accurately track a user’s home location based only on who they interact with, even if they have expressly opted out of having their location tracked.
This is a testament to the power of “metadata,” the additional information associated with communications beyond the content itself. (For example, an email’s metadata would include the sender and addressee, and the time it was sent.) While previous studies on Twitter user location have tried to analyse language used for clues to location, the Cornell team ignored the content of the tweets altogether.
“Language-based geotagging models often rely on sophisticated language-specific natural language processing,” they write, “and are thus difficult to extend worldwide.”
The researchers believe they have produced “the largest and most accurate dataset of Twitter user locations” ever, and they have done so by relying on the interactions between users alone to build networks. But combining their metadata approach with linguistic analysis could open the door to ever more invasive location tracking.