Lack of Twitter geotags can’t stop researchers from getting location

Ars Technica » Scientific Method 2014-03-24

A set of tweets that the IBM researchers used to determine a regional prototype.

Three researchers from IBM have developed an algorithm that can predict a Twitter user's location without needing so much as a single geotag from them. According to the arXiv paper on the subject, the prediction comes largely from assessing how similar the content of a user's tweets is to the tweets of other users who do use geotags, which turns out to be a decent predictor.

While geotags are the most definitive location information a tweet can carry, tweets can also contain plenty of other salient information: hashtags, Foursquare check-ins, or text references to specific cities or states, to name a few. The authors built their algorithm by analyzing the content of tweets that did have geotags, then searching for similar content in tweets without geotags to estimate where those tweets originated. Of a body of 1.5 million tweets, 90 percent were used to train the algorithm and 10 percent were used to test it.
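The article does not spell out the model, so what follows is a minimal Python sketch of the general idea under stated assumptions, not the IBM researchers' method. It assumes a bag-of-words representation, builds one word-count "prototype" per location from geotagged training tweets, and assigns an untagged user to the location whose prototype has the highest cosine similarity to that user's combined tweets. The function names (`build_prototypes`, `predict_location`, `holdout_split`) and the toy tweets are hypothetical illustrations.

```python
import math
import random
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: plain lowercase whitespace tokenization; hashtags and
    # place names simply survive as ordinary tokens.
    return text.lower().split()


def build_prototypes(labeled_tweets):
    """Sum token counts per location: one 'prototype' vector per place."""
    prototypes = defaultdict(Counter)
    for text, location in labeled_tweets:
        prototypes[location].update(tokenize(text))
    return prototypes


def cosine(a, b):
    # Cosine similarity between two sparse count vectors (Counters return 0
    # for missing tokens without inserting them).
    dot = sum(count * b[tok] for tok, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def predict_location(tweets, prototypes):
    """Pick the location whose prototype is most similar to a user's tweets."""
    user_vector = Counter()
    for text in tweets:
        user_vector.update(tokenize(text))
    return max(prototypes, key=lambda loc: cosine(user_vector, prototypes[loc]))


def holdout_split(items, train_fraction=0.9, seed=0):
    # Shuffle a copy, keep 90 percent for training and 10 percent for testing.
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Toy, made-up training data: (tweet text, geotagged city).
geotagged = [
    ("heading to the #bruins game tonight", "Boston"),
    ("chowder at faneuil hall", "Boston"),
    ("stuck on the 405 again", "Los Angeles"),
    ("beach day in santa monica", "Los Angeles"),
]
prototypes = build_prototypes(geotagged)
print(predict_location(["#bruins lost again", "faneuil hall crowds"], prototypes))
# prints: Boston
```

With a real corpus, the training portion returned by `holdout_split` would feed `build_prototypes`, and accuracy would be measured on the held-out 10 percent. Raw summed counts are the simplest choice; a weighting scheme such as TF-IDF would typically help by down-weighting common words like "the".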

Using this system, the researchers could predict a user's city with 58 percent accuracy, far from deadly aim but statistically significant nonetheless. Larger regions could be predicted with increasing accuracy: 66 percent at the state level and 73 percent at the time-zone level.
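One way to see why accuracy rises with region size: a wrong city guess can still land in the right state, and a wrong state can still share the right time zone. The sketch below assumes, purely for illustration, that coarser predictions are obtained by rolling a city prediction up a lookup table; the paper may instead predict each level separately. The lookup tables, the `accuracy` helper, and the four (predicted, true) pairs are hypothetical and do not reproduce the paper's numbers.

```python
# Hypothetical lookup tables; a real system would use a proper gazetteer.
CITY_TO_STATE = {
    "Boston": "MA",
    "Cambridge": "MA",
    "Hartford": "CT",
    "Los Angeles": "CA",
    "Denver": "CO",
}
STATE_TO_TIMEZONE = {"MA": "Eastern", "CT": "Eastern", "CA": "Pacific", "CO": "Mountain"}


def roll_up(city, level):
    """Map a city to the region it belongs to at the requested level."""
    if level == "city":
        return city
    state = CITY_TO_STATE[city]
    if level == "state":
        return state
    if level == "time_zone":
        return STATE_TO_TIMEZONE[state]
    raise ValueError(f"unknown level: {level}")


def accuracy(pairs, level):
    """Fraction of (predicted, true) city pairs that agree at the given level."""
    hits = sum(roll_up(pred, level) == roll_up(true, level) for pred, true in pairs)
    return hits / len(pairs)


# Made-up (predicted city, true city) pairs.
PAIRS = [
    ("Boston", "Boston"),          # right city
    ("Cambridge", "Boston"),       # wrong city, right state
    ("Hartford", "Boston"),        # wrong state, right time zone
    ("Denver", "Los Angeles"),     # wrong at every level
]

for level in ("city", "state", "time_zone"):
    print(level, accuracy(PAIRS, level))
# prints:
# city 0.25
# state 0.5
# time_zone 0.75
```

Under this roll-up assumption, accuracy can never fall as the region grows, because any pair that matches at a finer level also matches at every coarser one.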
