IBM Mixes Scientists, Social Media, And Supercomputers — Here’s What They Get

Think sign at IBM’s Watson Research centre

Credit: Business Insider

There’s a team of research scientists at IBM that loves social media, especially Twitter. “Let’s view Twitter as having the pulse of everything we care about as a society,” says Rick Lawrence, who leads the Machine Learning Group at IBM Research in Yorktown Heights, NY.

He’s not alone in that assessment. The Library of Congress has decided that Twitter is so important, it is archiving every tweet. Considering that Twitter streams some 250 million tweets a day, that’s a lot of Twitter love.

The team at IBM has been sifting through tweets, Wikipedia, blogs, and news to develop all sorts of new technologies. Lawrence recently spoke with Business Insider about this work.

Naturally, one of the things the research team became really good at was sifting through social media to find the important stuff. Their efforts landed last year in IBM’s social media monitoring product, Cognos Consumer Insights. Like many tools of this kind, it can also measure sentiment, that is, whether a tweet expresses positive or negative thoughts.

It’s interesting how IBM researchers built their sentiment analysis model. They used movie reviews, treating each review’s star rating as a label for its language. “This language must be negative because it was a one-star. This language must be positive because it was a five-star,” Lawrence says.
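The idea of learning sentiment from star ratings can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not IBM’s model: the toy reviews, the `train` and `score` functions, and the choice of a smoothed word-count log-odds score are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy reviews as (star rating, text). Not IBM's data.
REVIEWS = [
    (5, "a wonderful moving film with a brilliant cast"),
    (5, "brilliant and funny I loved it"),
    (1, "a dull boring mess with a terrible script"),
    (1, "terrible acting and a boring plot"),
]


def train(reviews):
    """Count words separately for five-star and one-star reviews."""
    counts = {"pos": Counter(), "neg": Counter()}
    for stars, text in reviews:
        if stars == 5:
            label = "pos"
        elif stars == 1:
            label = "neg"
        else:
            continue  # middling ratings carry no clear label
        counts[label].update(text.lower().split())
    return counts


def score(text, counts):
    """Sum per-word log-odds with add-one smoothing; above 0 leans positive."""
    vocab = set(counts["pos"]) | set(counts["neg"])
    pos_total = sum(counts["pos"].values()) + len(vocab)
    neg_total = sum(counts["neg"].values()) + len(vocab)
    total = 0.0
    for word in text.lower().split():
        if word not in vocab:
            continue  # words never seen in training are skipped
        p_pos = (counts["pos"][word] + 1) / pos_total
        p_neg = (counts["neg"][word] + 1) / neg_total
        total += math.log(p_pos / p_neg)
    return total


counts = train(REVIEWS)
print(score("brilliant film", counts) > 0)   # leans positive
print(score("boring terrible", counts) < 0)  # leans negative
```

A scorer like this skips any word it never saw in a review, which hints at why a model trained on reviews could struggle with abbreviated tweets.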

That sentiment model worked well for other content, like blogs, but it utterly failed with Twitter. Twitter’s 140-character limit means lots of abbreviations and strange syntax, which Lawrence calls “Twitter speak.” Twitter is almost its own language.

IBM scientist Rick Lawrence looks to Twitter to read the world’s mind

Credit: Business Insider

“If Watson had to learn English from Twitter, I think it would have been challenged to beat Ken and Brad,” Lawrence jests, referring to Watson’s famous man-vs-computer Jeopardy match against champions Ken Jennings and Brad Rutter.

So the team built a model to decode tweets by recognising the probability of word combinations. “We have a paper on this. This is serious science,” he says.
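The article does not describe the team’s model in detail. As a rough illustration of what “the probability of word combinations” can mean, here is a minimal bigram sketch in Python. The sample tweets and the `bigram_model` and `prob` functions are hypothetical and are not taken from IBM’s paper.

```python
from collections import Counter

# Hypothetical sample tweets written in "Twitter speak".
TWEETS = [
    "gr8 movie 2nite",
    "gr8 game 2nite",
    "gr8 movie lol",
]


def bigram_model(sentences):
    """Count each word as a context, and each adjacent word pair."""
    contexts = Counter()
    pairs = Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split()  # <s> marks the start
        contexts.update(tokens[:-1])
        pairs.update(zip(tokens, tokens[1:]))
    return contexts, pairs


def prob(prev, word, contexts, pairs):
    """Estimate P(word | prev) as count(prev, word) / count(prev)."""
    if contexts[prev] == 0:
        return 0.0  # unseen context: no estimate available
    return pairs[(prev, word)] / contexts[prev]


contexts, pairs = bigram_model(TWEETS)
print(prob("gr8", "movie", contexts, pairs))  # 2 of the 3 uses of "gr8"
```

With enough tweets, estimates like these tell a model which word combinations are typical on Twitter even when the individual tokens are not dictionary words.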

And that’s when the real fun began.

For instance, IBM is involved in a project that will help the Library of Congress make sense of those billions of tweets it is saving. “What’s the social media app that gives us the best organisation of knowledge that we have as a society? Wikipedia. If it’s not in Wikipedia, we don’t care about it,” Lawrence says.

So every tweet will be mapped to the Wikipedia organisation structure. By mapping tweets about Mitt Romney’s tax returns to a page on, say, the 2012 Republican nomination, the 2012 election, Republican candidates, and so on (and tagging them with sentiment), IBM can easily use tweets to determine America’s mood toward Republicans, the 2012 election and the debates.

IBM researchers are also using Twitter to crowdsource wait times at airports in a project called TSA Tracker. It searches tweets for mentions of airports, then sends an @reply to the tweeter asking for a reply with their wait time. It then offers a map that shows all security wait times.

“Think of Twitter as a network of human sensors,” says Lawrence.

Perhaps one of the most fascinating projects doesn’t involve Twitter at all. It is IBM’s work for a hedge fund. While there are people who believe that if you can monitor the mood on Twitter, you can use it to predict the S&P, Lawrence isn’t among them.

However, he is building a tool that can monitor trusted sources of financial news and correlate what they say with stock market changes. For instance, the tool can calculate the measurable impact of the words “earnings and upgrade” on the S&P as well as the words “earnings and downgrade.”
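One simple way to picture this kind of calculation is to average the index move that followed headlines containing a given pair of words. The Python sketch below does only that. It is not the tool Lawrence describes, and the headlines, the percentage moves and the `mean_move` function are all made up for illustration.

```python
import re
from statistics import mean

# Hypothetical (headline, index move in percent) pairs. Invented numbers.
HEADLINES = [
    ("Acme earnings beat forecasts, analysts upgrade stock", 0.8),
    ("Globex earnings disappoint, downgrade follows", -1.1),
    ("Initech earnings strong, upgrade expected", 0.6),
    ("Umbrella earnings miss prompts downgrade", -0.7),
    ("Markets quiet ahead of holiday", 0.1),
]


def mean_move(records, words):
    """Average move for headlines containing every word in `words`."""
    moves = []
    for text, move in records:
        tokens = set(re.findall(r"[a-z]+", text.lower()))
        if all(word in tokens for word in words):
            moves.append(move)
    return mean(moves) if moves else None  # None when nothing matches


print(mean_move(HEADLINES, ("earnings", "upgrade")))
print(mean_move(HEADLINES, ("earnings", "downgrade")))
```

A real system would need far more data and a test of statistical significance before treating such averages as a measurable impact.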

Much like the sentiment model built for Twitter, such a tool could help a hedge fund make instantaneous decisions.