One of the most intriguing areas in artificial intelligence research is computer vision. From being an integral part of self-driving cars to allowing machines to guess your age, making it possible for software to see is a big deal.
Computer scientist Stephen Wolfram has released a new tool, the Wolfram Image Identification Project, that allows users to upload or link to an image and then see how well the computer can recognise what’s going on in the picture.
In a blog post, Wolfram describes the underlying technology behind the project. Like many computer vision programs, Wolfram’s project is built around an “artificial neural network”: a software framework inspired by biological brains that excels at the kind of pattern recognition needed for computer vision. In Wolfram’s case, the neural network was “trained” by being exposed to tens of millions of labelled images. As Wolfram puts it in the blog post,
“We don’t have any intrinsic way to describe an object like a chair. All we can do is just give lots of examples of chairs, and effectively say, ‘Anything that looks like one of these we want to identify as a chair.’ So in effect we want images that are ‘close’ to our examples of chairs to map to the name ‘chair’, and others not to.”
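The idea in Wolfram’s quote — store lots of labelled examples and map anything “close” to one of them onto its label — can be sketched as a toy nearest-example classifier. This is only an illustration of the principle, not Wolfram’s actual system: the three-number “feature vectors” and the example data below are made up, standing in for the rich representations a real neural network learns from tens of millions of images.

```python
import math

# Hypothetical labelled training examples: (feature vector, label).
# A real system would extract these features from images with a neural network.
EXAMPLES = [
    ((0.9, 0.1, 0.2), "chair"),
    ((0.8, 0.2, 0.1), "chair"),
    ((0.1, 0.9, 0.8), "car"),
    ((0.2, 0.8, 0.9), "car"),
]

def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def classify(features):
    """Return the label of the stored example closest to the input."""
    return min(EXAMPLES, key=lambda ex: distance(features, ex[0]))[1]

print(classify((0.85, 0.15, 0.15)))  # close to the chair examples -> "chair"
```

The point is that nothing in the code defines what a chair is; the label emerges entirely from proximity to examples, which is why such systems need so much training data.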
We decided to try the algorithm out on a few images that were on the front page of Business Insider around 3:30 p.m. Eastern time on Tuesday.
In many cases, the image identifier was able to at least get the overall gist of the pictures. It classified the Twin Peaks restaurant in Texas that was the site of a grisly shootout between rival biker gangs as a “store”:
It also correctly classified Hillary Clinton and Marissa Mayer as “people”, although it wasn’t able to identify them specifically by name:
The algorithm also correctly, if vaguely, identified Paris cafe Le Comptoir as a building:
In a few situations, the algorithm completely ignored the people in an image, instead focusing on particular inanimate objects. Rather than noticing boxer Gennady Golovkin, the algorithm locked on to the glove on the boxer’s hand, helpfully pulling up some extra info on boxing gloves:
Similarly, in this still from an upcoming KFC commercial, the algorithm ignored former “Saturday Night Live” actor Darrell Hammond’s portrayal of Colonel Sanders and instead noticed the cars around him, identifying them as “transport”:
In other cases, the algorithm got temptingly close but was just slightly off. It classified this Samsung smartphone as a “remote control,” and as with the boxing glove, gave us some context:
The image identifier also correctly noted that Tesla Motors CEO Elon Musk was standing in front of a car, though it misclassified the car as a two-door coupe rather than a four-door sedan. Still, pretty impressive:
Some images completely threw the algorithm off. The grey background and dark chyron on this NFL Network screenshot appear to have convinced the image classifier that New England Patriots owner Robert Kraft is in fact a clapperboard:
The algorithm also had trouble with more abstract items. The Yo app logo was parsed as “instrumentation”:
And this screenshot of leaked footage from the upcoming video game “Doom 4” showing a soldier in a desolate wasteland was interpreted as a “spider”:
Image recognition and classification are hard problems, and the algorithm is still a work in progress, but it is fun to play with. Read more about the technology behind the app on Wolfram’s blog, or test it out with your own pictures on the project’s site.