Big data is literally changing what computers can do. You are already benefiting because big data brought you Google.
And we’ve only just begun. Big data is changing things for three reasons:
- It can handle massive amounts of information in all sorts of formats — tweets, posts, e-mails, documents, audio, video, whatever.
- It works fast — practically instantly.
- It is affordable because it uses ordinary, low-cost hardware.
Big data solves problems for companies like eBay, Facebook, LinkedIn, Netflix, Twitter and Zynga. But it is also allowing completely new types of companies to be built.
Big data is not really a new technology, but a term used for a handful of technologies. While some of these technologies have been around for a decade or more, a lot of pieces are coming together to make big data the hot thing for 2012.
Pat Gelsinger, COO of storage giant EMC, says that big data is already a $70 billion market and growing at 15-20 per cent per year.
Every major enterprise tech company is interested, and investing heavily, in big data products and services. This includes IBM, Oracle, EMC, HP, Dell, SGI, Hitachi, Yahoo ... and the list goes on.
With so many deep pockets around, VCs are tripping over themselves to find and fund big data startups. Accel launched a $100 million Big Data fund in November and IA Ventures a $100 million fund earlier this month.
Everything about big data is big: the potential market, the number of companies, the funding some of these early startups are getting.
So it's no surprise that big data companies are also stealing away some of the most talented engineers in the Valley. Engineers from Google, Facebook and Yahoo, who could have their pick of jobs, are lining up for big data startups such as Cloudera, Hortonworks and MapR.
Big data is happening now because other technologies are fueling it:
- The cloud gives everyone affordable access to a massive amount of computational power and to loads of storage. You don't have to buy a mainframe and a data centre. You pay for what you use.
- Social media means everyone is creating interesting data as well as consuming it.
- Smartphones with GPS offer lots of new insights into what people are doing and where.
- Broadband wireless networks mean that people can stay connected even as they travel.
Big data is a term for a collection of technologies that are sometimes used together, sometimes not. These technologies are:
- Analytics
- In-memory databases
- NoSQL databases
- Hadoop
Analytics means sifting through large streams of data in real time to come up with patterns or answers to questions. 'People are starting to think: what can I do now with the cloud? With big data and analytics, I can get insights,' explains Lauren States, vice president and CTO of Cloud Computing at IBM.
States offers the example of the Australian Open tennis championships. The tournament has an analytics engine called Slam Tracker, hosted on IBM's cloud. Slam Tracker has gathered 39 million statistics from five years of matches, and it looks for patterns in what players do when they win.
Big data analytics often uses in-memory databases that can process loads of data as fast as a computer system records it.
This can, for example, analyse all the stuff being bought at a grocery store chain nationwide as the purchases are being made to find patterns, or to offer customers instant rewards.
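To make the grocery example concrete, here is a minimal Python sketch of the idea: running totals are kept in memory and updated as each purchase arrives, so a pattern or a reward decision is available immediately. The `PurchaseTracker` class, its method names and the "reward every Nth visit" rule are all hypothetical, invented for illustration. A real in-memory database does far more than this.

```python
from collections import Counter, defaultdict


class PurchaseTracker:
    """Keeps running totals in memory as each purchase arrives (illustrative only)."""

    def __init__(self, reward_threshold=3):
        self.units_by_item = Counter()              # item name -> units sold so far
        self.visits_by_customer = defaultdict(int)  # customer ID -> checkouts so far
        self.reward_threshold = reward_threshold    # hypothetical rule: reward every Nth visit

    def record(self, customer_id, items):
        """Update the tallies for one checkout; return True if a reward is due."""
        self.units_by_item.update(items)
        self.visits_by_customer[customer_id] += 1
        return self.visits_by_customer[customer_id] % self.reward_threshold == 0

    def top_items(self, n=3):
        """Return the n best-selling items seen so far."""
        return self.units_by_item.most_common(n)


tracker = PurchaseTracker(reward_threshold=2)
first = tracker.record("c1", ["milk", "bread"])
second = tracker.record("c1", ["milk", "eggs"])
third = tracker.record("c2", ["milk"])
top = tracker.top_items(1)
```

Nothing is written to disk and re-read later: each checkout updates the totals directly, which is what lets the answer keep pace with the purchases.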
NoSQL has sometimes been called the cloud database.
NoSQL helps websites and interactive applications store tons of data about millions of users and grow to immense sizes fast because it spreads itself across low-cost servers and storage disks. It powers Web applications for companies like Zynga, AOL, Cisco and many others.
Regular databases need data to be organised. Names and account numbers need to be structured and labelled. But NoSQL doesn't care about that. It can work with all kinds of documents.
It also doesn't suffer from performance problems if a lot of people start pounding on it at once. For instance, if 10 million people log on to play a Zynga game, it simply spreads itself across more servers and chugs along, the same as if 10,000 people were playing.
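One common way to spread data across servers is to hash each record's key and use the result to pick a machine. The Python sketch below shows that idea in its simplest form. The server names and the `shard_for` and `distribute` helpers are hypothetical, and this is an assumption about the general technique rather than a description of how any particular NoSQL product works.

```python
import hashlib


def shard_for(key, servers):
    """Pick a server for a key by hashing it (simple modulo sharding)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return servers[int(digest, 16) % len(servers)]


def distribute(keys, servers):
    """Count how many keys land on each server."""
    load = {server: 0 for server in servers}
    for key in keys:
        load[shard_for(key, servers)] += 1
    return load


players = [f"player-{i}" for i in range(10_000)]
small = distribute(players, ["s1", "s2"])
large = distribute(players, ["s1", "s2", "s3", "s4", "s5", "s6", "s7", "s8"])
```

With more servers in the list, each one ends up holding a smaller share of the players. Plain modulo hashing like this moves most keys around whenever a server is added, so production systems generally use more careful schemes such as consistent hashing.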
'When you throw away the rigid structure a database imposes, there's a whole lot of new things you can do with the data,' explains Damien Katz, one of the founders of NoSQL technology and CTO of NoSQL company Couchbase.
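As a rough illustration of what dropping the rigid structure means, the Python sketch below stores records of completely different shapes side by side, with no table definition up front. The `put`, `get` and `find` helpers and the sample records are hypothetical. This is a toy, not the API of Couchbase or any other product.

```python
import json

# A toy document store: each record is a free-form document keyed by an ID.
store = {}


def put(doc_id, document):
    """Store a document of any shape; JSON keeps nested fields intact."""
    store[doc_id] = json.dumps(document)


def get(doc_id):
    """Return the stored document as a Python dict."""
    return json.loads(store[doc_id])


def find(field, value):
    """Return the IDs of documents that happen to contain field == value."""
    return sorted(
        doc_id for doc_id, raw in store.items()
        if json.loads(raw).get(field) == value
    )


# Three records with three different sets of fields, and no schema declared.
put("u1", {"name": "Ann", "account": 1001})
put("u2", {"name": "Bob", "tweets": ["hello"], "location": {"city": "Perth"}})
put("p1", {"type": "photo", "tags": ["tennis"], "name": "Ann"})

matches = find("name", "Ann")
```

A query simply skips documents that lack the field it is looking for, so new kinds of data can be added without reorganising what is already there.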
Take the stuff that real-time analytics does, add in the stuff that NoSQL does, and you've got Hadoop.
Hadoop is a set of technologies that can scoop up vast amounts of data and analyse it, in real time, using cheap hardware.
'There is a great deal of value that comes from analysing data that is seemingly unrelated or unorganised, as it may highlight new patterns or behaviours. Banks may use Hadoop for fraud detection, while online shopping services use it to fine-tune their customers' shopping experiences based on their buying patterns.'
Hadoop is leading to never-before-possible applications.
Skybox Imaging is one example. This company takes satellite images and sells custom reports in real time, such as how many parking spaces are available in a city, or how many ships are at ports worldwide.
Without Hadoop it would be impossible to analyse the content of giant streams of satellite pictures, immediately and cheaply.
Hadoop is based on technology created by Google. But it was built by Yahoo.
Google published two key papers: one in 2003 on the Google File System, which explained how to store data across many servers, and another in 2004 on a technology called MapReduce, which explained how to do computation across many computers.
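The MapReduce idea is easiest to see with the classic word-count exercise. The Python sketch below runs on a single machine and only imitates the pattern: a map step emits (word, 1) pairs, a shuffle step groups them by word, and a reduce step adds them up. The function names are hypothetical and this is not Hadoop's API. On a real cluster the map and reduce steps would run on many computers at once.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group values by key, as the framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all the values collected for one key into a single total."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce in sequence on one machine."""
    pairs = []
    for doc in documents:  # on a cluster, each document could go to a different machine
        pairs.extend(map_phase(doc))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


counts = word_count(["big data is big", "data is everywhere"])
```

Because each document is mapped independently and each word is reduced independently, the work can be split across as many cheap machines as are available.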
An engineer at Yahoo, Doug Cutting, read the papers and built Hadoop. He named it after his son's toy elephant.
Cutting recently left Yahoo to work for the biggest Hadoop startup, Cloudera. Other Hadoop startups include MapR and Yahoo's own Hortonworks. But all the big IT vendors offer Hadoop, as products or through their clouds ... or they will soon.
Most big data technologies are open source projects and can be had for free.
But it takes a lot of know-how to use them. Most IT organisations don't have the knowledge to build Hadoop applications from scratch.
And they don't have to.
Every major IT company is building products and services to help enterprises take advantage of Hadoop, and a stream of startups is showing up as well. From these companies the next crop of Google-sized companies will be born.