On Monday, IBM announced it will invest about $US300 million over the next few years and assign 3,500 people to help develop an up-and-coming technology known as Spark.
IBM called Spark “the most significant open source project of the next decade.”
This was very good news for Databricks, a two-year-old startup founded by the people who invented Spark, which today officially launched its commercial version of Spark.
Spark is a free and open source software program managed by the organisation that runs many open source projects, the Apache Foundation.
In the past year or so, it has become a phenom in the world of big data computing, where companies collect huge amounts of information, store it on low-cost commodity computers, and use a variety of free software to work with the data.
Spark has gained attention because it crunches through vast amounts of data super-fast. In 2014 it won a contest known as the GraySort benchmark, which measures how fast a system can sort 100 TB of data, or 1 trillion records. Spark took 23 minutes, smashing the previous record of 72 minutes. It is also popular because it can be used with other big-data technologies, especially Hadoop, the popular method of storing lots of data.
Spark basically replaces MapReduce, an older method, invented by Google, of working with data stored in Hadoop. Spark is not the only free and open source project that replaces MapReduce. Apache Storm is another.
But, as IBM’s huge commitment on Monday shows, Spark is becoming the rising star in this world. That’s because it crunches through data almost the instant that data is collected. And it makes it easy for developers to write apps that take advantage of real-time big data analysis.
Why IBM is jumping in with both feet
Spark was started in 2009 by Matei Zaharia, now CTO of Databricks, as part of his PhD, and was developed at UC Berkeley’s AMPLab. Databricks’ CEO is Ion Stoica, a UC Berkeley professor and co-director of AMPLab.
AMPLab is a research center for computer algorithms and machine learning. IBM is one of the four founding members of AMPLab, although a long list of other big names in computer science contribute to the lab, too. (The other founders are Amazon Web Services, Google, and SAP.)
Interestingly, Spark won its famous speed contest by running on Amazon Web Services, IBM’s arch-rival in the cloud. And, on Monday, Databricks announced that its commercial version of Spark was officially open for business, also on Amazon’s cloud.
Clearly, IBM is not willing to let AWS run away with Spark.
On the contrary: IBM will be offering Spark on its cloud service, IBM Bluemix. It will also be baking Spark into its Watson Health Cloud and developing a special machine-learning version of Spark known as IBM SystemML. IBM will release SystemML as a free-and-open language, too, working with Databricks to do so.
On top of that, IBM says it will train more than a million data scientists and data engineers on Spark through extensive partnerships with AMPLab and other online education outlets. And IBM will open a Spark Technology Center in San Francisco to support Spark/machine learning projects.
While IBM’s full-throttle commitment to Spark is enormous, it’s not the only big company doing this type of thing. In February, Intel announced a partnership with Databricks to get Spark to work better on machines that run on Intel processors.
Besides winning a contest, Spark has already been adopted by some big names, including Airbnb, eBay, Groupon, MyFitnessPal, OpenTable, Pinterest, Independence Blue Cross of Philadelphia, NASA, and the SETI Institute, to name a few.
Since the free version of the project first launched in 2009, more than 400 developers have contributed to Spark. This week, the Spark community is holding a conference in San Francisco that is expected to attract 2,000 people, nearly double the number from last year.
All of this has helped Databricks land an all-star line-up of backers and board members. It raised $US47 million in two rounds led by Andreessen Horowitz (Ben Horowitz), with participation from New Enterprise Associates.
Its board also includes UC Berkeley professor Scott Shenker, known as the co-founder and former CEO of Nicira, which sold for $US1.26 billion to VMware in 2012.