With transactional data volumes reaching billions of records a day, Big Data is a big concern for business. It is also big business for technology providers working to address the requirements, costs and long-term ramifications of the data deluge. An entirely new market is building rapidly around Big Data to address challenges such as capture, storage, search, sharing, analytics and visualisation.

A May 2011 McKinsey Global Institute report estimated that by 2009, nearly all sectors in the U.S. economy averaged at least 200 terabytes of stored data per company with more than 1,000 employees – twice the size of Walmart’s entire data warehouse in 1999. Data volumes have quickly moved beyond gigabytes to terabytes, and they will only continue to increase.
Such large volumes of data used to be generated predominately as a result of human-driven interaction: texting, online retail purchases, stock trades and so on. But in the new digital age, more and more data is now machine-generated. It is this machine-generated data, ranging from Call Detail Records (CDRs) and automated stock trades to smart meter sensors, security monitoring appliances, and test and measurement devices, that is expected to form the bulk of data growth into the future.
Is business prepared to manage the load and cost of Big Data, not to mention staying compliant?
Just how big is Big? To put it into perspective: one exabyte is 1,000 petabytes, or 1,000,000 terabytes, of data. To date, most enterprise data discussions have been capped in terabytes, but not for long. A recent report from Cisco stated that global mobile data traffic will increase 26-fold between 2010 and 2015. Mobile traffic alone is forecast to surpass six exabytes a month in 2015. By then, queries will run across petabyte-size, trillion-record data sets, requiring far greater scale to handle the new volumes of machine-generated data.
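The scale jump is easier to grasp with the arithmetic spelled out. A minimal sketch, using the decimal (SI) storage units the article uses, converting the Cisco forecast above into terabytes:

```python
# Decimal (SI) storage units, as used in the article.
TB = 10**12   # terabyte in bytes
PB = 10**15   # petabyte = 1,000 TB
EB = 10**18   # exabyte  = 1,000 PB = 1,000,000 TB

# Cisco forecast cited above: six exabytes of mobile traffic a month by 2015.
monthly_mobile_traffic_tb = 6 * EB // TB
print(f"{monthly_mobile_traffic_tb:,} TB a month")  # 6,000,000 TB a month
```

Six exabytes a month is six million terabytes – three orders of magnitude beyond the terabyte-scale conversations most enterprises are having today.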
That means terabytes will give way to petabytes, and then to exabytes, within the short span of three years. There’s a reason why Oracle named its latest data warehouse appliance Exadata!
The proliferation of mobile devices around the world, and regulatory compliance requirements for storing and maintaining access to CDRs and wireless access protocol (WAP) logs, puts the telecommunications industry at the forefront of the Big Data wave.
And while data volumes balloon, business and legal requirements for retaining mobile data tighten. In the U.S., for example, mobile data must be kept for two or three years, while in other parts of the world it must be retained for seven years (the Middle East), 10 years (India) or more (Japan).
But the communications sector is not alone. The need for Big Data retention and management is popping up in other areas such as utilities with smart meters and cyber security with network logs, all with one thread in common: massively big machine-generated data in a highly regulated industry.
A May 26, 2011, article, “Building with Big Data,” appearing in the Economist stated:
Last year people stored enough data to fill 60,000 Libraries of Congress. The world’s 4 billion mobile-phone users (12% of whom own smartphones) have turned themselves into data-streams. YouTube claims to receive 24 hours of video every minute. Manufacturers have embedded 30m sensors into their products, converting mute bits of metal into data-generating nodes in the internet of things. The number of smartphones is increasing by 20% a year and the number of sensors by 30%.
The sheer volume of data, and the speed with which we have entered the Big Data era, have become an IT and business challenge. For those without a strategy and infrastructure in place, it could prove very costly and risky.
However, Big Data is also a big opportunity.
Whether driven by regulatory or competitive pressures, Big Data is big business, as evidenced by the amount of M&A activity in the analytics space alone: EMC acquired Greenplum, IBM acquired Netezza, and HP acquired Vertica. While these and other solutions have succeeded in driving down the cost of complex deep analytics from the “enterprise-premium” pricing of traditional OLTP, RDBMS and data warehousing systems, they aren’t optimised for long-term, cost-effective retention of massive data volumes. Keeping a close eye on the storage, hardware and administrative costs of growing databases is a very real problem.
Let’s just examine the economics of retaining large volumes of machine-generated data. The capital expenditure (CAPEX) of the hardware and software investment needed to begin to ingest and eventually query the data is what captures everyone’s initial attention. This problem is largely being addressed by many RDBMS and data warehouse solutions and appliances, which are typically measured and priced per terabyte stored.
But ongoing operating expenditure (OPEX) is what drives the total cost of maintaining any system put in place. The dimensions that make up the total CAPEX and OPEX equation include the servers needed to support the load and query throughput, the amount of physical storage required and the class of that storage. In the end, efficiency in compressing, storing and retrieving such large volumes of data dictates the equation – from the space needed and the (human) administration and skills involved, to the class of hardware and storage required.
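The CAPEX/OPEX equation above can be sketched as a rough cost model. All figures below are hypothetical placeholders for illustration, not vendor pricing; the function name and parameters are assumptions, not from the article:

```python
# A minimal sketch of the CAPEX + OPEX retention-cost equation described above.
# Every figure here is a hypothetical placeholder, not real vendor pricing.

def total_cost_of_retention(raw_tb, compression_ratio,
                            capex_per_tb, opex_per_tb_per_year, years):
    """Rough total cost of retaining raw_tb of machine-generated data.

    Compression shrinks the physical footprint, which drives down both
    the upfront (CAPEX) and the ongoing (OPEX) side of the equation.
    """
    stored_tb = raw_tb / compression_ratio
    capex = stored_tb * capex_per_tb                   # hardware, software, storage
    opex = stored_tb * opex_per_tb_per_year * years    # power, space, administration
    return capex + opex

# Example: 500 TB raw, 10:1 compression, $1,000/TB upfront,
# $300/TB/year ongoing, retained for seven years (a mandate cited above).
cost = total_cost_of_retention(500, 10, 1_000, 300, 7)
print(f"${cost:,.0f}")  # $155,000
```

The point of the sketch is the structure, not the numbers: because every term scales with the stored (post-compression) footprint, the compression ratio leverages the entire equation, which is why efficiency in storing and retrieving the data dictates the economics.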
Making the Big Data problem smaller
Businesses that are addressing the Big Data problem in a more holistic manner are not only examining the analytics requirements but also the cost of the overall data retention, which might be measured in terabytes and potentially petabytes, and very quickly exabytes as the volumes continue to mushroom. The economics are changing for new classes of data management, and regulation will only get tighter.
All businesses, but especially those in highly regulated industries, need to prepare for Big Data in the same way they prepare for major enterprise initiatives and transformative IT projects: create a strategy, be realistic, plan for the worst, bring in experts and don’t be afraid to spend now for long-term savings. In the long run, having a sound strategy and infrastructure in place for Big Data will be what gives the most successful businesses an edge.