There is so much hype around big data today that a lot of time is spent debunking myths and keeping people focused on what works.
With that in mind, here are some tips for getting your big data project under way and keeping it on track.
1. Find the problem that never gets solved: We are often asked where to start with big data. A good beginning is to talk to your senior executives. What are their goals, and what is inhibiting them? Is there a key business problem that just never gets solved? In many cases, new big data insights can help overcome previous barriers. Some of the biggest big data success stories start there. Why? Because they are mission critical from day one.
For example, in investment banking the problem could be how to get better qualitative knowledge to support funding decisions; for retailers, how to know consumers intimately enough to support micro-segmentation; and for manufacturers, how to spot and reduce product defects more quickly.
2. It’s not one giant leap: Within big data analysis there’s often an enrichment process. Big data is not a product you can buy; it is knowledge that must be accumulated from numerous data sources using different approaches. Big data projects therefore tend to comprise thousands of baby steps on the way to the end result.
For example, when micro-segmenting customers against particular promotions, it takes time to see what works and, often more importantly, what doesn’t.
If something is going to take longer than three months, put milestones in place to keep senior executives informed. There’s a lot of power in being willing to set up your own evaluation, based on their success criteria, at the outset. This is particularly important at the start of a company’s big data journey. Often at this stage many of the people involved don’t know much about the technology they’re using, so it’s also a learning experience. Companies will only pay so much for somebody to learn on the job without seeing results along the way.
3. Agile big data requires building credibility: We often talk about big data in terms of discovering things that nobody knew before, but these discoveries usually need to have a basis in what intuition tells us. Why? For really unusual discoveries to be validated, people have to trust them. The only way they will trust them is if you first follow a path that aligns with things they kind of knew in the first place.
For example, a retail basket analysis might look like an epiphany on milk sales to the untrained eye, when in fact that product is simply discounted all the time in that store. But validating such intuitively known insights at the start builds extreme credibility for later on down the track, when you make unique discoveries that aren’t known or expected.
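To make that validation step concrete, here is a minimal sketch with invented toy baskets: before trusting anything surprising from a basket analysis, first confirm that a "known" association (here, milk and bread selling together because both are routinely discounted in this hypothetical store) actually shows up in the numbers.

```python
from itertools import combinations
from collections import Counter

# Hypothetical transaction data: each basket is a set of product names.
baskets = [
    {"milk", "bread"},
    {"milk", "bread"},
    {"milk", "bread", "eggs"},
    {"eggs", "butter"},
    {"milk"},
    {"butter"},
]

n = len(baskets)
item_counts = Counter(item for b in baskets for item in b)
pair_counts = Counter(
    pair for b in baskets for pair in combinations(sorted(b), 2)
)

def lift(a, b):
    """Lift > 1 means the pair co-occurs more often than chance would suggest."""
    support_pair = pair_counts[tuple(sorted((a, b)))] / n
    return support_pair / ((item_counts[a] / n) * (item_counts[b] / n))

# Validate the intuitively known insight first, before hunting for surprises.
print(f"lift(milk, bread) = {lift('milk', 'bread'):.2f}")
```

If a pairing everyone already expects fails to show lift above 1, the pipeline itself is suspect; fixing that first is what earns trust in the unexpected findings later.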
4. New discoveries are only interesting if they can be applied: You can monitor social media to understand how people feel about products and services, but the importance of those findings will vary depending on who the person actually is. For example, can you show whether they are a customer or a potential customer? If you can’t figure that out, how much does the information really help you? This type of correlation becomes particularly important if you are looking to monetise data.
For example, a company holding information on property sales and foreclosures could monetise that data if it has the means to take data from a large lender and check whether there are liens against any of the properties the lender holds mortgages on.
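The mechanics of that correlation can be sketched very simply. This is a toy illustration with invented parcel identifiers, not the companies' actual data model: the lien records are only monetisable once they can be joined against the lender's portfolio.

```python
# Hypothetical records held by a property-data company: known liens by parcel id.
liens = {
    "P-100": ["tax lien"],
    "P-103": ["contractor lien", "tax lien"],
}

# Hypothetical extract from a large lender: parcels it holds mortgages on.
lender_portfolio = ["P-100", "P-101", "P-102", "P-103"]

# The correlation is the product: which mortgaged properties carry liens
# the lender may not know about?
flagged = {pid: liens[pid] for pid in lender_portfolio if pid in liens}
print(flagged)  # {'P-100': ['tax lien'], 'P-103': ['contractor lien', 'tax lien']}
```

Without the join key (here a shared parcel id), each dataset on its own answers the "so what?" question poorly; the value is in the match.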
5. Don’t forget the basics when it comes to predictive models: When trying to assess the value of information sources, and then spot and validate trends, it helps to go back to basics: use predictive models and control groups against a small subset of the data to validate the concept before you attempt it at scale. In addition, many data quality issues can be worked around using predictive models, which can fill in the blanks where data is missing. However, be sure to keep track of which facts are real and which are predicted.
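A minimal sketch of that last point, using an invented sensor feed and a deliberately trivial least-squares model: the key habit is tagging every value as real or predicted, so downstream analysis can always tell them apart.

```python
# Hypothetical readings with gaps; None marks a missing value.
readings = [12.0, 13.5, None, 14.8, None, 16.1]

known = [(i, v) for i, v in enumerate(readings) if v is not None]

# A deliberately simple predictive model: ordinary least-squares line
# fitted to the values we actually observed.
n = len(known)
mean_x = sum(i for i, _ in known) / n
mean_y = sum(v for _, v in known) / n
slope = sum((i - mean_x) * (v - mean_y) for i, v in known) / sum(
    (i - mean_x) ** 2 for i, _ in known
)
intercept = mean_y - slope * mean_x

# Fill the blanks, but keep real and predicted facts distinguishable.
filled = [
    (v, "real") if v is not None
    else (round(intercept + slope * i, 2), "predicted")
    for i, v in enumerate(readings)
]
for value, source in filled:
    print(value, source)
```

Carrying the "real"/"predicted" flag through the pipeline costs almost nothing and prevents imputed values from quietly being treated as observations later.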
6. Measure success in business results, not IT cost savings: If you look at the financial statements of most companies, IT expenditure is typically a small percentage of overall revenue. So while cost savings can be significant in IT terms, they are small compared with the potential revenue increases that could be gained from solving some of the biggest business problems.
Look at the retail industry, for example. Its biggest costs are typically associated with inventory and stock movement, not IT. Savings in those areas can trump potential IT cost savings many times over.
7. Big data doesn’t fit on your laptop: Problems that fit on your laptop are usually solvable with the company’s current IT infrastructure. There’s not much advantage in tackling those problems with big data. It’s the problems that are too big – that stress your IT infrastructure – that you should be looking to tackle.
For example, the utilities sector faces the challenge of going from handling data from monthly or quarterly meter readings to coping with the much greater information flow from smart meters. On top of that, these organisations are keeping this data for much longer. They don’t have the option of simply extending their current technology; they need more.
8. Don’t try to replace your entire IT infrastructure: That said, people get enamoured with big data as a shiny new object built on low-cost hardware and software. It is not uncommon for them to look at all of their other big systems, doing things like online processing, payroll and inventory, and think they can replace all of those as part of a big data effort, running huge parts of their infrastructure on open-source commodity hardware like Google or Yahoo. In every case I have seen where companies try this, the approach fails; in fact, they don’t so much fail as never get started.
You can’t justify replacing a working system without a valid, measurable reason for doing it. There are, however, good reasons to take certain pieces of your problem set and attack them intelligently with different tools.
9. If it looks too hard, you’re using the wrong tools: Java is a very popular tool, but certain problems require an awful lot of Java code. Don’t be afraid to consider alternative languages or GUIs. For example, if a problem is related to data access, there are SQL derivatives that would improve productivity. If it’s statistical, there are languages, such as R, and user interfaces that could take you from hundreds of thousands of lines of Java code down to 10 in a different language.
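To make the brevity point concrete, here is a sketch in Python, standing in for R or any other high-level statistical language, with invented promotion data: a Pearson correlation in a handful of lines that would need substantially more boilerplate in plain Java.

```python
from statistics import mean

# Hypothetical data: weekly promotion spend vs units sold.
spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sold = [10, 12, 15, 19, 24]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

print(f"correlation = {pearson(spend, sold):.3f}")
```

The point is not the specific language: it is that matching the tool to the problem, statistical languages for statistics, query languages for data access, collapses the code you have to write and maintain.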
10. Use the cloud: Cloud and big data often go hand in hand. In many cases it is activity in the cloud that generates the information feeding big data analysis. The cloud can also be a useful place to start when going down the big data path, either to get a simple project up and running quickly or for proof-of-concept activity.
In the end, you need to take into account where the data will be. In future, this is likely to be a hybrid of private, local big data and cloud-based sources.
So if you haven’t started yet, just begin collecting data: traditional and untraditional sources, from inside and outside your organisation. Then start to see whether there is some insight there that is interesting to your business. If you are already on your big data journey, keep taking those baby steps.
Alan Manewitz is Oracle’s senior director of information architecture and big data, and Vicky Falconer is the company’s big data solution specialist.