Big Data: What’s the Hadoopla all about?

The latest buzzword dominating business headlines is “Big Data”. What was once known as Data Processing became Data Mining, then Business Intelligence, and has now adopted the name “Big Data”. It’s still unclear to many business managers what Big Data is, and a lot of technology managers have a hard time explaining it. That makes it hard to make the value it promises tangible enough to invest in. The core of the problem is that “Big Data” isn’t just one thing. It’s an umbrella category for a set of technologies and methodologies that have made dramatic strides in innovation and business application, with a core goal of leveraging data to create sustainable competitive advantages.

I’ll set a baseline definition for each of the four main categories implied by the term “Big Data”: data sourcing; data storage and staging; analytics and insight creation; new business models. This should give a non-technology manager a basic understanding that could get them through a cocktail party.

Data Sourcing

This is simply identifying, or in some cases creating, sources of information that can provide new insights for a business, either alone or integrated with other data. New technologies have emerged that make it much easier to source information that wasn’t easy to capture before, like customer interactions. Other technologies have created new fountains of data that businesses can tap into to gain additional insights, such as Twitter and GPS location information. The gap between the optimal, or “dream”, data universe needed to create a sustainable competitive advantage and what is currently available to a business is quickly closing. Managers should think big about what information would be a game-changer and challenge their technology groups to find it and capture it. In previous posts I’ve recommended a book by Douglas Hubbard called How to Measure Anything, which provides a fantastic framework.

Data Storage & Staging

A radical drop in the cost of storage has made the warehousing of large amounts of data a manageable economic proposition for most firms. We now talk about capturing and storing terabytes and petabytes of information. More importantly, new technologies, like the much-hyped Hadoop framework, can pull data from clusters of data servers in an extremely efficient manner, which solves two major headaches: 1) stored data can be quickly extracted for analysis, in some cases in near real-time, and 2) fragmented data from multiple silos can be combined and analyzed together. Firms like Cloudera and MapR provide Hadoop-based software and services that can help in developing the right structure.

This new kind of infrastructure, which has been labeled cloud computing, allows data to reside in large distributed networks that provide speed, perpetual access, and data redundancy. Cloud services are offered by a growing group of firms that provide simple and scalable services. Two of the best known are Amazon and Rackspace.

Analytics & Insight Creation

What analytics you want to create largely depends on what data you have and what form it takes. An important feature of “Big Data” analytics is its flexibility with both structured and unstructured data. Traditional structured data is organized within set fields and in relational databases, while unstructured and semi-structured data require additional enrichment and organization to allow them to be useful for analysis and insight generation. An experienced analyst, for example, can create algorithms that ferret out useful items found in written text documents, then match them to other relevant data.
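The text-matching idea above can be sketched in a few lines of Python. This is a minimal illustration, not a production approach: the order records, the customer messages, and the order-ID pattern (one letter followed by four digits) are all made up for the example.

```python
import re

# Hypothetical structured data: order records keyed by order ID.
ORDERS = {
    "A1001": {"customer": "Acme Corp", "product": "router"},
    "A1002": {"customer": "Globex", "product": "switch"},
}

# Hypothetical unstructured data: free-text customer messages.
MESSAGES = [
    "My order A1001 arrived damaged and I need a replacement.",
    "Thanks, order A1002 works great!",
    "Where is my refund for a9999?",
]

# Assumed ID format: one letter followed by four digits, in any case.
ORDER_ID = re.compile(r"\b[A-Z]\d{4}\b", re.IGNORECASE)


def match_messages_to_orders(messages, orders):
    """Pull order IDs out of each message and attach the matching record, if any."""
    results = []
    for text in messages:
        for raw_id in ORDER_ID.findall(text):
            order_id = raw_id.upper()  # normalize before the lookup
            results.append({
                "order_id": order_id,
                "text": text,
                "order": orders.get(order_id),  # None when no record exists
            })
    return results


matches = match_messages_to_orders(MESSAGES, ORDERS)
for m in matches:
    print(m["order_id"], "->", m["order"])
```

Real text is messier than this, so in practice the pattern-matching step is usually where most of the analyst's effort goes. The unmatched ID in the last message shows why the result keeps a slot for "no record found" instead of silently dropping it.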

There are a variety of analytical tools on the market, most of them very rich in features. The level of sophistication of the tools, and of the subsequent modeling, used in any analysis depends on the complexity and volume of the data, as well as the kind of insights that are being sought. A common mistake is to procure overly complicated tools and build overly complicated models that don’t improve business outcomes. Managers need to define very clearly up front what kinds of results they want to get.

SAS and R are the standard-bearers for statistical analysis, and the go-to tools when you are looking to find correlations and test for causation in your data. The biggest bang, though, comes from the ability to create visualizations of your analysis, making it easier to digest, remember, communicate, and act on. A variety of vendors, such as Spotfire, Tableau, and QlikView, provide visually driven Business Intelligence software that can be used to create intuitive analyses and dashboards that managers or salespeople can use for decision making.
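For readers curious what "finding a correlation" looks like in practice, here is a minimal sketch in Python rather than SAS or R. It computes the Pearson correlation coefficient by hand. The ad-spend and sales figures are invented for illustration, and a high value only indicates that the two series move together, not that one causes the other.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sums of cross-products and squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical monthly figures, made up for this example.
ad_spend = [10, 20, 30, 40, 50]
sales = [25, 41, 62, 79, 103]

r = pearson(ad_spend, sales)
print(f"correlation between ad spend and sales: {r:.3f}")
```

A value near 1 or -1 suggests a strong linear relationship and a value near 0 suggests little or none. Statistical packages add significance tests and many other checks on top of this basic calculation.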

New Business Models

Putting a “Big Data” infrastructure in place at a firm won’t automatically ensure a return on investment. Businesses need to change in order to incorporate Big Data insights into their DNA. In some instances, completely new business models have emerged, as in the cases of Amazon and Netflix. In existing businesses, though, a new data discipline needs to be put in place. It starts with management articulating clear goals and measures of success, followed by planning and executing a Big Data plan that is in line with company culture and talent, and lastly developing incentives and training that promote data reliance.


Despite the hype, Big Data really isn’t anything new. Interest in the four principal areas that make it up has been renewed by recent innovations and by the impressive results of businesses that have fully adopted it. There is a tremendous opportunity to realize value using these tools, so it’s important to have a strong working knowledge of what they are and to keep it in the back of your mind as you think about your business strategy.
