
“Big Data”: new frontier or black hole?

Stéphane Richard (CEO of Orange) at the Avignon Forum in 2012: “Big Data is the trade in personal data, and it is frightening” (in French: “Le Big Data, c’est le commerce des données personnelles et c’est effrayant”)

In my view, the future of Big Data holds much more than a threat. But first, let us ask the relevant question: what is “Big Data”? A few hints, picked up around the web:

On Wikipedia, the definition states that “Big Data is a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications.” Simple.

Too simple. As the Mike2.0 wiki points out, “not all large datasets are big”. Indeed, some small datasets may be considered “big” because they are complex, and some large datasets are not “big” because their structure is well known and easy to handle. Still, complexity is in the eye of the beholder (for my part, I do consider mobile internet data “big”, whereas Mike2.0 only considers it “large”). For reference, the link is here: http://mike2.openmethodology.org/wiki/Big_Data_Definition

More elaborate, Gartner’s definition (from their IT Glossary) says that “Big Data in general is defined as high volume, velocity and variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making”. Similarly, IBM says that “Big Data spans four dimensions: Volume, Velocity, Variety and Veracity”. The four V’s. Something of a marketing gimmick, but not so bad after all…

Among the definitions oriented toward digital analytics, I will stick with Avinash Kaushik’s: “Big Data is the collection of massive databases of structured and unstructured data”. So basically, a promise of bright analytics that will be hard to find: the classic needle in a haystack or, more exactly, a jewel among tons of rocks.

My own definition will then be a bit more provocative: “Big Data is a set of data that is too big for any current processing capacity”. Let me elaborate.

Since the start, my career has been mostly devoted to Retail Tracking Data. In this area, bimonthly manual collection of purchases and inventories was the norm at the end of the eighties. Then came the first Big Data rupture, i.e. the introduction of weekly electronic data collection using EAN/UPC-based datasets. 1,000 times more data points. Big data by early-nineties standards. Small beer twenty years later.

Similarly, when the same weekly electronic data collection – still based on samples – switched to a daily census at the end of the nineties, data volumes were multiplied again, by a factor of more than 100. Big data again. Now standard for any Retail Panel service.

Again, when the same data collections were made available as transactional records, showing every possible disaggregated data point – especially thanks to the boom in retailer loyalty cards – data volumes were multiplied by a factor of 1,000 once more. Big data yet again. Now about to be handled more or less properly by Market Research companies. Awaiting the next big data frontier?
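Compounding the three growth factors quoted above gives a sense of the overall scale. The short Python sketch that follows simply multiplies the factors as stated; it assumes each factor applies to the volume left by the previous step, and it treats “more than factor 100” as exactly 100, so the result is a lower bound rather than a measurement.

```python
# Growth factors quoted in the text for each shift in Retail Tracking Data collection.
# Assumption: factors compound step by step; "more than 100" is taken as exactly 100,
# so the cumulative figure is a floor, not a measured volume.
steps = [
    ("bimonthly manual -> weekly electronic (EAN/UPC)", 1_000),
    ("weekly sample -> daily census", 100),
    ("aggregated -> transactional records", 1_000),
]

cumulative = 1
for label, factor in steps:
    cumulative *= factor
    print(f"{label}: x{factor:,} (cumulative x{cumulative:,})")
```

In other words, at least 100 million times more data points than the manual panels of the late eighties, in roughly twenty years.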

So clearly, the data we call “big” today sit at the edge of our current ability to handle them. Tomorrow, other datasets will take over this “big” status, perhaps with the addition of geolocation information or other causal data (the digital journey, for instance).

Ever more data for ever more information. Or the new frontier that leads to the black hole. Why? Because too much data may mean too many insights.

That is the drawback of big data. Too much data. Too many interesting things. Too many insights. The black hole of information, fully absorbing our capacity to catch new trends and key insights.

The bigger the data, the harder it is to extract the key information that will trigger new ideas, new business, new revenues. As noted in this MediaPost article (http://www.mediapost.com/publications/article/191088/are-insights-cheap-commodities.html#axzz2IR4TeXCz), the key issue is no longer to find an insight, it is to find THE insight. We no longer need to break through the next frontier; we need to figure out in which direction to search.

We have to do this. Quickly. Before the black hole of big data swallows what remains of the dimming light of key insights…

So, to close this post and start the discussion, here is a very interesting point of view from Jannis Kallinikos of the LSE: http://www.paristechreview.com/2012/11/16/allure-big-data/