Tag Archives: big data

Free your data: revoke the precautionary principle!

People who know me are aware that I often complain that blindly applying the so-called “precautionary principle” leads to inaction.

However, as a popular French saying goes, fear does not prevent danger. Similarly, data should not prevent decision-making.

In a paper about the precautionary principle (in French), Gaspard Koenig states that “the precautionary principle’s central idea, and its most important drawback, lies in the will to avoid any uncertainty whatsoever. Then, when in doubt, forbear.”

I do agree. And the ultimate paradox lies in the fact that the more data there are to handle, and the more parameters there are to tune, the less certain the decision becomes. Big Data leads our leaders to stop deciding altogether. The data overflow paralyzes.

Therefore, the automatic application of the precautionary principle has to be abandoned, especially when it comes to institutionalizing it (let us not even talk here of its addition to the French Constitution, certainly one of the most astonishing legislative acts witnessed in the past decades).

Let us look at the GMO example (also mentioned by Gaspard Koenig in his paper). Many studies and tons of data, most of them rather contradictory, imply that GMOs could eventually represent a threat, 20, 30 or 50 years from now, either through a wave of cancers or through some alteration of our genetic material. Maybe. Until then, thanks to GMOs, millions of people could have been fed better (or even fed at all); instead, they will starve. To death. So, do we act according to our conscience, or do we let data validate our paralysis?

Beyond the comfort of data-backed analysis lies the necessary power of decision-making. Data may only support decision processes; making the decision remains a privilege of mankind. Besides, back when massive data processing systems emerged (in the eighties), were we not calling them “Decision Support Systems”?

Hence, one must rely on data, but without hiding behind a stack of it. It must be clear that data sets, even “big” ones, always harbor a part of uncertainty, not about today [we are absolutely sure that using GMOs on a worldwide scale would reduce global hunger], but about tomorrow [as GMOs may generate risks for health and the environment]. Why? Because even the most refined predictive model based upon Big Data will never reach 100% reliability.

And even Nate Silver, the demi-god of predictive models in the US (see a slightly ironic portrait in French, here), opens his cult book, “The Signal and the Noise”, with a foreword that basically tells the reader that “the more data, the more problems” there are…

Therefore, people in charge have to take a risk, however large. Give up sacred precaution. And this to everyone’s benefit, since taking a risk is the only way to open breaches, to make a breakthrough. Come to think of it, with the precautionary principle, the Apollo XI moon landing would never have happened…

So, say yes to Big Data for the Blitzkrieg, and no to the Maginot Line of the precautionary principle. Or, to take a more balanced point of view, say yes to D-Day, and no to the Atlantic Wall.

Your data must give rise to movement, not motionlessness; to action, not dejection. It must help conquer new ground, not defend one’s turf.

You have data, that is for sure. You want to take action, that is most probable. So do not hesitate: have your data elicited, so as to break the wall and make the most enlightened decisions!

 

[French version: Libérez vos données: révoquez le principe de précaution!]

“Big Data”: new frontier or black hole?

Stéphane Richard (CEO of Orange) at the Avignon Forum in 2012: “Big Data is the business of personal data, and it is frightening” (in French: “Le Big Data, c’est le commerce des données personnelles et c’est effrayant”).

In my eyes, there is much more than a threat when thinking about the future of Big Data. First, let us ask the relevant question: what is the definition of “Big Data”? A few hints, picked from around the web:

On Wikipedia, the definition states that “Big Data is a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications.” Simple.

Too simple. As mentioned on the Mike2.0 wiki, “not all large datasets are big”. Certainly, some small datasets may be considered “big” because they are complex, and some large datasets are not “big” because their structure is well known and easy to handle. But still, complexity may differ from one point of view to another (for my part, I do consider mobile internet data as “big”, whereas Mike2.0 considers them merely “large”)… For reference, the link is here: http://mike2.openmethodology.org/wiki/Big_Data_Definition

More elaborate is Gartner’s definition (from their IT Glossary), which says that “Big Data in general is defined as high volume, velocity and variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making”. Similarly, IBM says that “Big Data spans four dimensions: Volume, Velocity, Variety and Veracity”. The 4 V’s. Somewhat of a marketing gimmick, but not so bad after all…

When looking into definitions that are more Digital-Analytics-oriented, I will stay with Avinash Kaushik’s: “Big Data is the collection of massive databases of structured and unstructured data”. So basically, a promise of some bright analytics that will be hard to find: the classic needle in a haystack, or more exactly, a jewel among tons of rock.

My own definition will then be a bit more provocative: “Big Data is a set of data that is too big for any current processing capacity”. Let me elaborate.

From the start, my career has mostly revolved around Retail Tracking Data. In this area, bimonthly manual collection of purchases and inventories was the norm at the end of the eighties. Then came the first Big Data rupture, i.e. the introduction of weekly electronic data collection, using EAN/UPC-based datasets. 1,000 times more data points. Big data by early-nineties standards. Small beer twenty years later.

Similarly, when the same weekly electronic data collection, still based on samples, switched to a daily census at the end of the nineties, data volumes were multiplied again by a factor of more than 100. Big data again. Now common for any Retail Panel service.

Again, when the same data collections were made available as transactional data records, showing every possible disaggregated data point, especially thanks to the upsurge of retailer loyalty cards, data volumes were multiplied by a factor of 1,000 once more. Big data yet again. Now about to be handled more or less properly by Market Research companies. Awaiting the next big data frontier?
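To get a feel for what these successive factors mean in absolute terms, here is a rough back-of-envelope sketch in Python. The panel size, assortment and audit frequency are purely illustrative assumptions, not figures from the services described above; only the ×1,000, ×100 and ×1,000 multipliers come from the text.

```python
# Back-of-envelope illustration of the successive "big data" ruptures in
# retail tracking. Store count, assortment size and audit frequency are
# purely hypothetical; only the x1,000 / x100 / x1,000 multipliers come
# from the post above.

stores, skus = 500, 10_000            # assumed panel size and assortment

# Late eighties: bimonthly manual audits (~26 periods/year), one aggregated
# figure per store and SKU.
manual_audits = stores * skus * 26

# Early nineties: weekly EAN/UPC scanning -> roughly 1,000x more data points.
weekly_scanning = manual_audits * 1_000

# Late nineties: daily census instead of weekly samples -> another ~100x.
daily_census = weekly_scanning * 100

# 2000s: transaction-level records via loyalty cards -> another ~1,000x.
transaction_level = daily_census * 1_000

for label, n in [("manual audits", manual_audits),
                 ("weekly scanning", weekly_scanning),
                 ("daily census", daily_census),
                 ("transaction level", transaction_level)]:
    print(f"{label:>17}: ~{n:.1e} data points per year")
```

Under these assumptions, the count climbs from roughly 10^8 data points per year to around 10^16, which is the whole point: each era’s “big” becomes the next era’s baseline.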

So, definitely, the data that are called “big” today are at the edge of our current ability to handle such data sets. Tomorrow, other data sets will take over this “big” status, maybe with the addition of geo-location information or other causal data (the digital journey, for instance).

Ever more data for ever more information. Or the new frontier that leads to the black hole. Why? Because too much data may mean too many insights.

That is the drawback of big data. Too much data. Too many interesting things. Too many insights. The black hole of information, fully absorbing our capacity to catch new trends and key insights.

The bigger the data, the more complicated it is to extract the key information that will trigger new ideas, new business, new revenues. As mentioned in this blog post from Mediapost (http://www.mediapost.com/publications/article/191088/are-insights-cheap-commodities.html#axzz2IR4TeXCz), the key issue is no longer to find an insight, it is to find THE insight. We no longer have to break through the next frontier; we have to figure out in which direction we ought to search for it.

We have to do this. Quickly. Before the black hole of big data swallows what remains of the dimming light of key insights…

So, to close this blog post, and to start the discussion, a very interesting point of view by Jannis Kallinikos from the LSE: http://www.paristechreview.com/2012/11/16/allure-big-data/

Brave New World…

Hello Brave New World!

It is not my goal to comment on Aldous Huxley’s book in this blog, even though I would love to, but rather to comment on what has been my core business for the past 20 years: data elicitation.

With the constant growth of online usage, the world has been processing ever more data; terms such as “big data”, “cloud computing”, “social media” and “web analytics” are now known to nearly every man in the street. A new world that did not exist a few years ago, and that was known only to a happy few until a few months ago.

A new world; fine. But why brave? Actually, these words sound like a terra incognita, as nobody really has an answer on how to handle them: fear of the gigantic amounts of data points, questions about where to start with these data, lack of operational tools and schemes… the void is near.

So we have to be brave; otherwise this new world will be brave in Aldous Huxley’s way, creating a data hell that only robots and maybe a few dominant companies will be able to handle.

Of course, I do not claim to have THE answer; if I did, I would not be writing this blog, I would rather be counting my billions on a Caribbean beach 🙂

But I have been through several data revolutions in my Market Research career, from the introduction of bar codes for FMCG sales tracking, to Point-of-Sale single data collection, and now to my current task of understanding how to cope with the vastness of internet data…

This blog aims at sharing this experience, and I hope I can bring some tips to whoever is facing the challenge of using such masses of data. I have named this specific task “Data Elicitation”.

So let us elicit the data of this Brave New World!