Privacy, Safe Harbor and ad blockers: When trust is key to business

Wild net neutrality vs. regulated, standardized networks? Privacy for each vs. security for all? Free content with ads vs. pay-per-view? Are we bound to witness the development of the internet through the eyes of a war correspondent?

A few weeks ago, I attended MeasureCamp in London, a twice-yearly analytics unconference that I find particularly stimulating (more about MeasureCamp here). I held a session titled “More #Data, less #Privacy: Are we bound to finish naked?”. A summary of the discussion may be found here.

Provocative though the title may have been, the discussion was very constructive, and, as no clear solution seemed to emerge, it ended up with the only possible option, call it common sense, good will, mutual confidence… As for me, I shall keep the word “trust”, especially throughout this post, for the sake of clarity.


Trust is the common ground for the topics that I have named in the title of this post:

  • Privacy policies cannot be operative unless you trust the company storing your data, and in many instances you may even call on a third party to handle this subject, usually named a “trusted third party”… A hot topic everywhere, especially with all the data leaks that have occurred in recent weeks.
  • Safe Harbor has been put into question, and ultimately terminated by the EU Court of Justice, because trust had been broken between the US and Europe, not only because of the NSA spying scandal, but also because of divergent points of view about what the role of governments should be and their right to intrude into business rules (see the endless debates on Net Neutrality and the Right to be Forgotten, for instance on the NNSquad site).
  • Ad blockers are also becoming an issue, because trust has been mishandled there too. Let us forget the original sin of third-party cookies, when accepting first-party cookies from a website, for it to work properly, led to cookies being transferred to scores of third-party users, mostly for ad-targeting purposes… My main concern today is the latest twist in that area, i.e. ad blockers cutting deals with ad servers so as to create exceptions for them and let some ads be displayed, thereby breaching trust with their own clients.

Let me focus on this specific problem.

OK, ad blockers are standing in the way of advertisers, but so far they are not killing them. Advertising is the key monetisation tool for many free-to-access websites, with media and news sites in the front row, but many models show that such websites may still succeed nevertheless (the Huffington Post, as an example of a pure player, or the New York Times, as an example of redeployment from a paper business model). And although we all agree that purely free content may only be amateur work, plagiarism or infomercial, ad blockers are still not threatening free speech and democracy. And some people may require some rest from all-too-intrusive ads, and block them. That’s legitimate.

Still, those same people have to behave accordingly, and understand that not everything may be free. Economy is give and take, debit and credit, buy and sell. These people have to pay for added-value content, beyond LinkedIn Pulse summaries or Mediapost digest e-mails. Pay for a subscription to the NY Times online, for instance. It balances the loss of revenue (ad blocking) for some websites with paid-for content, for instance news read behind a paywall. This is fair. These are good business rules. This, too, is all about trust.

Trust is when both sides do their part; trust is when people using ad-blockers buy content somehow; trust is when an ad-blocking software reminds you that not everything may be free on the web, and that you have to pay for some content; trust is when ad-servers are serving relevant ads in relevant quantity, not flooding us until we surrender (or block).


But trust is broken when one of the parties is not playing a fair game; trust is broken with the users when an ad-blocking company sells, at a very expensive price, bypassing ways for some ad servers. Trust is broken when little arrangements are made between ad blockers and ad servers behind the back of the flock of consumers.

I shall not elaborate more about ad blockers, and especially about the AdBlock deal (more details here). And though this ad blocker is a tool I have recommended in the past (in this 2014 post, for instance: Data Privacy, between a rock and a hard place), trust is now broken. And breaking trust means losing clients. At least some clients. AdBlock, you lost my trust, you lost me.

Key trends for efficient analytics (1): a web taxonomy

In the wake of the most recent MeasureCamp, held in London on March 14th, I have decided to start a series about key trends in analytics.

During the MeasureCamp, beyond the usual lot of (highly interesting) technical workshops, I have identified three trends that are increasingly important for digital practitioners, so as to improve the efficiency of analytics usage, be it for a better understanding of the clients or for a smoother experience of analytics within the organization.

These three trends are:

  1. using a taxonomy to ensure your website is performing correctly
  2. defining a balance between ethics and privacy, when coping with regulations
  3. drawing a path for your analysts to improve their engagement and their relevance

I shall tackle the first topic today, i.e. taxonomy.

This topic has been a constant interest in my career, notably when I came to work on digital data. I even participated in the development of a patent dealing with this subject (System and method for automated classification of web pages and domains). And I believe that it is of the highest importance for improving the way your data are collected, and hence the efficiency of your analytics.

To get a good analytics output, it is not enough to have good tag management, an efficient SEO strategy or a good AdWords budget (even though all three are blatantly necessary). The optimization of analytics starts with a sound website, aligned with your strategy, properly coded and suitably organized to answer key questions about your users and your customers.

There are two key factors of success, namely organizing the whole site in full accordance with your original objectives, and aligning the organization with the experience of the site users.

Aligning the site with the strategy is an obvious point. But not an easy one! The strategy may be altered often, when new products are launched, when fashion trends evolve, when clients upgrade (or downgrade) their expectations… But one can seldom afford to change the structure of the site all that often. And over time, your site may no longer be aligned with the company’s goals, at least not in all its parts.

The first reaction will be to run batches of A/B tests to find local improvements, new layouts, better priorities, but these will be short-term fixes, while the tide slowly moves away…

Thinking ahead is definitely better in the long term, and a proper taxonomy gives you the flexibility to align your website’s key measurement points with a volatile strategy. Why? Because a taxonomy is a frame, a container organizer, whereas working solely on the key words, on the “thesaurus” as Heather Hedden says, is a one-shot work.

And it never is too late. Your website does not need to be fully revamped, so as to comply with a taxonomy. You just have to re-think the organization of your internal data, and align key concepts with your SEO strategy, as is very well described in this blog post – “Developing a Website Taxonomy to Benefit SEO” – by Jonathan Ellins from Hallam Internet, a UK-based consultancy. I have inserted one of the graphs used within their post below, showing the kind of decision tree a taxonomy may be generating:
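To make the frame idea concrete, here is a minimal sketch of what such a taxonomy could look like in code, with a helper that classifies a URL path by walking the tree. All category names are purely illustrative, not taken from any real site.

```python
# A minimal sketch of a website taxonomy: a nested mapping of
# categories to sub-categories, plus a helper that classifies a
# URL path by walking the tree. Category names are illustrative.

TAXONOMY = {
    "products": {
        "shoes": {"running", "hiking"},
        "apparel": {"jackets", "shirts"},
    },
    "support": {
        "faq": set(),
        "returns": set(),
    },
}

def classify(path, taxonomy=TAXONOMY):
    """Return the deepest taxonomy node matching the URL path segments."""
    segments = [s for s in path.strip("/").split("/") if s]
    node, matched = taxonomy, []
    for seg in segments:
        if isinstance(node, dict) and seg in node:
            matched.append(seg)
            node = node[seg]
        elif isinstance(node, set) and seg in node:
            matched.append(seg)
            break
        else:
            break
    return "/".join(matched) or "uncategorized"
```

For instance, `classify("/products/shoes/running")` resolves to the full branch, while a page outside the frame falls into "uncategorized", which is exactly the kind of gap a taxonomy review would surface.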

The interesting thing in this post is that it does not only focus on content, but also on what they call “intent” (I personally call it “context”), which opens the door to alternative ways of organizing one’s website data to improve analytics in the end.

This brings me to the second factor of success, considering the user experience.

Beyond intent, there is a much broader scope, i.e. how users experience the website, and the way they browse all the way to the eagerly wanted conversion. The usual way to handle such users properly is to determine typical personae.

The personae are very useful here, as they not only show the various ways of navigating the website, but also allow the identification of key crossings, loops and dead ends, which are clear signs of a website that is not aligned with the users’ expectations. And a flexible concept like taxonomy offers the opportunity to alter the logical links between two pages, so as to modify the browsing in such a way that users find their way more easily.
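Spotting those loops and dead ends can be sketched very simply from recorded session paths. The session data and the conversion page below are hypothetical, just to illustrate the diagnostic idea.

```python
from collections import Counter

# A minimal sketch: given recorded session paths (lists of page ids),
# flag candidate dead ends (pages where sessions end without converting)
# and loops (pages revisited within a single session).

def diagnose(sessions, conversion_page="checkout"):
    dead_ends = Counter()
    loops = Counter()
    for path in sessions:
        if path and path[-1] != conversion_page:
            dead_ends[path[-1]] += 1          # session ended here, no conversion
        seen = set()
        for page in path:
            if page in seen:
                loops[page] += 1              # page revisited within the session
            seen.add(page)
    return dead_ends, loops

sessions = [
    ["home", "products", "checkout"],
    ["home", "faq", "home", "faq"],           # user bouncing between two pages
    ["home", "products", "specs"],            # session dies on a spec page
]
dead, loop = diagnose(sessions)
```

Pages with high counts in either tally are good candidates for re-linking through the taxonomy.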

In conclusion, it certainly is not easy to revamp one’s site on a regular basis, and it is no easier to change one’s data management system all too often. In this respect, a taxonomy applied to your website may offer enough flexibility to cope with this ever-changing world, so that you may provide ongoing sensible analytics to your stakeholders, even when they are all too often changing their moods…

Should you be interested in developing such a taxonomy, or at least in discussing the relevance of such an enhancement, I would gladly be your man. I may be contacted here.

Next week, I shall discuss the impacts of ethics and privacy rules on analytics. Stay tuned!

PS: For those interested in a much deeper approach to taxonomy concepts, I recommend Heather Hedden’s blog, “The Accidental Taxonomist”. Beware, not for rookies!

Data Strategy, high time to take action!

Way too many companies are still behind schedule, and have the utmost difficulty setting up such data strategies, especially in France. I have therefore decided to rely on my most significant successes from the past year to review and complete the operational scope of Data Elicitation.

The achievements:

  • The CWA, which makes me the only French data management expert who also is certified in digital analytics;
  • The success of the first Paris MeasureCamp, which I co-organized, the biggest analytics event in France, due to happen again on June 27th 2015;
  • The patent validation process now going on at a European level, which further validates my data management expertise;
  • The multiple requests I have been addressing, from web analytics to strategic consulting, including text mining, data visualization and big data: many topics for many original inputs.

The restraints:

Still, working on data and digital policies has proved rather difficult; in fact, the restraints on implementing these strategies are clearly more structural than cyclical. McKinsey summarizes the French situation in 4 items, in this study (only available in French…) published in the fall of 2014:

  1. organisational issues, notably the all-too-famous vertical organization that I have reported on here
  2. a lack of digital competencies
  3. a lack of financial leeway
  4. a lack of clear managerial involvement

The French State could certainly act more and better on two of these issues, education and business taxation. I shall develop more in detail the opportunities for public policies in the digital area in a future blog post.

The two other restraints that McKinsey have identified are more complex, as they are linked to the internal organization of the companies, as well as to their willingness to change.

In my eyes, French companies have to overcome three biases, which hamper their blossoming in this data-driven world:


  1. Data (and digital) are very often second-class topics, handled after sales and financial issues of any kind, when there is time, that is, almost never as a priority. The website? A necessary evil. Social networks? We have to be able to reach young people! Data management? Sure, we have a CRM. So many prejudicial and sweeping statements: data and digital are downgraded to cost centers, and absolutely ignored as growth drivers.
  2. Investing in a data strategy is often subject to a collective decision, through a board or a project coordination, and seldom the will of a single person. Hence, as with most “collective” decisions, it often is the lowest bidder who wins, the most careful, the most conservative. On top of this, the competition between departments, be they marketing, IT, finance or sales, generates paralysis where emulation would be required.
  3. Finally, and this is a key subject, the various data owners still consider that exclusive information ownership grants them an additional share of power. What a mistake! At the very moment when a piece of information is locked away, it loses all its value, as data only have meaning when they are enriched by others and used for decision-making.

An example? Three departments: marketing, sales, finance. Three products: A, B and C. Marketing has done some research, and clearly A is the best product. Sales are positive: B is the best-seller. Finance has analyzed the ROI, and C definitely is the most profitable. So, two options: a wild-goose chase, where the quickest or the most convincing one wins, or information shared in a transverse way, so as to ponder the best mix for the company. One certainly would wish the second option happened more often…

Wherever there are data, there should be first an analysis, then a decision process and eventually an assessment.

The outlooks:

These blocking points have led me to rethink what Data Elicitation’s core business should be in the short term.

As a matter of fact, it is vain to try to convince some companies to work on their global data strategy while they are still burdened by the restraints depicted above and have not yet realized how large their potential could be. Therefore, I have created some training modules, so as to make the professionals concerned aware of the necessity of thinking about their data management in a transverse and global way.

You will then find on this website, under the header “training”, a description of modules dedicated to people training, in such topics as data management and analytics, both on the methodological level and through such concrete actions as database maintenance, data sourcing or quality assurance.

Of course, I shall keep on consulting at C-level and executive levels, for those willing to handle their most acute data strategy issues.

You know it all, now… Your comments and/or questions about those modules are highly welcome, as well as any suggested improvement.

Now, there only is one thing to do, i.e. share this blog post IMMODERATELY…

I hope to hear from you soon!

[the French version of this blog post is here]

Free your data: revoke the precautionary principle!

People who know me are aware that I often complain that applying blindly the so-called “precautionary principle” is leading to inaction.

However, fear does not prevent danger, as a popular French saying goes. Similarly, data should not prevent decision making.

In a paper about the precautionary principle (in French), Gaspard Koenig states that “the precautionary principle’s central idea, and its most important drawback, lies in the will to avoid any uncertainty whatsoever. Then, when in doubt, forbear.”

I do agree. And the ultimate paradox lies in the fact that the more data are to be handled, the more parameters are to be tuned, the less certain the decision is. Big Data leads our leaders not to decide anything anymore. The data overflow paralyzes.

Therefore, the automatic usage of the precautionary principle has to be suppressed, especially when it comes to institutionalizing it (let us not talk here of its addition to the French Constitution, certainly the most astonishing legislative act one has witnessed in the past decades).

Let us investigate the GMO example (also mentioned by Gaspard Koenig in his paper). Many studies and tons of data, most of them rather contradictory, imply that GMOs could represent a threat in the end, in 20, 30 or 50 years from now, either through a wave of cancers or some alteration of our genetic material. Maybe. Until then, thanks to GMOs, millions of people could have been fed better (or even fed at all), but instead they will starve. To death. So, do we act according to our conscience, or do we let data validate our paralysis?

Beyond the comfort of data-backed analysis lies the necessary power of decision-making. Data may only sustain decision processes, whereas making the decision is a privilege of mankind. Besides, in the years when massive data processing systems emerged (the eighties), were we not speaking of “Decision Support Systems”?

Hence, one must rely on data, but without hiding behind a stack of them. It must be clear that data sets, even “big” ones, always harbor a part of uncertainty, not about today [we are absolutely sure that using GMOs on a worldwide scale would reduce global hunger], but about tomorrow [as GMOs may generate risks for health and environment]. Why? Because even the most refined predictive model based upon Big Data will never reach 100% reliability.

And even Nate Silver, the demigod of predictive models in the US (see a slightly ironical portrait in French here), starts his cult book – “The Signal and the Noise” – with a foreword basically telling the reader that “the more data, the more problems” there are…

Therefore, people in charge have to take a risk, whatever its size. Give up the sacred precaution. And this to everyone’s benefit, since taking a risk is the only way to open breaches, to make a breakthrough. Thinking about it, with the precautionary principle, the Apollo XI moon landing would never have happened…

So, say yes to Big Data for the Blitzkrieg, and no to the Maginot Line of the precautionary principle. Or, with a balanced point of view, say yes to the D-Day, and no to the Atlantic Wall.

Your data must give rise to movement, not to motionlessness, to action, not to dejection, must help conquer new grounds, not defend one’s turf.

You have data, that is for sure. You want to take action, that is most probable. So, do not hesitate, have your data elicited, so as to break the wall and take the most enlightened decisions!


[French version: Libérez vos données: révoquez le principe de précaution!]

Dynamic Clustering, a new ad targeting methodology

This article is a summary of my proposed methodology for targeting web users with relevant ads, but without intrusion, i.e. without third-party cookies. The original version, including a broader depiction of the context, is a longer post in French: “Le régime sans cookies, le nouvel âge du ciblage sur internet”.

Basically, as related in many articles, including previous posts on this blog (see for instance “Giving up cookies for a new internet… The third age of targeting is at your door”), the usage of third-party cookies is bound to dwindle, if not to disappear. Some solutions have already been submitted, such as fingerprinting (still intrusive, though) or unique identifiers (but too closely linked to the major existing internet companies).

So, we need a non-intrusive contextual targeting solution, one which takes privacy protection into account. This is the core idea of my proposed solution, i.e. “dynamic clustering”.


How does it work?

  1. Based on ISP and/or operator log files, browsing data will be collected and anonymized (for instance through a double-anonymization filter) so as to protect each user’s privacy;
  2. Files will be cleaned (“noise-reduction” processes) and organized at various categorization levels, so as to generate multiple dimensions, all of them rich and flexible. This will allow the creation of “outlined profiles” for each unique anonymous user;
  3. Using these dimensions, clusters will be generated, made of users with similar usage behaviors, based on each advertiser’s hypothesis, hence creating an infinite number of target groups, whose volatility is an asset, as it will always cover the client’s issue of the given moment.
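The three steps above can be sketched on toy data. This is only an illustration of the idea, not the actual pipeline: the salt, the category rules and the "dominant category" clustering criterion are all simplifying assumptions of mine.

```python
import hashlib
from collections import Counter, defaultdict

# Step 1: anonymize user ids by salted hashing (a stand-in for the
# double-anonymization filter). Step 2: reduce raw browsing keywords
# to coarse categories, building an "outlined profile" per anonymous
# user. Step 3: cluster users sharing the same dominant category.

SALT = "rotate-me-regularly"                     # illustrative secret
CATEGORIES = {"news": "media", "sport": "media",
              "shop": "commerce", "bank": "finance"}

def anonymize(user_id):
    return hashlib.sha256((SALT + user_id).encode()).hexdigest()[:12]

def profile(log):
    """log: list of (user_id, url_keyword) pairs -> per-user category counts."""
    profiles = defaultdict(Counter)
    for user, keyword in log:
        profiles[anonymize(user)][CATEGORIES.get(keyword, "other")] += 1
    return profiles

def cluster(profiles):
    """Group anonymous users by their most frequent category."""
    clusters = defaultdict(set)
    for user, counts in profiles.items():
        clusters[counts.most_common(1)[0][0]].add(user)
    return clusters
```

With `log = [("alice", "news"), ("alice", "sport"), ("bob", "bank")]`, `cluster(profile(log))` puts the hash of "alice" in a "media" group and the hash of "bob" in a "finance" group, with no raw identifier surviving the process.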

So yes, the no-cookie diet is possible… And it goes along with a more virtuous targeting of internet users…

Convinced by this new diet? Willing to collaborate to the recipe development? Let’s meet!

When the State sells your personal data…

…it is with the Privacy Authority blessing!

This case is typical of the current French system, but this may echo some concerns elsewhere, so I have made a summary of an original post in French, so as to trigger discussion…

Basically, all French politicians claim they want to fight privacy breaches, especially when North American internet companies are involved. Still, the findings are clear: the French State and its state-owned companies are doing exactly the same!

First, the rule: any form collecting any of your (personal) data must offer users the ability to give or refuse consent regarding any further use of their data. In France, this is clearly stated in a law passed as early as 1978 (named “Data Processing and Freedom”, a full concept in itself…).

There are two possible questions (“opt-in” or agree, and “opt-out” or disagree) and two ways of offering the answer (“active” and “passive”); this implies four different ways of collecting any consent, and of course generates confusion:

  • Active opt-in: the box starts empty; ticking it means “I agree”.
  • Passive opt-in: the box comes pre-ticked as “I agree”; consent is registered unless the user un-ticks it.
  • Active opt-out: the box starts empty; ticking it means “I refuse”.
  • Passive opt-out: the box comes pre-ticked as “I refuse”; refusal is registered unless the user un-ticks it.
The “passive” response mode is neither common nor recommended in France (even if it is not forbidden, as far as I know), but it is more often found on US websites. In this case, the check-box is already ticked, and the user’s answer is registered by default. To register another choice, one has to un-tick the check-box. Clearly, this option may only be used online.

For paper forms, one may focus on the “active” mode, when a check-box has to be ticked. The two remaining options are:

  1. Active opt-in = the user agrees, by ticking one or several check-boxes, that his/her personal data may be stored, reused, transferred or sold to third parties. This is the most respectful mode for the user, as only active opt-in guarantees that the user has chosen to give away his/her data. But this is not the norm…
  2. Active opt-out = the user disagrees that his/her personal data be used, still by ticking a check-box. This mode is the most commonly used in France, and the Privacy Authority (CNIL) implicitly endorses this behavior. Namely, in the Q&A section of their website (only available in French), the CNIL mentions that the user may “oppose” personal data transfer to third parties or “refuse” that such data be used for commercial purposes. They endorse the opt-out mode.
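The four collection modes can be condensed into a tiny decision function. This is my own illustrative model of the logic, not a legal definition: `ticked` is the final state of the check-box when the form is submitted.

```python
# A minimal sketch of the four consent-collection modes.
# question: "opt-in" (the box means "I agree") or "opt-out" (the box
# means "I refuse"). mode: "active" (box starts empty) or "passive"
# (box starts pre-ticked). Only the final box state decides the outcome;
# "passive" simply means the default submission registers a choice the
# user never actively made.

def consents(question, mode, ticked):
    """Return True if the user ends up consenting to data reuse."""
    if question == "opt-in":
        return ticked                  # ticked box = agreement
    return not ticked                  # opt-out: ticked box = refusal

# Passive opt-in: pre-ticked box, user does nothing -> consent by default.
default_consent = consents("opt-in", "passive", ticked=True)
# Active opt-in: empty box, user does nothing -> no consent.
explicit_only = consents("opt-in", "active", ticked=False)
```

The asymmetry is visible at a glance: only active opt-in requires a deliberate gesture before any consent is recorded, which is why it is the most respectful mode, and the least used.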

Of course, many users just forget to tick check-boxes (or worse, do not find them), and hence are included by default in files sold to third parties, namely for business purposes. This may be understandable for private companies, but when it comes to the Government or to state-owned companies, this is more disputable!

I have covered two examples in the original French version of this post, taken from recent experiences, which I believe may not be of interest to non-French-speaking readers. My comments are backed up by solid material.

Hence, the French State collects – and resells – personal data from its citizens, while Google, Amazon or Facebook are blamed for doing the same… You mean “contradiction”? I say “opportunism”.

For true personal data protection, one has to develop alternative targeting tools! “Fingerprinting” and “unique identifiers” are mentioned, but there also is a non-intrusive option, based on the user’s online behavior. I am working on it… Willing to know more? Stay tuned and come back next week on this blog!

[This post is a summary of a longer original version written in the French-speaking section of; the original version in French namely includes pics and explanations of two opt-out examples]

Data Elicitation in three steps (1/3): Data Patterning

In my previous post, I outlined what Data Elicitation is about. I introduced the three areas required for a proper eliciting process, i.e. Data Patterning, Data Enrichment and Data Analytics. This post deals with the first of these areas, “patterning”.

First, it is worth explaining why I chose this word. I have used a typical concept from the textile industry (or, to be a bit more ambitious, from the “Haute Couture” world). In this area, a pattern is the intermediate stage between the designer’s sketches and the item production, a formalized plan enabling industrial planning. On one hand, it still is a concept, like the sketches, as it is purely paper. But on the other hand, it already is production, as it includes all the necessary information for implementing a full production process. This is what patterning is all about, allowing people’s ideas to become physical shapes.

Patterning data is an essential step in data management, as it allows one to take stakeholder wishes and technical constraints into account, and to prepare an optimal project and development planning. No database is suitable if not driven by clients’ needs and requests. No data analysis is relevant if not aligned with the previously agreed pattern.

A few key questions in this respect:

  • What are my data made of and, even more important, made for? → one finds one’s way better when the ultimate goal is known…
  • How can data sets be best organized? → the proper content ought to be in the proper place
  • What content do I need to store to get the best out of my data? → since not every piece of information may be worth keeping
  • How may my data relations be best optimized? → data are more useful when they are properly linked and aligned

I have summarized a typical process in the table below, in three columns: a fashion analogy, some project management steps, and a basic (softened) example inspired by one of my previous experiences in the mobile world:

Patterning steps

The parallel with the creative flow in the fashion industry is strong, as it shows that the first half of the process (steps #1 to #4) is the true added value to the whole content, the second half being more execution. It is clear that botching the patterning phase will impede the proper completion of the project. In the case above, the proposed solution could be summarized in a small chart, as the information lay in two fields of the provided log files.

Patterning chart

The two attributes (fields) that were present in the log files are marked here in blue.

The TAC (the first part of the IMEI code) may be directly used, as it relates to one device model; the master database is maintained by the GSMA, and is delivered to its members (including Telecom Operators).

The User Agent is more complicated, as it includes entangled information; parsing the User Agent notably allows identification of the browser, OS and type of connection that have been used. Still, it does not require additional information, only a good content analysis and a solid set of coding rules.

The combination of these four items creates a unique identifier, which is not specifically related to given users, but creates homogeneous groups, sharing similar technical conditions (hardware, software, network). Each group will then receive contents adapted to their specific conditions, thereby optimizing their browsing experience and consequently increasing engagement.
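The combination step can be sketched as follows. The TAC really is the first 8 digits of the IMEI; the User-Agent rules and the sample values, however, are naive stand-ins of mine for a real parser and real coding rules.

```python
# A minimal sketch of the grouping described above: extract the TAC
# (first 8 digits of the IMEI), derive browser / OS / connection from
# the User Agent via naive substring rules, and combine all four items
# into a non-personal group key.

def tac(imei):
    """TAC = first 8 digits of the IMEI, identifying the device model."""
    return imei.replace(" ", "")[:8]

def parse_user_agent(ua):
    ua = ua.lower()
    browser = "chrome" if "chrome" in ua else "safari" if "safari" in ua else "other"
    os_name = "android" if "android" in ua else "ios" if "iphone os" in ua else "other"
    connection = "wifi" if "wifi" in ua else "cellular"
    return browser, os_name, connection

def group_key(imei, ua):
    """Combine TAC + browser + OS + connection into a group identifier."""
    return (tac(imei),) + parse_user_agent(ua)
```

Two users with the same device model, browser, OS and connection share the same key and thus receive the same adapted content, without either of them being individually identified.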

As this blog is aiming at a large public, I chose to keep a rather simple example. Of course, should the matter be more intricate, the skills I have built up over my years of experience in Data Management will even be more valuable. Feel free to ask more about patterning or other fields of Data Elicitation, I shall be glad to elaborate customized solutions for your business.

Data Elicitation in three steps (2/3): Data Enrichment

The second step of elicitation is enrichment. Once your data have been patterned, you have to design their look and feel. This is what data enrichment is about.

Again key questions in this area:

  • How do I qualify my data? → a categorization scheme is key to facilitate relevant data extraction for future analysis
  • What may be missing or on the contrary is uselessly kept? → choosing one’s data set is necessary for correct data acquisition
  • What type of addition is relevant and what is really useful? → as existing information may not be sufficient for achieving marketing studies
  • How do I acquire additional attributes at an optimum price? → collect, derive or generate additional data at proper cost

No question, your data are rich, especially if you can use them easily thanks to an appropriate patterning. But they certainly can be richer. Much richer. And there are hundreds of ways to enrich data, but only two dimensions to consider, quantity and quality.

You may have tons of data, and still they may not fit your purposes. Or, on the contrary, have scarce resources, but with a very high (and maybe hidden) value. Market research companies used to call this data enrichment process “coding the dictionary”, a phrase showing the richness of this process, both on the quantity side (the number of words) and on the quality side (the clarity of the definitions). Getting the relevance out of the data is definitely a precious skill, and one of my own key proficiencies.

I shall definitely develop both aspects of data enrichment in future posts but I wanted to cover them shortly in this introduction.

1. Quantity

One always seems to be missing data. More e-mails for more direct marketing contacts, more socio-demographics for a better segmentation, more inputs from the sales force for a more precise CRM, more, more, more…

As usual, this may be true. Or not! Is Facebook the best source for reaching a specific population? Surely not. For instance, should you want to reach people affected by albinism in North America, you would probably rather get in touch with NOAH. So, it depends on the purpose. And on your means to leverage a big amount of data.

Of course, I shall not dispute that a large database will give you more opportunities for reaching your targets. But better do it with the maximum level of quality. I shall then cover such topics as coverage, census vs. sample, and the long tail later on, as dealing with large databases is mostly a question of finding the right data at the right time.

2. Quality

Good quality is the heir of proper patterning. And quality always is the key to an efficient database. The specificity of quality improvement is that it concerns all records, old and new. Unlike quantity (adding new records is an ongoing task; you seldom look back on past data), quality always requires looking ahead AND behind. Adding a new feature, adjusting existing attributes to new constraints, redesigning existing concepts: all this implies a full database review.

I shall also cover methods and tips for improving one’s database in the future. Still, the best piece of advice I can give is simple: think twice before starting. I have added below a simple example about the long tail of the internet.

Long Tail

This chart shows the top 1,000 websites, ranked on their visits for a given time period, using their share of visits. The metric itself is not interesting; rather, the data distribution is.

The top 50 websites (5% of the total records), very well known, will allow a fair coverage of the activity, i.e. more or less 60% of the visits. So, for a small set of data with a high level of recognition, we could have a good understanding of the activity. Good for global strategy and high-level analysis.

On the other hand, the bottom 500 websites account for more or less 1.5% of the visits. Too costly to reach if you are working at the global strategy level, but of the highest interest if you are searching for a niche or a specific target audience.
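The head-vs-tail arithmetic above can be sketched with a synthetic, Zipf-like distribution (the real chart's data is not published here, so the numbers are illustrative; with an exponent of 1 the head happens to land near the ~60% figure, while the tail share will differ from the chart's):

```python
def visit_shares(n_sites=1000, exponent=1.0):
    """Zipf-like weights for n_sites ranked websites, normalized
    so they sum to 1 (i.e. each site's share of total visits)."""
    weights = [1.0 / (rank ** exponent) for rank in range(1, n_sites + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def cumulative_share(shares, top_k):
    """Share of all visits captured by the top_k sites."""
    return sum(shares[:top_k])

shares = visit_shares()
head = cumulative_share(shares, 50)   # top 5% of the records
tail = sum(shares[500:])              # the bottom 500 sites

print(f"Top 50 sites capture {head:.0%} of visits")
print(f"Bottom 500 sites capture {tail:.1%} of visits")
```

Tuning the exponent changes how fat the tail is, which is exactly the calibration question: how deep into the tail is it worth maintaining records for your particular targets?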

There is no point in weighing a small, high-quality database against an extra-large, sparse one. Again, the point is to have the database calibrated for your needs, and then to enrich it wisely. And you know it by now, I have already told you: this is exactly what Data Elicitation is about!

Data Elicitation in three steps (3/3): Data Analytics

Today is wrap-up time. Before dealing with practical use cases and commenting on data-related news, I shall conclude my introduction series with its third part, analytics. By now, you have your data set, well organized (patterned) and trained (enriched), ready to go. You only need to find the proper tactics and strategy to reach your goal, i.e. get the data to talk and find the solution to your issues, or validate your assumptions.

What is this analytical work like? Sometimes complicated. Often critical. Always time-consuming.

Let us first ask ourselves a few questions:

  • Is my software adapted and scaled to my business goals? → needless to say, although… a nice database requires an efficient interface
  • What types of tools and techniques may I use to get the best out of the data? → drill-down and funnel analysis are not the same as random search
  • What are the patterns within my data? → how to reach global conclusions from exemplary data excerpts
  • By the way, do I know why I am storing so much data? → small data sets are often more useful than big (fat and unformed) data mounds

In fact, even though one may build databases in hundreds of ways, using very different tools and techniques, there are only two ways to analyze data: Mine Digging and Stone Cutting.

1. Mine Digging

Mine Digging is the typical initial Big Data work. Barren ground, nothing valuable to be seen; there may be something worth digging for, but you are not even sure… This is often what Big Data offers at first glance. A well-seasoned researcher will find out whether something interesting is hidden in the data, just as an experienced geologist, deciphering the ground, would guess whether any stone could be buried there. Still, excavating the data to reach more promising levels is a lot of work, often named “drill down”, an interesting parallel to the mining world… And something will be found for sure, but more often stones of lesser value; a Koh-i-Noor is not always there, unfortunately. This huge excavating work is what I have named Mine Digging.

[Chart: Retailer-vs-Competition analysis]
I have taken an example from a previous Market Research experience to illustrate this.

Digging in the data is cumbersome. No question. You search for hours for a valid analysis angle before finding one. And then suddenly, something: a story to tell, a recommendation to share, the result of some previously taken action to validate. A gem in the dark.

The attached chart shows the example of a given retailer (A) compared to its competitors in a specific catchment area. Its underperformance was so blatant that it was unfortunately heading for a shutdown; however, we found additional data and suggested an assortment reshuffle and a store redesign, which finally helped retailer (A) catch up with the town average in less than 18 months.

This example shows how one may drill data down from a full retailer panel to the lowest level (shop/town), so as to find meaningful insights.
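A minimal sketch of such a drill-down, on a hypothetical flat panel (the field names and figures are illustrative, not the actual panel data): filter the shop-level records down to one retailer in one town and compare it to the town average.

```python
from statistics import mean

# A hypothetical shop-level retailer panel (sales in arbitrary units).
panel = [
    {"retailer": "A", "town": "Lille", "sales": 310},
    {"retailer": "B", "town": "Lille", "sales": 520},
    {"retailer": "C", "town": "Lille", "sales": 480},
    {"retailer": "A", "town": "Lyon",  "sales": 450},
    {"retailer": "B", "town": "Lyon",  "sales": 460},
]

def town_average(records, town):
    """Average sales across all retailers present in a town."""
    return mean(r["sales"] for r in records if r["town"] == town)

def gap_to_town(records, retailer, town):
    """One retailer's sales vs. the town average, as a relative gap."""
    own = mean(r["sales"] for r in records
               if r["retailer"] == retailer and r["town"] == town)
    avg = town_average(records, town)
    return (own - avg) / avg

print(f"Retailer A in Lille: {gap_to_town(panel, 'A', 'Lille'):+.0%} vs. town average")
```

The same two functions apply at any drill level (region, town, shop); only the filter changes, which is what makes the "drill down" cheap once the panel is properly patterned.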

2. Stone Cutting

Stone Cutting has more to do with ongoing analytics, especially those coming from existing software, be it for digital analytics, data mining, or even semantic search. In this case, one already has some raw material in hand, but its cutting depends on current conditions and client wishes… The analytical work here is to find out how to carve the stone and give it the best shape to maximize its value. This refining work is what I name Stone Cutting.

[Chart: Click-Rate analysis]
I have chosen an example from the web analytics world to illustrate this.

When optimizing an e-commerce website, one very quickly learns which types of action trigger improved conversion; the analytics will then “only” provide some marginal information, e.g. what this specific campaign has brought to the company’s business: its ROI, its new visitors and buyers. Very important for the business, for sure. Vital, even.

The attached example shows, for instance, that the efficiency of banner-ad impressions (a click-rate per impression of more or less 5 per thousand) is stable up to 5 impressions; beyond this point, additional impressions are less efficient.

Information straight to the point with results also immediately actionable.
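The frequency analysis above can be sketched as follows. The ~5-per-thousand click-rate comes from the article; the per-frequency table and the 80% cap threshold are made-up illustrations of how one might turn such a curve into an actionable frequency cap:

```python
# impressions seen by a visitor -> (clicks, impressions served) at that frequency
observations = {
    1: (50, 10_000),
    2: (49, 9_500),
    3: (46, 9_000),
    4: (43, 8_600),
    5: (41, 8_200),
    6: (24, 7_900),   # efficiency drops beyond 5 impressions
    7: (15, 7_600),
}

def click_rate(clicks, impressions):
    """Clicks per thousand impressions."""
    return 1000 * clicks / impressions

rates = {n: click_rate(c, i) for n, (c, i) in observations.items()}

# A simple frequency-cap rule: keep serving only while the per-impression
# rate stays above, say, 80% of the first-impression rate.
cap = max(n for n, r in rates.items() if r >= 0.8 * rates[1])
print(f"Suggested frequency cap: {cap} impressions")
```

This is Stone Cutting in miniature: the raw material (the impression log) already exists; the analytical work is choosing the threshold that turns it into a decision.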

So, two ways of analyzing data: one requiring heavy work and a lot of patience, the other relying rather on brains and existing patterns; both are necessary for efficient analytics, from breakthrough discoveries to fine-tuned reporting. Diggers in the South African mines and cutters in the Antwerp jewelry shops are two very different populations, but both are necessary to create a diamond ring. For data analytics alike, a global, in-depth knowledge of the analytical process is required so as to offer the best consultancy. So, let me remind you: my dual experience in marketing and operations is a real nugget that you can get easily, either on a part-time or full-time basis.

Chief Data Officer, the position you have to afford

Good Lord! Another Chief something Officer… Do you really need one? Yes you do, and you had better not wait too long before hiring one.

Chief Data Officer is a rather new function, at least with such a responsibility at that level of seniority.
So why do you need such a role? There are two main reasons for driving your data strategy at C-level: the variety of the data and its strategic monetization.

First, data is ubiquitous. Nowadays, except perhaps for a few survivalists, everyone is creating and using data everywhere, all the time and, most importantly, without any real limitation.

Data is so abundant that no one can grasp it globally at a single glance any more; a minimum requirement is to handle it with proper governance, while an optimal organization requires a global strategy. Considering the number of sources (sales and client support, accounting and finance, HR, competitive intelligence, product databases, industrial processes, PII, social networks, to name a few), no single person in the company can own all this data alone.

A dedicated Data manager definitely is key.

Second, data is value. All these sources, with numerous records and an ever-growing number of attributes, mean that investing money in the market requires checking the data first, either to verify assumptions or, even more straightforwardly, to find an already existing answer that you should not be paying for.

In a fierce, globally competitive world, data is an incredible asset for the company, maybe the most important one, and surely the least exploited; many operations could be improved by sound data management, including shutting down data-management silos and really sharing information across business units. As data cannot belong to one stakeholder or another, its management ought to be led at the highest level: C-level.

A dedicated Data officer definitely is key.

Who is this Officer in charge of Data Management? “Meet the Chief Data Officer” wrote Brad Peters, earlier in 2014, on the site of The Economist.

Finally, affording a CDO position is a unique way to drive growth. For sure, you have to “make room” for this new Officer, taking away from the BUs a part of what they believe is their power: data ownership. A significant move, which must be initiated by the CEO. And, of course, you have to hire the relevant person for the job, with enough experience to drive a global data strategy, but also with a good mix of technical knowledge and business acumen so as to implement it. No spring chicken by any means!

I would be glad to answer any question about the CDO role, and help you define its job description. And maybe you will discover that I am a rather good candidate…

Understand data management challenges and face them
