I want to start this piece with the most important take-away for IT readers: they should take care that data governance does not get in the way of Big Data, not the reverse.
This may seem odd, given that I, among others, have been pointing out for some time that better data cleansing and the like are badly needed in enterprise data strategies in general. But data governance is not just a collection of techniques – it’s a whole philosophy of how to run your data-related IT activities. Necessarily, the IT department that focuses on data governance emphasizes risk – security risk, risk of bad data, risk of letting parts of the business run amok in their independence, creating a complicated tangle of undocumented data relationships. And that focus on risk can very easily conflict with Big Data’s focus on reward – on proactive identification of new data sources and on digging deeper into the relationships among the data sources one already has, in order to gain competitive advantage.
While there is not necessarily clear evidence that over-focus on data governance impedes Big Data strategies, and thereby the success of the organization, there is some suggestive data. Specifically, a recent Sloan Management Review article reported that the least successful organizations were those that focused their Big Data analytics on cutting costs and optimizing business processes, while the most successful focused their Big Data analytics on understanding their customers better and using that understanding to drive new offerings. Data governance, as a risk-focused philosophy, is also a cost-focused and internally-focused strategy. The task of carefully defining and controlling metadata seeks to cut the costs of duplicated effort and unnecessary bug fixes inherent in line-of-business Wild-West data-store proliferation. It therefore can constrain precisely the kind of expanded use of new, externally generated data types, such as social-media data, that yields the greatest Big Data success for the enterprise.
Who’s To Be Master?
So, if we need to take care that data governance does not
interfere with Big Data efforts, and yet things like data cleansing are clearly
valuable, how can we coordinate the two better?
I often find it useful in these situations to model the enterprise’s
data handling as a sausage factory, in which indescribable pieces of data
“meat” are ground together to produce informational “sausage”. I like to think of it as having six steps (more or less):
Data entry – in which the main aim is data accuracy
Data consolidation – in which we strive for consistency between the various pieces of data (accuracy plus consistency, in my definition, equals data quality)
Data aggregation – in which we seek to widen the scope of users who can see the data
Information targeting – in which we seek to make the data into information fitted to particular targeted users
Information delivery – in which we seek to get the information to where it is needed in a timely fashion
Information analysis – in which we try to present the information to the user in a format that allows maximum in-depth analytics.
Note that data governance as presently defined appears to affect only the first two steps of this process. And yet, my previous studies of the sausage factory suggest that all of the steps should be targeted, as improving only the first two offers just minor gains in a process that tends to “lose” roughly three-quarters of the valuable information along the way, with each successive step discarding quite a bit more.
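To make that arithmetic concrete, here is a minimal sketch of the six-step pipeline. The per-step retention rates are purely illustrative assumptions (not measurements from any study), chosen so that the end-to-end loss lands near the three-quarters figure above:

```python
# Illustrative only: each step is assumed to retain 80% of the information
# value that reaches it. These numbers are hypothetical.
STEPS = {
    "data entry": 0.80,
    "data consolidation": 0.80,
    "data aggregation": 0.80,
    "information targeting": 0.80,
    "information delivery": 0.80,
    "information analysis": 0.80,
}

def end_to_end_yield(retention):
    """Fraction of the original information value surviving all steps."""
    result = 1.0
    for rate in retention.values():
        result *= rate
    return result

baseline = end_to_end_yield(STEPS)  # 0.8 ** 6 ~= 0.26, i.e. roughly 3/4 lost

# Perfecting only the two governance-covered steps still leaves the four
# downstream steps to erode most of the value.
governed = dict(STEPS, **{"data entry": 1.0, "data consolidation": 1.0})
improved = end_to_end_yield(governed)  # 0.8 ** 4 ~= 0.41

print(f"baseline yield: {baseline:.0%}")             # ~26%
print(f"first two steps perfected: {improved:.0%}")  # ~41%
```

In other words, even flawless data entry and consolidation cannot rescue a pipeline whose later steps keep leaking value, which is why governance attention is needed along the whole chain.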
How does this apply to Big Data? The most successful users of Big Data, as
noted above, actively seek out external data that is dirty and unconsolidated
and yet is often more valuable than the organization’s “tamed” data. Data governance, as the effective front end
of the sausage factory, must therefore not exclude this Big Data in the name of
data quality – it must find ways of making it “good enough” that it can be fed
into the following four steps. Or, as one database administrator told me, “dirty” data should not simply be discarded, since it can tell us things we need to know about what our sausage factory is excluding.
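One way to act on that advice is to flag and quarantine questionable records rather than drop them, so that they remain available both for “good enough” downstream use and for profiling what is being excluded. The sketch below is a minimal illustration; the field names, the social-media example, and the one-issue threshold are hypothetical choices, not rules from any governance framework:

```python
from dataclasses import dataclass, field

@dataclass
class Record:
    payload: dict
    issues: list = field(default_factory=list)  # data-quality problems found

def validate(record: Record) -> Record:
    """Flag quality problems instead of silently dropping the record."""
    if not record.payload.get("customer_id"):
        record.issues.append("missing customer_id")
    if record.payload.get("source") == "social_media":
        record.issues.append("unverified external source")
    return record

def route(records):
    """Send clean and 'good enough' data downstream; quarantine the rest."""
    downstream, quarantine = [], []
    for rec in map(validate, records):
        # 'Good enough' here means usable with caveats rather than excluded.
        (downstream if len(rec.issues) <= 1 else quarantine).append(rec)
    return downstream, quarantine

# The quarantine is kept and profiled, not deleted: it shows what the
# sausage factory is excluding, and why.
```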
Data governance should also, if at all possible, not interfere with the four steps that follow data quality assurance. Widening the scope of users who can see the data widens security risks; but the benefits outweigh the risks. Information delivery that involves a new data type risks creating a “zone of ignorance” in which data governors don’t know what their analysts are doing; but the answer is not to exclude the data type until that distant date when it can be properly vetted.
Much of this can be done by using a data discovery or data virtualization tool to discover new data types and incorporate them into an enterprise metadata store semi-automatically.
But that is not enough; IT needs to ensure that data governance accepts
that Big Data exclusion is not an option and that the aim is not pure data, but
rather the best balance of valuable Big Data and data quality.
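As a rough sketch of what such semi-automatic incorporation might look like, the code below infers a crude schema from a newly discovered external feed and registers it for human review rather than blocking it until fully vetted. The functions and the JSON “catalog” are hypothetical stand-ins, not the interface of any particular data discovery or virtualization product:

```python
import csv
import json
from pathlib import Path

def infer_schema(path: Path, sample_rows: int = 100) -> dict:
    """Guess column names and coarse types from a sample of a new CSV feed."""
    with path.open(newline="") as f:
        reader = csv.DictReader(f)
        columns = {name: "string" for name in (reader.fieldnames or [])}
        for i, row in enumerate(reader):
            if i >= sample_rows:
                break
            for name, value in row.items():
                if value and value.lstrip("-").replace(".", "", 1).isdigit():
                    columns[name] = "number"
    return {"source": path.name, "columns": columns}

def register_dataset(schema: dict, catalog: Path) -> None:
    """Append the inferred schema to a simple JSON metadata catalog,
    flagged for review rather than excluded until it is perfectly clean."""
    schema["status"] = "pending_review"
    entries = json.loads(catalog.read_text()) if catalog.exists() else []
    entries.append(schema)
    catalog.write_text(json.dumps(entries, indent=2))

# Example usage with a hypothetical external feed:
# register_dataset(infer_schema(Path("social_feed.csv")), Path("metadata_catalog.json"))
```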
In Through the Looking-Glass, Humpty Dumpty uses the word “glory” in a very odd way, and Alice objects that he should not be allowed to. “The question is,” he replies, “which is to be master – that’s all.” In a similar way, users of data governance and Big Data need to understand that you, with your need for Big Data customer insights from the outside world, must be master, not the data governance enforcer.