The sad fact is that, imho, neither vendors nor users are
really supporting building a real-world enterprise information architecture –
and yet, the crying need for such an architecture and such support was apparent
to me eight years ago. The occasion for
such musings is a Composite Software/Cisco briefing I am attending today, in
which users are recognizing as never before the need and prerequisites for an
enterprise information architecture, and Composite Software is taking a
significant step forward in handling those needs. And yet, this news fills me with frustration
rather than anticipation.
This one requires, unfortunately, a fair bit of explanation that I wish were not still necessary.
Let’s start by saying what I mean by an enterprise information
architecture, and what it requires.
The Enterprise Information – Not Data – Architecture
What theory says is that an enterprise information
architecture gets its hands around what data and types of data exist all over
the organization (and, often, needed data outside the organization) and also what
that data means to the organization – what information
the data conveys. Moreover, that
“meta-information” can’t just be a one-shot, else what is an enterprise
information architecture today quickly turns back into an enterprise data
architecture tomorrow. No, the
enterprise information architecture has to constantly evolve in order to stay
an enterprise information architecture.
So theory says an enterprise
information architecture has to have a global semantics-rich metadata
repository and the mechanisms in place to change it constantly, semi-automatically,
as new data and data types arrive.
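To make “semi-automatically” a bit more concrete, here is a minimal sketch (in Python, with hypothetical names like DataSource and MetadataRepository that are my own illustration, not any vendor’s product) of a repository that registers newly discovered data stores, fills in whatever business meaning it can from a shared glossary, and flags the rest for a human curator.

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class DataSource:
    """A data store discovered somewhere inside (or outside) the enterprise."""
    name: str
    location: str               # e.g. "beijing-dc", "aws-us-east-1"
    schema: dict[str, str]      # column name -> physical type

@dataclass
class MetadataEntry:
    source: DataSource
    business_meaning: dict[str, str] = field(default_factory=dict)  # column -> meaning
    needs_review: bool = True
    registered_at: datetime = field(default_factory=datetime.now)

class MetadataRepository:
    """Global metadata repository: every new source is registered as it appears."""
    def __init__(self, glossary: dict[str, str]):
        self.glossary = glossary                     # column name -> business meaning
        self.entries: dict[str, MetadataEntry] = {}

    def register(self, source: DataSource) -> MetadataEntry:
        entry = MetadataEntry(source)
        # The "semi-automatic" part: map the columns the glossary already knows,
        # and leave the rest flagged for a human curator.
        for col in source.schema:
            if col in self.glossary:
                entry.business_meaning[col] = self.glossary[col]
        entry.needs_review = len(entry.business_meaning) < len(source.schema)
        self.entries[source.name] = entry
        return entry

repo = MetadataRepository(glossary={"cust_id": "customer identifier"})
entry = repo.register(DataSource("crm_eu", "frankfurt-dc",
                                 {"cust_id": "int", "churn_flag": "bool"}))
print(entry.needs_review)   # True: "churn_flag" still needs a curator's definition
```

The point is only that registration and semantic annotation are an ongoing process, not a one-time modeling exercise.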
Now the real world intrudes, as it has over the past 15
years in just about every major organization I know of. To the extent that users felt the need for an
enterprise information architecture, they adopted one of two tactics:
1. Copy everything into one gigantic data warehouse, and put the repository on top of that (with variants of this tactic having to do with proliferating data marts coordinating with the central data warehouse), or
2. “Muddle through” by responding reactively to every new data need with just enough to satisfy end users, and then trying to do a little linking of existing systems via metadata ad-hoc or on a per-project basis.
As early as 10 years ago, it was apparent to me that (1) was
failing. I could see existing systems in which, the more the data warehouse types tried to stuff everything into the
global data-warehouse data store, the further they fell behind the proliferation of data
stores in the lines of business and regional centers (not to mention data on
the Internet). That trend has
continued up to now, and was testified to, amply, by two presenters at major
financial firms at today’s briefing, with attendees’ questions further
confirming this. Likewise, I saw (2)
among initial users of data virtualization software five to eight years ago, and today I
overheard a conversation in which two IT types were sharing the news that there
were lots of copies of the same data out there and they needed to get a handle
on it, as if this were some startling revelation.
The long-term answer to this – the thing that makes an
enterprise data architecture an enterprise information architecture, and keeps
it that way – is acceptance that some data should be moved and/or copied to the
right, more central physical location, and some data should be accessed where
it presently resides. The costs of not
doing this, I should note, are not just massive confusion on the part of IT and
end users leading to massive added operational costs and inability to determine
just where the data is, much less what information it represents; these costs
are also, in a related way, performance and scalability costs – you can’t scale
in response to Big Data demands, or you can do so only at far greater cost.
The answer to this is as clear as it was 8 years ago: an architecture that semi-automatically,
dynamically, determines the correct location of data to optimize performance on
an ongoing basis. An enterprise
information architecture must have the ability to constantly optimize and
re-optimize the physical location of the data and the number of copies of each
datum.
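To make “optimize and re-optimize the physical location” concrete, here is a hedged sketch of the kind of decision involved: given how often each region reads a dataset, choose the set of copies that minimizes data shipped over the WAN, with a crude penalty per extra copy for keeping it in sync. The cost model, numbers, and function names are my own illustrative assumptions, not anyone’s actual algorithm.

```python
from itertools import combinations

def placement_cost(copies, reads_by_region, size_gb, transfer_cost):
    """Cost of serving all reads when each region reads from its cheapest copy."""
    return sum(
        reads * size_gb * min(transfer_cost[(src, region)] for src in copies)
        for region, reads in reads_by_region.items()
    )

def best_placement(regions, reads_by_region, size_gb, transfer_cost, max_copies=2):
    """Brute-force search over copy placements; a real system would use heuristics."""
    best = None
    for n in range(1, max_copies + 1):
        for copies in combinations(regions, n):
            cost = placement_cost(copies, reads_by_region, size_gb, transfer_cost)
            cost += (n - 1) * size_gb * 0.5        # crude penalty for keeping copies in sync
            if best is None or cost < best[1]:
                best = (copies, cost)
    return best

# Toy example: one 10 GB dataset, queried mostly from the US.
regions = ["beijing", "boston"]
transfer_cost = {("beijing", "beijing"): 0.0, ("boston", "boston"): 0.0,
                 ("beijing", "boston"): 1.0, ("boston", "beijing"): 1.0}
reads = {"beijing": 100, "boston": 5000}
print(best_placement(regions, reads, size_gb=10.0, transfer_cost=transfer_cost))
# -> (('beijing', 'boston'), 5.0): replicate, rather than ship results across the Pacific
```

In this toy setup, replicating to both regions wins, because the synchronization penalty is far smaller than the cost of shipping results across the Pacific; change the read mix or the penalty and the answer changes, which is exactly why the optimization has to be redone on an ongoing basis.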
The Sad State of the Art in Enterprise Information Architectures
Today’s briefing is reminding me, if I needed reminding,
that the tools for such a global meta-information architecture are pretty well
advanced, and that users are beginning both to recognize the need for such a
repository and to create it. There was
even the recognition of the Web equivalent of the repository problem, as
Composite tackles the fact that users are getting their “cloud information”
from multiple providers, and this information must be coordinated via metadata
between cloud providers and with internal enterprise information. All very
nice.
And yet, even in this, a conference of the “enlightened” as
to the virtues of a cross-database architecture, there was very little
recognition of what seemed to me to scream from the presentations and
conversations: there is a crying need
for dynamic optimization of the location of data. Those who think that the cloud proves that simply
putting a transparent veneer over physically far-flung data archipelagoes solves
the problem should be aware that since the advent of public clouds,
infrastructure folks have been frantically putting in kludges to cope with the
fact that petabyte databases with terabyte-per-minute additions simply can’t be
copied from Beijing to Boston in real time to satisfy an American query.
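A quick back-of-envelope calculation, using my own round numbers rather than any figure from the briefing, shows why:

```python
# Rough arithmetic on the Beijing-to-Boston example (assumed round numbers).
write_rate_tb_per_min = 1.0                                # "terabyte-per-minute additions"
write_rate_gbit_s = write_rate_tb_per_min * 8_000 / 60     # ~133 Gbit/s just to keep up
backlog_pb = 1.0                                           # the existing petabyte
link_gbit_s = 10.0                                         # a generous dedicated WAN link
initial_copy_days = backlog_pb * 8_000_000 / link_gbit_s / 86_400
print(f"{write_rate_gbit_s:.0f} Gbit/s of new writes; "
      f"initial copy takes about {initial_copy_days:.0f} days")
# -> roughly 133 Gbit/s of ongoing writes, and about 9 days just to ship the first petabyte
```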
And if the Composite attendees don’t see this, afaik, just
about every other vendor I know about, from IBM to Oracle to Microsoft to HP to
SAP to yada, sees even less and is doing even less. I know, from conversations with them, that
many of them are intellectually aware that this would be a very good thing to
implement; but the users don’t push them, and they don’t ask the users, and so
it never seems to be top of mind.
An Action Item – If You Can Do It
I am echoing one of the American Founding Fathers, who, when
asked what they were crafting, replied:
“A republic – if you can keep it.”
An enterprise information architecture is not only very valuable, now as
then, but also very doable – if vendors have the will to support it, and users
have the will to implement it with the additional support.
For vendors, that means simply creating the administrative
software to track data location, determine optimal data location and number of
copies, and change locations to move towards optimal allocation, over and over
– because optimal allocation is a constantly changing target, with obvious
long-term trends. For users, that means
using this support to the hilt, in concert with the global metadata repository,
and translating the major benefits accruing from more optimal data allocation
to terms the CEO can understand.
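As an assumption-laden sketch rather than any vendor’s actual feature, that “over and over” amounts to a control loop around a placement decision like the one sketched earlier: observe access patterns, recompute the best placement, migrate only when the gain clearly outweighs the cost of moving, and repeat.

```python
import time

def rebalance_forever(catalog, observe_reads, choose_placement, placement_cost, migrate,
                      interval_s=3600, min_gain=0.2):
    """Hypothetical administrative loop: keep data placement near-optimal, forever.

    catalog          -- iterable of datasets, each knowing its current copies
    observe_reads    -- dataset -> fresh per-region read counts
    choose_placement -- (dataset, reads) -> (best copy set, its cost)
    placement_cost   -- (dataset, reads, copies) -> cost of the current placement
    migrate          -- (dataset, new copy set) -> physically move/copy the data
    """
    while True:
        for dataset in catalog:
            reads = observe_reads(dataset)
            target, target_cost = choose_placement(dataset, reads)
            current_cost = placement_cost(dataset, reads, dataset.copies)
            # Migration is itself expensive, so move only for a clear win.
            if current_cost - target_cost > min_gain * current_cost:
                migrate(dataset, target)
        time.sleep(interval_s)   # optimal allocation is a moving target; loop again
```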
For now, we can measure those benefits by just how bad
things are right now. One telling
factoid at today’s conference: in the
typical query in Composite’s highly location-optimized software, 90% of the
performance hit was in passing data/results over the network. Yes, optimizing the network as Cisco has
suggested will help; but, fundamentally, that’s a bit like saying your football
team has to block and tackle better, while requiring that they always start a
play in the same positions on the field.
You tell me what response times two to ten times what they should be, endless queries
from hell, massive administrative time spent retrofitting data to get it physically
close to the user, and the like are costing you.
I would hope that, now that people are finally
recognizing location problems, we can begin to implement real enterprise
information architectures. At the least,
your action item, vendor or user, should be to start considering it in earnest.