Wednesday, October 9, 2013

It’s Time to Finally Begin to Create An Enterprise Information Architecture

The sad fact is that, imho, neither vendors nor users are really committed to building a real-world enterprise information architecture – and yet the crying need for such an architecture, and for such support, was apparent to me eight years ago.  The occasion for these musings is a Composite Software/Cisco briefing I am attending today, at which users are recognizing as never before the need for, and prerequisites of, an enterprise information architecture, and Composite Software is taking a significant step forward in meeting those needs.  And yet this news fills me with frustration rather than anticipation.

This one requires, unfortunately, a fair bit of explanation that I wish were not still necessary.  Let’s start with what I mean by an enterprise information architecture, and what it requires.

The Enterprise Information – Not Data – Architecture 

What theory says is that an enterprise information architecture gets its hands around what data and types of data exist all over the organization (and, often, needed data outside the organization), and also what that data means to the organization – what information the data conveys.  Moreover, that “meta-information” can’t be a one-shot effort, else what is an enterprise information architecture today quickly turns back into a mere enterprise data architecture tomorrow.  No, the enterprise information architecture has to evolve constantly in order to stay an enterprise information architecture.  So theory says an enterprise information architecture has to have a global, semantics-rich metadata repository, plus the mechanisms to change it constantly, semi-automatically, as new data and data types arrive.
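To make that concrete, here is a minimal sketch of what such a repository might look like.  Everything in it – the names, the fields, the registration call – is my own invention for illustration; real repositories carry far richer semantics (lineage, ownership, data quality, and so on):

```python
# A minimal sketch of a global, semantics-rich metadata repository.
# All names here (DatasetEntry, MetadataRepository, etc.) are hypothetical.

from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class DatasetEntry:
    name: str                 # logical name users query by
    location: str             # physical system holding the data today
    schema: dict              # column name -> type
    business_meaning: str     # what the data *means* to the organization
    last_updated: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc))

class MetadataRepository:
    """Evolves semi-automatically: new sources register themselves, so the
    information architecture never decays back into a mere data architecture."""

    def __init__(self):
        self._entries: dict[str, DatasetEntry] = {}

    def register(self, entry: DatasetEntry) -> None:
        # Called by discovery crawlers whenever a new data store,
        # feed, or data type appears anywhere in the enterprise.
        self._entries[entry.name] = entry

    def lookup(self, name: str) -> DatasetEntry:
        return self._entries[name]

repo = MetadataRepository()
repo.register(DatasetEntry(
    name="customer_orders",
    location="emea-warehouse",   # hypothetical site name
    schema={"order_id": "int", "amount": "decimal"},
    business_meaning="Committed revenue by customer"))
```

The point of the `register` call is the “semi-automatically” above: discovery crawlers, not humans, keep the repository current as new data and data types arrive.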

Now the real world intrudes, as it has over the past 15 years in just about every major organization I know of.  To the extent that users felt the need for an enterprise information architecture, they adopted one of two tactics:
  1.  Copy everything into one gigantic data warehouse, and put the repository on top of that (variants of this tactic involve data marts proliferating around, and coordinating with, the central data warehouse), or
  2.  “Muddle through” by responding reactively to every new data need with just enough to satisfy end users, and then trying to do a little linking of existing systems via metadata, ad hoc or on a per-project basis.

As early as 10 years ago, it was apparent to me that (1) was failing.  I could see existing systems in which the more the data warehouse types tried to stuff everything into the global data-warehouse data store, the further they fell behind the proliferation of data stores in the lines of business and regional centers (not to mention data on the Internet).  That trend has continued to this day, and was amply confirmed by two presenters from major financial firms at today’s briefing, with attendees’ questions reinforcing the point.  Likewise, I saw (2) among the initial users of data virtualization software five to eight years ago, and today I overheard a conversation in which two IT types were sharing the news that there were lots of copies of the same data out there and they needed to get a handle on it – as if this were some startling revelation.

The long-term answer to this – the thing that makes an enterprise data architecture an enterprise information architecture, and keeps it that way – is acceptance that some data should be moved and/or copied to a more appropriate, often more central, physical location, while other data should be accessed where it presently resides.  The costs of not doing this, I should note, are not just the massive confusion among IT and end users that leads to massive added operational costs and an inability to determine just where the data is, much less what information it represents; they are also, relatedly, performance and scalability costs – you either can’t scale in response to Big Data demands, or you scale at far greater cost.

The answer to this is as clear as it was 8 years ago: an architecture that semi-automatically and dynamically determines the correct location of data, so as to optimize performance on an ongoing basis.  An enterprise information architecture must be able to constantly optimize and re-optimize both the physical location of the data and the number of copies of each datum.
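What might that placement decision look like?  Here is a minimal sketch under an invented cost model of my own (none of this comes from Composite or any other vendor): weigh the recurring network cost of repeatedly accessing data where it resides against the one-time cost of moving or copying it closer to its users:

```python
# A minimal sketch of the relocate-or-access-in-place decision, under a
# deliberately crude cost model: recurring remote-access traffic over a
# planning horizon versus the one-time traffic of moving the dataset.

def should_relocate(remote_queries_per_day: float,
                    bytes_per_query: float,
                    dataset_bytes: float,
                    horizon_days: float = 30.0) -> bool:
    """True when moving the data once is cheaper than shipping
    query results over the network for the whole horizon."""
    recurring_cost = remote_queries_per_day * bytes_per_query * horizon_days
    one_time_cost = dataset_bytes
    return recurring_cost > one_time_cost

# Example: a 50 GB dataset hit 2,000 times a day from a remote region,
# at 5 MB per query, pays back its own move in under a month.
print(should_relocate(2_000, 5e6, 50e9))  # True
```

A real optimizer would, of course, also weigh replica counts, storage costs, and consistency requirements – and would re-run the decision as access patterns drift.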

The Sad State of the Art in Enterprise Information Architectures

Today’s briefing is reminding me, if I needed reminding, that the tools for such a global meta-information architecture are pretty well advanced, and that users are beginning to recognize the need for such a repository and to actually create one.  There was even recognition of the Web equivalent of the repository problem, as Composite tackles the fact that users are getting their “cloud information” from multiple providers, and this information must be coordinated via metadata across cloud providers and with internal enterprise information.  All very nice.

And yet, even at this, a conference of the “enlightened” as to the virtues of a cross-database architecture, there was very little recognition of what seemed to me to scream from the presentations and conversations: there is a crying need for dynamic optimization of the location of data.  Those who think that the cloud proves that simply putting a transparent veneer over physically far-flung data archipelagoes solves the problem should be aware that, since the advent of public clouds, infrastructure folks have been frantically putting in kludges to cope with the fact that petabyte databases with terabyte-per-minute additions simply can’t be copied from Beijing to Boston in real time to satisfy an American query.
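The arithmetic behind that claim is simple (my illustrative figures, not anyone’s measurements):

```python
# Back-of-envelope: what "terabyte-per-minute additions" demands of a
# transpacific link just to keep a remote copy current.

additions_tb_per_min = 1.0                             # illustrative figure
required_gbps = additions_tb_per_min * 8e12 / 60 / 1e9
print(f"sustained bandwidth needed: {required_gbps:.0f} Gbps")  # ~133 Gbps
# And that is before protocol overhead, retransmits, or any attempt to
# catch up on the petabytes already sitting at the source.
```

Sustaining 133 Gbps across an ocean for one database is not a realistic plan; deciding where the data should live in the first place is.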

And if even the Composite attendees don’t see this, then, afaik, just about every other vendor I know of – IBM, Oracle, Microsoft, HP, SAP, and on down the list – sees even less and is doing even less.  I know, from conversations with them, that many are intellectually aware that this would be a very good thing to implement; but the users don’t push them, and they don’t ask the users, and so it never seems to be top of mind.

An Action Item – If You Can Do It

I am echoing Benjamin Franklin, who, when asked what the Constitutional Convention had crafted, reportedly replied: “A republic – if you can keep it.”  An enterprise information architecture is not only very valuable, now as then, but also very doable – if vendors have the will to support it, and users have the will to implement it with that additional support.

For vendors, that means creating the administrative software to track data location, determine the optimal data location and number of copies, and change locations to move toward optimal allocation, over and over – because optimal allocation is a constantly moving target, albeit one with clear long-term trends.  For users, that means using this support to the hilt, in concert with the global metadata repository, and translating the major benefits of more optimal data allocation into terms the CEO can understand.
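Pulling the pieces above together, the vendor-side administrative loop might look something like this – again a sketch of my own devising, with every name hypothetical:

```python
# A minimal sketch of the administrative loop: track where data lives and
# where it is queried from, then periodically nudge placement toward the
# (always moving) optimum.  All names here are hypothetical.

import time
from collections import Counter

class PlacementOptimizer:
    def __init__(self, catalog: dict[str, str]):   # dataset -> current site
        self.catalog = catalog
        self.access_log = Counter()                # (dataset, site) -> hits

    def record_access(self, dataset: str, from_site: str) -> None:
        self.access_log[(dataset, from_site)] += 1

    def rebalance(self) -> None:
        # Move each dataset to the site that queries it most; a real
        # optimizer would also weigh copy counts, move costs, and SLAs.
        for dataset, current_site in list(self.catalog.items()):
            hits = {site: n for (d, site), n in self.access_log.items()
                    if d == dataset}
            if hits:
                best = max(hits, key=hits.get)
                if best != current_site:
                    self.catalog[dataset] = best   # kick off migration here

    def run_forever(self, interval_sec: float = 3600.0) -> None:
        while True:            # optimal allocation never stops moving
            self.rebalance()
            time.sleep(interval_sec)
```

Nothing in that loop is exotic; what has been missing is the will to build it and ship it.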

For now, we can measure those benefits by just how bad things are right now.  One telling factoid from today’s conference: in the typical query in Composite’s highly location-optimized software, 90% of the performance hit came from passing data and results over the network.  Yes, optimizing the network as Cisco has suggested will help; but, fundamentally, that’s a bit like telling your football team to block and tackle better while requiring that every play start from the same spot on the field.  You tell me what response times two to ten times longer than they should be, endless queries from hell, massive administrative time spent retrofitting data to get it physically close to the user, and the like are costing you.

I would hope that, now that people are finally recognizing the data-location problem, we can begin to implement real enterprise information architectures.  At the least, your action item, vendor or user, should be to start considering it in earnest.
