Thursday, October 10, 2013

The Good News From Composite Software/Cisco: To ‘Global’, Faster Data Virtualization And Beyond

Like some of the Composite Software users represented at their annual “Data Virtualization Day” today, I arrived with concerns about the future of Composite as a Cisco acquisition that had not been completely allayed – and yet, by the end, I can say that my original concerns have been replaced by a hope that Composite Software will deliver user benefits, over the next few years, well beyond what I had anticipated from Composite Software going it alone. In fact – and this I really did not expect – I believe that some of these benefits will lie well outside the traditional turf of data virtualization.
Of course, with hope comes new concerns.  Specifically, Composite Software’s roadmap now involves an ambitious expansion of their solutions, and therefore of product-development tasks.  With Composite Software’s track record and intellectual capital, I have little doubt that these tasks will be accomplished; with new folks to be brought on board, I am not sure how long full implementation will take.  And, as an analyst greedy on behalf of users, I would argue that an implementation of most of the goals set forth, within the next two years, would be far more valuable to IT.  But this is far more of a nit than questioning the future of data virtualization without the impetus of its typical technology leader.
My change of mind happened with a speech by Jim Green, long-time technology driver at Composite and now General Manager of his own Business Unit within Cisco.  It was, imho, the best speech, for breadth and accuracy of vision, I have heard him give.  Enough of the lead-in; let’s go on to my analysis of what I think it all means.

Business As Unusual Plus Three New Directions

When I say “business as unusual” I mean that many of the upcoming products and aims that Jim or others have mentioned fall firmly in the category of extensions of already evident technology improvements – e.g., continued performance fine-tuning, and support for more Web use cases such as those involving Hadoop.  I don’t want to call this “business as usual”, because I don’t see too many other infrastructure-software companies out there that continue to anticipate as well as reactively fulfil the expressed needs of users dealing with Web Big Data.  Hence, what seems usual Composite-Software practice strikes me as unusual for many other companies. And so, when Jim Green talks about extending data-virtualization support from the cloud to “global” situations, I see business as unusual.
Beyond this, I hear three major new directions:
  1. Software/app-driven transactional network optimization;
  2. The “virtual data sandbox”; and
  3. “composite clouds”.
Let’s take each in turn.

Software/app-driven transactional network optimization

It has been obvious that a driver of the acquisition was the hope on the part of both Composite Software and Cisco that Composite could use Cisco’s network dominance to do Good Stuff.  The questions were, specifically what Good Stuff, and how can it be implemented effectively without breaking Composite’s “we handle any data from anyone in an open way” model.
Here’s the way I read Jim Green’s answer to What Good Stuff?  As he pointed out, the typical Composite cross-database query takes up 90% of its time passing data back and forth over the network – and we should note that Composite has done quite a bit of performance optimization over the years via “driving querying to the best vendor database instance” and thus minimizing data transmission.  The answer, he suggested, was to surface the network’s decisions on data routing and prioritization, and allow software to drive those scheduling decisions – specifically, software that is deciding routing/prioritization based on transactional optimization, not on a snapshot of an array of heterogeneous packet transmission demands. To put it another way, your app uses software to demand results of a query, Composite software tells the network the prioritization of the transmissions involved in the resulting transactions from you and other users, and Cisco aids the Composite software in this optimization by telling it what the state of the network is and what the pros and cons of various routes are.
The answer to avoiding breaking Composite’s open stance is, apparently, to use Cisco’s open network software and protocols.  As for implementation, it appears that Cisco surfacing the network data via its router and other network software (as other networking vendors can do as well), plus Composite embedding both transactional network optimization and support for app-developer network optimization in its developer-facing software, is a straightforward way to do the job. 
What is relatively straightforward in implementation should not obscure a fundamentally fairly novel approach to network optimization.  As in the storage area, it used to be the job of the bottom-of-the-stack distributed devices to optimize network performance.  If we now give the top-of-the-stack applications the power to determine priorities, we are (a) drawing a much more direct line between corporate user needs and network operation, and (b) squarely facing the need to load-balance network usage between competing applications. It’s not just a data-virtualization optimization; it’s a change (and a very beneficial one) in overall administrative mindset and network architecture, useful well beyond the traditional sphere of data virtualization software.
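To make the mechanism concrete, here is a minimal sketch of what app-driven transactional prioritization might look like. Everything in it – the class names, the idea of a router API surfacing per-route bandwidth, the round-robin spread – is my invention for illustration, not Composite's or Cisco's actual design: the app describes the transfers a query needs, the network surfaces its link states, and software assigns priorities and routes based on transactional importance rather than raw packet demand.

```python
# Hypothetical sketch (all names invented): an application-level optimizer
# assigns transmission priorities to the transfers making up a distributed
# query, using link-state information surfaced by the network.

from dataclasses import dataclass

@dataclass
class Transfer:
    name: str               # which intermediate result is being shipped
    bytes_needed: int       # estimated payload size
    on_critical_path: bool  # does the final result block on this transfer?

@dataclass
class Link:
    route: str
    available_mbps: float   # surfaced by the (assumed) router API

def prioritize(transfers, links):
    """Assign each transfer a priority rank and the fastest available route.
    Critical-path transfers go first; within a class, bigger payloads first,
    so the slowest pieces of the query start moving earliest."""
    ranked_links = sorted(links, key=lambda l: -l.available_mbps)
    ordered = sorted(transfers,
                     key=lambda t: (not t.on_critical_path, -t.bytes_needed))
    plan = []
    for rank, t in enumerate(ordered):
        link = ranked_links[rank % len(ranked_links)]  # naive spread over routes
        plan.append((t.name, rank, link.route))        # (transfer, priority, route)
    return plan
```

A real implementation would of course have to load-balance among competing applications, as noted above; this sketch only shows the direction of the decision flow – application to network, rather than the reverse.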

The “Virtual Data Sandbox”

Jim described a Collage product that allowed self-service BI users to create their own spaces in which to carry out queries, and administrators to support them.  The idea is to isolate the data with which the ad-hoc BI end user is playing, where appropriate, by copying it elsewhere, while still allowing self-service-user queries on operational databases and data warehouses where the impact is tolerable.  More broadly, the idea is to semi-automatically set up a “virtual data sandbox” in which the data analyst can play, allowing IT to focus on being “data curators” or managers rather than constantly putting out unexpected self-service-user “query from hell” fires.
My comment from the peanut gallery is that this, like the software-driven transactional optimization described in the previous section, will take Composite well beyond its traditional data-virtualization turf, and that will turn out to be good for both Composite/Cisco and the end user.  Necessarily, evolving Collage will mean supporting more ad-hoc, more exploratory BI – a business-user app rather than an IT infrastructure solution.  This should mean such features as the “virtual metadata sandbox”, in which the analyst not only searches for answers to initial questions but then explores what new data types might be available for further exploration – without the need for administrator hand-holding, and allowing administrators to do role-based view limitation semi-automatically.  Meanwhile, Composite and Cisco will be talking more directly with the ultimate end user of their software and hardware, rather than an endless series of IT and business mediators.
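The provisioning decision at the heart of the sandbox idea can be sketched in a few lines. To be clear, this is my toy illustration, not Collage's logic: the names, the load metric, and the threshold are all assumptions. The point is simply that hot operational sources get copied aside for the analyst, while lightly loaded ones can be queried in place.

```python
# Illustrative sketch (names and threshold are assumptions, not Collage's
# API): decide, per data source, whether the analyst's sandbox should get
# a copy of the data or query the source directly.

def provision_sandbox(sources, load_threshold=0.7):
    """sources: {source_name: current_load_fraction (0.0-1.0)}.
    Returns {source_name: 'copy' | 'direct'} -- 'copy' isolates the
    analyst from hot operational systems; 'direct' lets ad-hoc queries
    run in place where the impact is tolerable."""
    plan = {}
    for name, load in sources.items():
        plan[name] = "copy" if load >= load_threshold else "direct"
    return plan
```

A real curator tool would weigh data freshness and copy cost as well as load, but the shape of the decision – isolate where impactful, pass through where not – is the same.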

The “Composite Cloud”

Finally, Jim briefly alluded to software to provide a single data-virtualization view and database veneer for heterogeneous data (e.g., social-media data and Hadoop file systems) from multiple cloud providers – the so-called “composite cloud.”  This is a more straightforward extension of data virtualization – but it’s a need that I have been talking about, and users have been recognizing, for a couple of years at least, and I hear few if any other Big Data vendors talking about it.
It is also a welcome break in the hype about cloud technology.  No, cloud technology does not make everything into one fuzzy “ball” in which anything physical is transparent to the user, administrator, and developer.  Location still matters a lot, and so does which public cloud or public clouds you get your data from.  Thus, creation of a “composite cloud” to deal with multiple-cloud data access represents an important step forward in real-world use of the cloud.
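In skeleton form, the “composite cloud” veneer is a federation layer: per-provider adapters normalize records into a common schema, and one query interface runs across all of them. The sketch below is mine, with invented names; Composite's actual veneer is of course far more sophisticated (pushdown optimization, typed schemas, and so on).

```python
# Toy sketch of a "composite cloud" veneer: one query interface over
# heterogeneous per-provider sources. Each adapter is a callable
# (hypothetical) that fetches from one cloud provider and yields records
# in a common dict schema, so the caller never sees which cloud the data
# came from.

def query_composite(adapters, predicate):
    """adapters: iterable of zero-argument callables, each returning an
    iterable of dicts in a common schema.  Yields matching records
    across all providers as one logical result set."""
    for fetch in adapters:
        for record in fetch():
            if predicate(record):
                yield record
```

The design choice worth noting: location still matters for performance, which is why a real implementation pushes the predicate down to each provider rather than filtering centrally as this toy does.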

Interlude:  The Users Evolve

I should also note striking differences in user reports of data-virtualization usage, compared with previous years when I’ve attended Data Virtualization Day and spoken with users.  For one thing, users were talking about implementing global metadata repositories or “logical data models” filled with semantic information on top of Composite, and it was quite clearly a major strategic direction for the firms – e.g., Goldman Sachs and Sky, among the largest of financial-service and TV/entertainment companies.  Moreover, the questions from the audience centered on “how to”, indicating corresponding strategic efforts or plans among plenty of other companies.  What I, among others, envisioned as a strategic global metadata repository based on data-virtualization software more than a decade ago has now arrived.
Moreover, the discussion showed that users now “get it” in implementation of such repositories.  There is always a tradeoff between defining corporate metadata and hence constraining users’ ability to use new data sources within the organization, and a Wild West in which no one but you realizes that there’s this valuable information in the organization, and IT is expected to pick up after you when you misuse it.  Users are now aware of the need to balance the two, and it is not deterring them in the slightest from seeing and seizing the benefits of the global metadata repository.  In effect, global metadata repositories are now pretty much mature technology.
The other striking difference was the degree to which users were taking up the idea of routing all their data-accessing applications through a data virtualization layer.  The benefits of this are so great in terms of allowing data movement and redefinition without needing to rewrite hundreds of ill-documented applications (and, of course, loss of performance due to the added layer continues to be minimal or an actual performance gain in some cases), as I also wrote a decade ago, that it still surprises me that it took this long for users to “get it”; but get it they apparently have.  And so, now, users see the benefits of data virtualization not only for the end user (originally) and the administrator (more recently), but the developer as well.
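The developer benefit described above comes down to one level of indirection, which a tiny sketch can make concrete (the class and names here are my illustration, not Composite's API): applications bind to logical names in the virtualization layer, so data can move or be redefined behind that layer without rewriting hundreds of ill-documented callers.

```python
# Minimal illustration (invented names) of why routing applications
# through a virtualization layer pays off: callers bind to logical names,
# so relocating or redefining a data source changes only the mapping.

class VirtualLayer:
    def __init__(self):
        self._catalog = {}            # logical name -> fetch function

    def publish(self, name, fetch):
        self._catalog[name] = fetch   # (re)bind a logical name to a source

    def query(self, name):
        return self._catalog[name]()  # caller never sees the location

layer = VirtualLayer()
layer.publish("customers", lambda: ["acme", "globex"])             # original source
# Later the data moves or is redefined; only the mapping changes,
# and no application code is rewritten:
layer.publish("customers", lambda: ["acme", "globex", "initech"])
```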

Conclusion:  The IT Bottom Line

It remains true that good data virtualization solutions are thin on the ground, hence my original worry about the Cisco acquisition.  The message of Data Virtualization Day to customers and prospects should be that not only Composite Software’s solutions, but also data virtualization solutions in general, are set for the near and medium-term future on their present course.   Moreover, not only are the potential benefits as great as they ever were, but now, in just about every area, there is mature, user-tested technology to back up that potential.
So now we can move on to the next concern, about new potential benefits.  How important are software/app-driven transactional network optimization, the “virtual data sandbox”, and “composite clouds”, and how “real” is the prospect of near-term or medium-term benefits from these, from Composite Software or anyone else?  My answer to each of these questions, respectively, is “the first two are likely to be very important in the medium term, the third in the short term”, and “Composite Software should deliver; the only question is how long it takes them to get there.” 
My action items, therefore, for IT, are to check out Composite Software if you haven’t done so, to continue to ramp up the strategic nature of your implementations if you have, and to start planning for the new directions and new benefits.  Above all, bear in mind that these benefits lie not just in traditional data virtualization software uses – but in areas of IT well beyond these.

Wednesday, October 9, 2013

It’s Time to Finally Begin to Create An Enterprise Information Architecture

The sad fact is that, imho, neither vendors nor users are really supporting building a real-world enterprise information architecture – and yet, the crying need for such an architecture and such support was apparent to me eight years ago.   The occasion for such musings is a Composite Software/Cisco briefing I am attending today, in which users are recognizing as never before the need and prerequisites for an enterprise information architecture, and Composite Software is taking a significant step forward in handling those needs.  And yet, this news fills me with frustration rather than anticipation.

This one requires, unfortunately, a fair bit of explanation that I wish was not still necessary.  Let’s start by saying what I mean by an enterprise information architecture, and what it requires.

The Enterprise Information – Not Data – Architecture 

What theory says is that an enterprise information architecture gets its hands around what data and types of data exist all over the organization (and often needed data outside the organization) and also what that data means to the organization – what information the data conveys.  Moreover, that “meta-information” can’t just be a one-shot, else what is an enterprise information architecture today quickly turns back into an enterprise data architecture tomorrow.  No, the enterprise information architecture has to constantly evolve in order to stay an enterprise information architecture.  So theory says an enterprise information architecture has to have a global semantics-rich metadata repository and the mechanisms in place to change it constantly, semi-automatically, as new data and data types arrive.

Now the real world intrudes, as it has over the past 15 years in just about every major organization I know of.  To the extent that users felt the need for an enterprise information architecture, they adopted one of two tactics:
  1.  Copy everything into one gigantic data warehouse, and put the repository on top of that (with variants of this tactic having to do with proliferating data marts coordinating with the central data warehouse), or
  2. “Muddle through” by responding reactively to every new data need with just enough to satisfy end users, and then trying to do a little linking of existing systems via metadata ad-hoc or on a per-project basis.

As early as 10 years ago, it was apparent to me that (1) was failing.  I could see existing systems in which the more the data-warehouse types tried to stuff everything into the global data-warehouse data store, the further they fell behind the proliferation of data stores in the lines of business and regional centers (not to mention data on the Internet).  That trend has continued up to now, and was amply testified to by two presenters from major financial firms at today’s briefing, with attendees’ questions further confirming it.  Likewise, I saw (2) among initial users of data virtualization software five to eight years ago, and today I overheard a conversation in which two IT types were sharing the news that there were lots of copies of the same data out there and they needed to get a handle on it, as if this was some startling revelation.

The long-term answer to this – the thing that makes an enterprise data architecture an enterprise information architecture, and keeps it that way – is acceptance that some data should be moved and/or copied to the right, more central physical location, and some data should be accessed where it presently resides.  The costs of not doing this, I should note, are not just massive confusion on the part of IT and end users leading to massive added operational costs and inability to determine just where the data is, much less what information it represents; these costs are also, in a related way, performance and scalability costs – you can’t scale in response to Big Data demands, or it costs far more.

The answer to this is as clear as it was 8 years ago:  an architecture that semi-automatically, dynamically determines the correct location of data to optimize performance on an ongoing basis. An enterprise information architecture must have the ability to constantly optimize and re-optimize the physical location of the data and the number of copies of each datum.
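The kernel of such an optimizer can be sketched very simply – and I stress that this is my illustrative sketch, with invented names, of the idea of ongoing re-optimization, not any vendor's implementation: watch where each dataset's accesses actually come from, and periodically plan moves toward the dominant access site.

```python
# Hedged sketch of continuous location re-optimization: periodically move
# each dataset toward the site issuing most of its accesses. A real system
# would also weigh copy counts, transfer cost, and regulatory constraints;
# this shows only the feedback loop.

from collections import Counter

def replan_locations(access_log, current):
    """access_log: list of (dataset, accessing_site) observed accesses.
    current: {dataset: current_site}.
    Returns a list of (dataset, from_site, to_site) moves."""
    by_dataset = {}
    for dataset, site in access_log:
        by_dataset.setdefault(dataset, Counter())[site] += 1
    moves = []
    for dataset, counts in by_dataset.items():
        best = counts.most_common(1)[0][0]   # dominant access site
        if current.get(dataset) != best:
            moves.append((dataset, current.get(dataset), best))
    return moves
```

Run repeatedly, this kind of loop is what keeps allocation tracking a constantly changing target, rather than freezing a snapshot that is obsolete by next quarter.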

The Sad State of the Art in Enterprise Information Architectures

Today’s briefing is reminding me, if I needed reminding, that the tools for such a global meta-information architecture are pretty well advanced, and that users are beginning to recognize the need for such a repository and to create it.  There was even recognition of the Web equivalent of the repository problem, as Composite tackles the fact that users are getting their “cloud information” from multiple providers, and this information must be coordinated via metadata between cloud providers and with internal enterprise information. All very nice.

And yet, even in this, a conference of the “enlightened” as to the virtues of a cross-database architecture, there was very little recognition of what seemed to me to scream from the presentations and conversations:  there is a crying need for dynamic optimization of the location of data.  Those who think that the cloud proves that simply putting a transparent veneer over physically far-flung data archipelagoes solves the problem should be aware that since the advent of public clouds, infrastructure folks have been frantically putting in kludges to cope with the fact that petabyte databases with terabyte-per-minute additions simply can’t be copied from Beijing to Boston in real time to satisfy an American query.

And if the Composite attendees don’t see this, afaik, just about every other vendor I know about, from IBM to Oracle to Microsoft to HP to SAP to yada, sees even less and is doing even less.  I know, from conversations with them, that many of them are intellectually aware that this would be a very good thing to implement; but the users don’t push them, and they don’t ask the users, and so it never seems to be top of mind.

An Action Item – If You Can Do It

I am echoing one of the American Founding Fathers, who, when asked what they were crafting, replied:  “A republic – if you can keep it.”  An enterprise information architecture is not only very valuable, now as then, but also very doable – if vendors have the will to support it, and users have the will to implement it with the additional support.

For vendors, that means simply creating the administrative software to track data location, determine optimal data location and number of copies, and change locations to move towards optimal allocation, over and over – because optimal allocation is a constantly changing target, with obvious long-term trends.  For users, that means using this support to the hilt, in concert with the global metadata repository, and translating the major benefits accruing from more optimal data allocation to terms the CEO can understand.

For now, we can measure those benefits by just how bad things are right now.  One telling factoid at today’s conference:  in the typical query in Composite’s highly location-optimized software, 90% of the performance hit was in passing data/results over the network.  Yes, optimizing the network as Cisco has suggested will help; but, fundamentally, that’s a bit like saying your football team has to block and tackle better, while requiring that they always start a play in the same positions on the field.  You tell me what two-to-ten-times response times, endless queries from hell, massive administrative time spent retrofitting data to get it physically close to the user, and the like are costing you.

I would hope that, now that people are finally recognizing location problems, we can begin to implement real enterprise information architectures.  At the least, your action item, vendor or user, should be to start considering it in earnest.

Wednesday, October 2, 2013

Composite Software, Cisco, and the Potential of Web Data in Motion

The long-term customer benefits of the acquisition of Composite Software, one of the pre-eminent data virtualization vendors, by Cisco, long known primarily for its communications prowess, aren’t obvious at first sight – but I believe that in one area, there is indeed major potential for highly useful new technology. Specifically, I believe that Cisco is well positioned to use Composite Software to handle event-driven processing of “data in motion” over the Web.

Why should this matter to the average IT person? Let’s start with the fact that enormous amounts of data (Big Data, especially social-media data) pass between smartphone/tablet/computer and computer on a minute-by-minute and second-by-second basis on the Internet – effectively, outside of corporate boundaries and firewalls. This data is typically user data; unlike much of corporate data, it is semi-structured (text) or unstructured (graphics, audio, video, pictures) or “mixed”. In fact, the key to this data is that it is not only unusually large-chunk but also unusually variant in type: what passes over the Internet at any one time is not only a mix of images and text, but a mix that changes from second to second.

Up to now, customers have been content with an arrangement in which much of the data eventually winds up in huge repositories in large server farms at public cloud provider facilities. In turn, enterprises dip into these repositories via Hadoop or mass downloads. The inevitable delays in data access inherent in such arrangements are seen as much less important than the improvements in social-data and Big-Data access that such an architecture provides.

Now suppose that we could add an “event processor” to “strain”, redirect, and preliminarily interpret this data well before it arrives at a repository, much less before the remote, over-stressed repository finally delivers the data to the enterprise. It would not replace the public cloud repository; but it would provide a clear alternative for a wide swath of cases with far superior information delivery speed.
This would be especially valuable for what I have called “sensor” data. This is the mass of smartphone pictures and video that reports a news event, or the satellite and GPS data that captures the locations and movement of people and packages in real time. From this, the event processor could distil and deliver alerts of risks and buying-pattern changes, key changes on a daily or hourly basis of the rhythms of daily commerce and customer preferences beyond those typically viewed by the enterprise itself, and opportunities available to fast responders.
Does such an event processor exist now? No, and that’s the point. To fulfill its potential, that event processor would need to be (1) just about ubiquitous, (2) highly performant, and (3) able to analyze disparate data effectively. No event processor out there truly meets any but the second of these requirements.

"It … Could … Be … Done!”

Those old enough to remember will recognize these words from Mel Brooks’ movie Young Frankenstein, when the hero is shocked to recognize that his father’s work was not, in fact, as he had put it, “complete doo-doo.” My point in echoing them here is to say that, in fact, the combination of Cisco and Composite Software is surprisingly close to fulfilling all of the requirements cited above.

Let’s start with “just about ubiquitous.” As regards “data in motion”, Cisco with its routers fills the bill as well as anyone. Of course, event processors on each router would need to be coordinated (that is, one would prefer not to send spurious alerts when data flowing over an alternate route and reunited at the destination might cause us to say “oops, never mind”). However, both Cisco and Composite Software have a great deal of experience in handling, in a coordinated fashion, multiple parallel streams of data. We do not have to achieve data integrity across millions of routers – merely across local coordination centers that adequately combine the data into a composite picture (pardon the pun), which Composite Software is well experienced in doing.

How about “able to analyze disparate data fast”? Here is where Composite Software really shines, with its multi-decade fine-tuning of cross-data-type distributed data analysis. Better than most if not all conventional databases, Composite Server provides a “database veneer” that offers transparent performance optimization of distributed data access over all the data types prevalent both in the enterprise and on the Internet.

It is, indeed, the “highly performant” criterion where Composite Software plus Cisco is most questionable right now. Neither Composite Server nor Cisco’s pre-existing software was designed to handle event processing as we know it today. However, it could be said that today’s event processors conceptually could be split into two parts: (a) a pre-processor that makes initial decisions that don’t require much cross-data analysis, and (b) a conventional database that uses a “cache” data store (still in almost real time) for deeper analysis before the final action is taken. Composite Server probably can handle (b) with some cross-router or cross-machine processing thrown in, while a conventional event processor could be inserted to handle (a).
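The (a)/(b) split just described can be sketched in miniature. All names, event fields, and rules below are my assumptions for illustration: stage (a) makes cheap per-event decisions with no cross-data analysis, and stage (b) accumulates the survivors in a near-real-time cache store on which deeper, cross-event analysis runs before any action is taken.

```python
# Sketch of the two-part split: (a) a lightweight pre-processor that
# strains and tags events without cross-data analysis, and (b) a cache
# store on which deeper, cross-event analysis runs. All names invented.

def pre_process(event):
    """Stage (a): cheap, per-event decisions only."""
    if event.get("kind") not in ("image", "text", "gps"):
        return None                              # strain out unusable data
    event["flagged"] = event.get("size", 0) > 1_000_000
    return event

class CacheStore:
    """Stage (b): accumulate pre-processed events for deeper analysis."""
    def __init__(self):
        self.rows = []

    def ingest(self, event):
        e = pre_process(event)
        if e is not None:
            self.rows.append(e)

    def alerts(self):
        # Cross-event analysis would happen here, against the cached
        # subset; this toy version just surfaces flagged events.
        return [e for e in self.rows if e["flagged"]]
```

The division of labor matters: stage (a) must be fast enough to sit near the router, while stage (b) can afford the heavier, Composite-Server-style cross-data work.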

The IT Bottom Line: Making Do Until Nirvana

Is there nothing that can be done, then, except wait and hope that Composite Software and Cisco recognize the opportunity and fill in the pieces, or some other vendor spends the time to reproduce what they already have? Actually, I think there may be. It’s not the long-term solution; but it mimics to some extent a ubiquitous Web event processor.

I am talking about setting up Composite Software as a front end rather than a back end to public cloud provider databases. A simple multiplexer could “strain” and feed data to multiple data stores using multiple conventional operational databases for the constant stream of updates, as well as to backend Hadoop/MapReduce file systems and traditional databases. Composite Server would then carry out queries across these “data type specialists”, in much the same way it operates now. The main difference between this approach and what is happening now is that Composite Software will get a much smaller subset of provider data at the same time as the file system – and hence will at least deliver alerts on some key “sensor” data well ahead of the stressed-out Big-Data Hadoop data store.
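The multiplexer in that interim arrangement is conceptually simple, and a sketch may help. The store names and routing rules below are my assumptions, not a description of any shipping product: the full stream still flows to the Hadoop back end, while “sensor”-type records are also peeled off onto a fast path for early alerting.

```python
# Hedged sketch of the interim "front end" arrangement: a simple
# multiplexer strains the incoming stream and feeds each record to the
# store best suited to it, while everything still reaches the Hadoop
# back end. Store names and routing rules are assumptions.

def multiplex(records, stores):
    """records: iterable of dicts with a 'type' key.
    stores: {'operational': list, 'sensor': list, 'hadoop': list}.
    Sensor-type data is routed to a fast path for early alerting in
    addition to the full back-end stream."""
    for r in records:
        stores["hadoop"].append(r)        # full stream to the back end
        if r["type"] in ("gps", "image"):
            stores["sensor"].append(r)    # small subset, early alerts
        else:
            stores["operational"].append(r)
    return stores
```

Composite Server would then query across these “data type specialists” much as it does today – the win being that the sensor subset arrives for alerting long before the stressed-out Hadoop store can serve it back.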

My suggested action item for IT, therefore, is to start conceptualizing such a means of handling Web “data in motion,” and possibly to set up a Composite-Server testbed, to lead on to implementation of an interim solution. I would also appreciate it if IT would gently indicate to Cisco that they would find a full-fledged solution highly desirable. A Web “data in motion” event processor would be a Big new extension of Big Data, with Big benefits, and it seems to me that Composite Software and Cisco are best positioned to make such a solution available sooner rather than later.

It … could … be … done! Let’s … do … it … now!