Like some of the Composite Software users represented at its annual “Data Virtualization Day” today, my concerns about the future of Composite as a Cisco acquisition had not been completely allayed before the conference. By the end, though, those original concerns had been replaced by a hope that Composite Software will deliver user benefits over the next few years well beyond what I had anticipated from the company going it alone. In fact – and this I really did not expect – I believe that some of these benefits will lie well outside of the traditional turf of data virtualization.
Of course, with hope come new concerns. Specifically, Composite Software’s roadmap now involves an ambitious expansion of its solutions, and therefore of its product-development tasks. With Composite Software’s track record and intellectual capital, I have little doubt that these tasks will be accomplished; with new folks to be brought on board, I am not sure how long full implementation will take. And, as an analyst greedy on behalf of users, I would argue that implementing most of the stated goals within the next two years would be far more valuable to IT. But this is far more of a nit than my original worry: questioning the future of data virtualization without the impetus of its typical technology leader.
My change of mind happened with a speech by Jim Green,
long-time technology driver at Composite and now General Manager of his own
Business Unit within Cisco. It was, imho, the best speech for breadth and accuracy of vision that I have heard him give. Enough of the lead-in; let’s go on
to my analysis of what I think it all means.
Business As Unusual Plus Three New Directions
When I say “business as unusual” I mean that many of the
upcoming products and aims that Jim or others have mentioned fall firmly in the
category of extensions of already evident technology improvements – e.g.,
continued performance fine-tuning, and support for more Web use cases such as
those involving Hadoop. I don’t want to call this “business as usual”, because I don’t see many other infrastructure-software companies out there that continue to anticipate, as well as reactively fulfill, the expressed needs of users dealing with Web Big Data. Hence, what seems like usual Composite Software practice strikes me as unusual for many other companies. And
so, when Jim Green talks about extending data-virtualization support from the
cloud to “global” situations, I see business as unusual.
Beyond this, I hear three major new directions:
- Software/app-driven transactional network optimization;
- The “virtual data sandbox”; and
- “composite clouds”.
Let’s take each in turn.
Software/app-driven transactional network optimization
It has been obvious that a driver of the acquisition was the
hope on the part of both Composite Software and Cisco that Composite could use
Cisco’s network dominance to do Good Stuff.
The questions were: specifically, what Good Stuff, and how could it be implemented effectively without breaking Composite’s “we handle any data from anyone in an open way” model?
Here’s the way I read Jim Green’s answer to What Good
Stuff? As he pointed out, the typical
Composite cross-database query takes up 90% of its time passing data back and
forth over the network – and we should note that Composite has done quite a bit
of performance optimization over the years via “driving querying to the best
vendor database instance” and thus minimizing data transmission. The answer, he suggested, was to surface the
network’s decisions on data routing and prioritization, and allow software to
drive those scheduling decisions – specifically, software that is deciding
routing/prioritization based on transactional optimization, not on a snapshot
of an array of heterogeneous packet transmission demands. To put it another way: your app uses software to demand the results of a query; Composite software tells the network how to prioritize the transmissions involved in the resulting transactions from you and other users; and Cisco aids the Composite software in this optimization by telling it the state of the network and the pros and cons of the various routes.
The answer to avoiding breaking Composite’s open stance is,
apparently, to use Cisco’s open network software and protocols. As for implementation, it appears that Cisco
surfacing the network data via its router and other network software (as other
networking vendors can do as well), plus Composite embedding both transactional
network optimization and support for app-developer network optimization in its
developer-facing software, is a straightforward way to do the job.
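To make the division of labor concrete, here is a minimal sketch, in Python, of how an application-facing optimizer might consume the network state that routers surface and hand back per-transfer priorities and routes. Every name and number in it (LinkState, Transfer, prioritize, and so on) is my own invention to illustrate the flow Jim described, not any actual Composite or Cisco interface.

```python
# A hypothetical sketch of app-driven transactional network optimization.
# All names and structures are my own illustration of the flow described
# above; they are not any actual Composite Software or Cisco API.

from dataclasses import dataclass

@dataclass
class LinkState:
    """Network state that the router layer surfaces upward."""
    route: str
    latency_ms: float
    free_mbps: float

@dataclass
class Transfer:
    """One data shipment needed to complete a federated query."""
    query_id: str
    est_megabits: float
    deadline_ms: float  # how soon the owning transaction needs this data

def prioritize(transfers, links):
    """Assign each transfer a route and a priority rank based on
    transaction urgency, not on a per-packet snapshot of demand."""
    plan = []
    # Most urgent first: tightest deadline per megabit to be moved.
    ordered = sorted(transfers, key=lambda t: t.deadline_ms / max(t.est_megabits, 1e-6))
    for rank, t in enumerate(ordered):
        # Pick the route that can deliver this transfer soonest.
        best = min(links, key=lambda l: l.latency_ms + 1000 * t.est_megabits / max(l.free_mbps, 1e-6))
        plan.append((t.query_id, best.route, rank))  # rank 0 = highest priority
        # Reserve the bandwidth this transfer will need on the chosen route.
        best.free_mbps = max(best.free_mbps - 1000 * t.est_megabits / t.deadline_ms, 0.0)
    return plan

# Usage with invented numbers: the data-virtualization layer feeds in the
# transfers behind its pending queries and hands the resulting ranks and
# routes back down to the network.
links = [LinkState("wan-1", 20.0, 400.0), LinkState("wan-2", 35.0, 900.0)]
transfers = [Transfer("q1", 800.0, 2000.0), Transfer("q2", 50.0, 500.0)]
print(prioritize(transfers, links))
```

The point of the sketch is only the division of labor: the network layer exposes its state, and the layer that knows which transmissions belong to which transactions decides the ordering.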
What is relatively straightforward in implementation should not obscure a fundamentally novel approach to network optimization. As in the storage area, it used to be the job
of the bottom-of-the-stack distributed devices to optimize network
performance. If we now give the
top-of-the-stack applications the power to determine priorities, we are (a)
drawing a much more direct line between corporate user needs and network
operation, and (b) squarely facing the need to load-balance network usage
between competing applications. It’s not just a data-virtualization
optimization; it’s a change (and a very beneficial one) in overall
administrative mindset and network architecture, useful well beyond the
traditional sphere of data virtualization software.
The “Virtual Data Sandbox”
Jim described a Collage product that allowed self-service BI
users to create their own spaces in which to carry out queries, and
administrators to support them. More broadly, the idea is to isolate the data with which the ad-hoc BI end user is playing, where appropriate, by copying it elsewhere, while still allowing self-service queries on operational databases and data warehouses where the impact is acceptable. Still more broadly, the aim is to semi-automatically set up a “virtual data sandbox” in which the data analyst can play, allowing IT to focus on being “data curators” or managers rather than constantly putting out unexpected self-service “query from hell” fires.
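As a rough illustration of the sandbox idea, here is a minimal Python sketch that copies just the slice of data an analyst has asked about into an isolated SQLite file, so that ad-hoc queries run against the copy rather than against production. The database, table, and column names are invented for the example, and this is of course not how Collage itself works.

```python
# A hypothetical sketch of a "virtual data sandbox": copy only the slice of
# data the analyst is exploring into an isolated store, so ad-hoc queries
# land on the copy rather than on the operational database. The table and
# column names are invented for the example; this is not Collage itself.

import sqlite3

def build_sandbox(operational_db: str, sandbox_db: str, region: str) -> sqlite3.Connection:
    src = sqlite3.connect(operational_db)
    box = sqlite3.connect(sandbox_db)
    box.execute("CREATE TABLE IF NOT EXISTS orders (id INTEGER, region TEXT, amount REAL)")
    # Copy only the slice the analyst has asked to play with.
    rows = src.execute(
        "SELECT id, region, amount FROM orders WHERE region = ?", (region,)
    ).fetchall()
    box.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    box.commit()
    src.close()
    return box  # the "query from hell" now runs here, not in production

# Usage: sandbox = build_sandbox("prod.db", "analyst_jane.db", region="EMEA")
#        sandbox.execute("SELECT SUM(amount) FROM orders").fetchone()
```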
My comment from the peanut gallery is that this, like the
software-driven transactional optimization described in the previous section,
will take Composite well beyond its traditional data-virtualization turf, and
that will turn out to be good for both Composite/Cisco and the end user. Necessarily, evolving Collage will mean
supporting more ad-hoc, more exploratory BI – a business-user app rather than
an IT infrastructure solution. This should mean such features as a “virtual metadata sandbox”, in which the analyst not only searches for answers to initial questions but also discovers what new data types are available for further exploration, without administrator hand-holding and with administrators able to apply role-based view limitations semi-automatically.
Meanwhile, Composite and Cisco will be talking more directly with the
ultimate end user of their software and hardware, rather than an endless series
of IT and business mediators.
The “Composite Cloud”
Finally, Jim briefly alluded to software to provide a single
data-virtualization view and database veneer for heterogeneous data (e.g.,
social-media data and Hadoop file systems) from multiple cloud providers – the
so-called “composite cloud.” This is a more straightforward extension of data virtualization, but it is a need that I have been talking about, and users have been recognizing, for at least a couple of years, and one that most if not all other Big Data vendors are not yet talking about.
It is also a welcome break in the hype about cloud
technology. No, cloud technology does
not make everything into one fuzzy “ball” in which anything physical is
transparent to the user, administrator, and developer. Location still matters a lot, and so does
which public cloud or public clouds you get your data from. Thus, creation of a “composite cloud” to deal
with multiple-cloud data access represents an important step forward in
real-world use of the cloud.
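To give a feel for what even a simple composite-cloud veneer buys, here is a minimal Python sketch in which two SQLite files stand in for extracts from two different cloud providers and the application queries one logical view. The file, table, and column names are invented, and a real product would federate far more heterogeneous sources than this.

```python
# A hypothetical sketch of a "composite cloud" veneer: two SQLite files stand
# in for extracts held by two different cloud providers, and the application
# queries a single logical view. All names are invented for the example.

import sqlite3

def composite_view(cloud_a_extract: str, cloud_b_extract: str) -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("ATTACH DATABASE ? AS cloud_a", (cloud_a_extract,))
    conn.execute("ATTACH DATABASE ? AS cloud_b", (cloud_b_extract,))
    # The application sees one logical table, regardless of which provider
    # physically holds each row.
    conn.execute("""
        CREATE TEMP VIEW all_events AS
            SELECT user_id, event_type, ts FROM cloud_a.events
            UNION ALL
            SELECT user_id, event_type, ts FROM cloud_b.events
    """)
    return conn

# Usage: conn = composite_view("aws_extract.db", "azure_extract.db")
#        conn.execute("SELECT event_type, COUNT(*) FROM all_events GROUP BY event_type")
```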
Interlude: The Users Evolve
I should also note striking differences in users’ reports of their data virtualization usage, compared with previous years when I attended Data Virtualization Day and spoke with them. For one thing, users were talking about
implementing global metadata repositories or “logical data models” filled with
semantic information on top of Composite, and it was quite clearly a major
strategic direction for the firms – e.g., Goldman Sachs and Sky, among the
largest of financial-service and TV/entertainment companies. Moreover, the questions from the audience
centered on “how to”, indicating corresponding strategic efforts or plans among
plenty of other companies. What I among
others envisioned as a strategic global metadata repository based on
data-virtualization software more than a decade ago has now arrived.
Moreover, the discussion showed that users now “get it” in
implementation of such repositories.
There is always a tradeoff between defining corporate metadata and hence
constraining users’ ability to use new data sources within the organization,
and a Wild West in which no one but you realizes that there’s this valuable
information in the organization, and IT is expected to pick up after you when
you misuse it. Users are now aware of
the need to balance the two, and it is not deterring them in the slightest from
seeing and seizing the benefits of the global metadata repository. In effect, global metadata repositories are
now pretty much mature technology.
The other striking difference was the degree to which users
were taking up the idea of routing all their data-accessing applications
through a data virtualization layer. As I also wrote a decade ago, the benefits are great: data can be moved and redefined without rewriting hundreds of ill-documented applications, while the performance cost of the added layer remains minimal, and in some cases there is an actual performance gain. It still surprises me that it took users this long to “get it”; but get it they apparently have. And so,
now, users see the benefits of data virtualization not only for the end user
(originally) and the administrator (more recently), but the developer as well.
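The mechanism behind that developer benefit is simple enough to show in a few lines: if every application reads through a stable view name, the mapping behind the view can change without touching a line of application code. Here is a minimal Python/SQLite sketch of that idea, with invented table names.

```python
# A hypothetical sketch of why routing applications through a virtualization
# layer pays off: applications bind to a stable view name, and only the view
# definition changes when the physical data moves. Names are invented.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE legacy_customers (id INTEGER, name TEXT)")
conn.execute("INSERT INTO legacy_customers VALUES (1, 'Acme')")

# The virtualization layer: applications only ever query this view.
conn.execute("CREATE VIEW customers_v AS SELECT id, name FROM legacy_customers")
print(conn.execute("SELECT name FROM customers_v").fetchall())

# Later the data is migrated into a restructured table...
conn.execute("CREATE TABLE crm_customers (cust_id INTEGER, cust_name TEXT)")
conn.execute("INSERT INTO crm_customers SELECT id, name FROM legacy_customers")

# ...and only the view definition is updated; no application is rewritten.
conn.execute("DROP VIEW customers_v")
conn.execute(
    "CREATE VIEW customers_v AS SELECT cust_id AS id, cust_name AS name FROM crm_customers"
)
print(conn.execute("SELECT name FROM customers_v").fetchall())
```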
Conclusion: The IT Bottom Line
It remains true that good data virtualization solutions are
thin on the ground, hence my original worry about the Cisco acquisition. The message of Data Virtualization Day to
customers and prospects should be that not only Composite Software’s solutions,
but also data virtualization solutions in general, are set for the near and
medium-term future on their present course.
Moreover, not only are the potential benefits as great as they ever
were, but now, in just about every area, there is mature, user-tested
technology to back up that potential.
So now we can move on to the next concern, about new
potential benefits. How important are
software/app-driven transactional network optimization, the “virtual data
sandbox”, and “composite clouds”, and how “real” is the prospect of near-term
or medium-term benefits from these, from Composite Software or anyone
else? My answer to each of these
questions, respectively, is “the first two are likely to be very important in
the medium term, the third in the short term”, and “Composite Software should
deliver; the only question is how long it takes them to get there.”
My action items, therefore, for IT, are to check out
Composite Software if you haven’t done so, to continue to ramp up the strategic
nature of your implementations if you have, and to start planning for the new
directions and new benefits. Above all,
bear in mind that these benefits lie not just in traditional data
virtualization software uses – but in areas of IT well beyond these.