Tuesday, June 24, 2014

Data Virtualization: Sorry, Forrester, It Seems I Disagree

According to Barry Brunelli of TechTarget, a recent Forrester Research report places IBM and Informatica at the top of the heap, ahead of Composite Software and Denodo – and I disagree.  I do, in fact, have a lot more respect for the data management folks at Forrester than I do for their development folks, who produced a report a couple of years back with a very poor (imho) understanding of the nature of agile development.  And I do believe that Forrester deserves credit, especially compared to Gartner, for recognizing both the increasing importance of data virtualization and its ongoing potential for business benefits.  However, I would continue to put Composite Software (now under Cisco) and Denodo ahead of IBM and Informatica in functionality, fit to customer need, and ongoing value-add in the immediate future.  Why?

The Importance Of Paying One’s Dues In Data Virtualization

Understanding Composite’s and Denodo’s advantages begins with the fact that, since the beginning, data virtualization has often been confused with a technology originally called EAI, or Enterprise Application Integration.  Both integrate data; but they start from foundations that aim data integration at very different purposes.  EAI originally aimed (and in some cases still does) to pass data between two or more enterprise applications, such as SAP and Oracle Apps.  As a result, EAI vendors created gateways that converted this (usually bulk) data to a common format, and then retranslated it as necessary to pass it on to the target enterprise app.  As it turned out, this conversion to a common format is exactly what is needed to provide a front end for data streaming into a data warehouse – and thus, EAI and ETL (extract, transform, load) tools share a fair amount of functionality.  However, there is no sense of urgency about this conversion; it is for populating a database, not for immediately answering a query.

By contrast, data virtualization from the start aimed to provide querying (and, eventually, updates) across multiple databases and data management tools.  That, in turn, meant leaving most of the data on the device where it already resided, and converting and combining only those parts of the data needed for a result – and so, high-performance querying became part of the package from the get-go.  Moreover, it takes quite a while to figure out how to optimize queries in this way effectively, and new data types (e.g., social media, Hadoop) and data stores (e.g., data from multiple clouds) keep coming along and must be handled.
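To make the contrast concrete, here is a toy sketch (my own illustration, not any vendor's actual engine) of the data virtualization idea: two independent data sources are left in place, and a query-time join pulls and combines only the rows it needs, rather than bulk-copying everything into a warehouse first, ETL-style.  The table names and sample data are invented for the example.

```python
# Illustrative sketch only: a toy "federated query" across two
# independent SQLite databases standing in for, say, an ERP system
# and a CRM system.  The data stays in each source; the query layer
# fetches and combines only what the query needs -- the core idea
# behind data virtualization, as opposed to ETL's bulk loading.
import sqlite3

# Source 1: an "ERP" database holding orders.
erp = sqlite3.connect(":memory:")
erp.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
erp.executemany("INSERT INTO orders VALUES (?, ?)",
                [(1, 250.0), (2, 99.5), (1, 75.0)])

# Source 2: a "CRM" database holding customer names.
crm = sqlite3.connect(":memory:")
crm.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
crm.executemany("INSERT INTO customers VALUES (?, ?)",
                [(1, "Acme"), (2, "Globex")])

def federated_order_totals():
    """Join data from both sources at query time.

    Each source does its own part of the work (here, the ERP side
    pre-aggregates order totals), and only the small result sets
    cross the boundary -- no central copy of the raw data is made.
    """
    totals = {}
    for cust_id, total in erp.execute(
            "SELECT customer_id, SUM(amount) FROM orders "
            "GROUP BY customer_id"):
        totals[cust_id] = total
    result = []
    for cust_id, name in crm.execute("SELECT id, name FROM customers"):
        result.append((name, totals.get(cust_id, 0.0)))
    return sorted(result)

print(federated_order_totals())  # [('Acme', 325.0), ('Globex', 99.5)]
```

A real data virtualization product adds what this sketch omits: a cost-based optimizer deciding how much work to push down to each source, caching, and support for many more source types – which is exactly the hard-won engineering the paragraph above argues takes years to get right.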

As I recall, Composite Software has been continually refining its software since at least 2003.  IBM originally had a matching product (now apparently part of InfoSphere).  However, in the mid-2000s, IBM chose to focus on the newly acquired Ascential (more of an EAI-type product) instead, and only recently has it begun to re-focus on data virtualization technologies, with the acquisition of an unstructured-data virtualization company and with increased (and welcome!) attention, notably at the recent Information Management conference.  Based on my last conversations with IBM, I suspect that it still has a fair amount of work to do to upgrade the unstructured-data acquisition’s cross-database querying with many more use cases, from cloud data to object, streaming/sensor, data-warehouse, and IMS/Informix data types – not to mention integrating it with Master Data Management, operational-data querying needs, and features such as information governance.  And, of course, I’ve left out such newer functionality as cross-database updates, cross-database access control, developer support, and administrator support.

Informatica, apparently, is starting from behind even IBM.  For most of the last decade, it has been playing in the EAI and “data integration” (including ETL) space, and only over the last two or three years has it publicized its “data virtualization” capabilities – nor is it clear where it got its cross-database querying chops.  Certainly, most of the smaller players from 10 years ago have already been acquired, and are suffering under the negligent hands of their masters – Oracle, for example, acquired an already-neglected AquaLogic product with its buyout of BEA.  In similar fashion, SAP has wound up with a Sybase-acquired product, and Red Hat with the granddaddy of data virtualization, MetaMatrix.  In any case, large marketing claims are no substitute for a demonstrated pedigree of functional development.

Lessons For Users

So where do I view Forrester as having gone wrong, and how can IT buyers avoid buying less than the needed functionality?  I don’t know for sure, but I suspect that underlying the Forrester take was (a) confusion between EAI-type and data-virtualization-type “data integration” as well as a misunderstanding of what “data virtualization” really means, and (b) a subconscious belief that when a large and a small company say they have something, typically the large company wins because of breadth of features and support.

Let’s take the confusion first.  I am one who wonders whether “data virtualization” hasn’t caused as much confusion as attention.  Originally, the technology was called Enterprise Information Integration, which at least gets across the idea that the technology delivers value-add (timely, cross-data-type-contexted “information”).  “Data virtualization”, however, suggests that the main value of the technology, like that of storage and server virtualization, is to provide a single view that allows better load balancing.  On the contrary, data virtualization products also provide the basis for global metadata repositories, query optimization across distributed master-data-management data stores, data discovery across the hybrid cloud, developer data abstraction for longer-lasting code, single-key cross-database administration, and semi-automated data governance, not to mention cross-cloud querying.  Given these additional features, users, unlike Forrester, must carefully probe whether vendors other than Composite Software and Denodo are really walking the walk.

For the same reason, (b) doesn't apply:  you can’t simply feel comfortable with the large company’s features and support, because the features and support may very well not cover the types of things that data virtualization does well out of the box.  To put it another way, at present, IBM and Informatica have excellent and extensive data-integration and EAI features; but trying to do flexible data management, Web data discovery and ad-hoc querying, near-realtime data warehousing, and global metadata repositories for data governance without a well-optimized data virtualization product is like trying to fight with one hand tied behind one’s back. 

Data virtualization now matters more than ever to you, the IT buyer.  Forrester admits it, IBM admits it, and it seems that folks like Microsoft are now beginning to admit it.  If you don’t get 90% of the potential benefit because someone told you to use flawed criteria, you will be missing out on the things that let companies like Qualcomm achieve real value-add, not just now but well into the future.  Whether I’m right about Forrester or not, the important thing is not to sell data virtualization short.  Now, go out and kick those tires – the right way.
