In the past, I have been quite critical both of some Cisco forays into the server space and of user over-use of Hadoop. I find, however, to my own surprise, that Cisco’s new Hadoop-using approach to data warehousing is potentially very useful in Big Data warehouses. Here is a short thought piece on why that might be so.
First, a brief description of some key aspects of the solution, as I see them. The Cisco approach is to view a traditional data warehouse, plus the rest of the Big Data needed to provide fairly quick answers to business-critical data-scientist questions, as one “virtual warehouse”, with Cisco’s data virtualization solution (based on Composite Software’s technology) as the unifying veneer. Once you view all of these pieces as parts of a data-warehouse whole, it becomes possible to use not only lower-cost storage for less-used Big Data, but also different databases, including operational OLTP data stores and “mixed” query/update enterprise-app data stores. These, however, have traditionally been able to handle queries only against much smaller data sets, because of their dual purpose and the competition queries face from updates. Even master-data-management systems suffer from this kind of dual-purpose limitation, since it can be too constraining to rigidly copy everything to a central data store.
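To make the data-virtualization idea concrete, here is a minimal sketch of the pattern described above: one logical “virtual warehouse” fans a query out to whichever back-end stores hold the data and merges the results. All class, store, and table names here are my own hypothetical illustrations, not Cisco’s or Composite Software’s actual API.

```python
class Backend:
    """A queryable data store (traditional warehouse, Hadoop, OLTP, etc.)."""
    def __init__(self, name, tables):
        self.name = name
        self.tables = tables  # table name -> list of row dicts

    def query(self, table):
        return self.tables.get(table, [])


class VirtualWarehouse:
    """Presents many heterogeneous backends as one logical warehouse."""
    def __init__(self, backends):
        self.backends = backends

    def query(self, table):
        # Fan the query out to every backend that may hold the table,
        # then merge the results into a single answer set.
        rows = []
        for backend in self.backends:
            rows.extend(backend.query(table))
        return rows


# Hypothetical setup: the same logical "sales" table lives partly in the
# traditional warehouse and partly in a Hadoop-backed store.
warehouse = Backend("edw", {"sales": [{"id": 1, "amt": 100}]})
hadoop = Backend("hadoop", {"sales": [{"id": 2, "amt": 250}]})
vw = VirtualWarehouse([warehouse, hadoop])
print(len(vw.query("sales")))  # rows from both stores: 2
```

The point of the sketch is only that the querying application sees one schema; which physical store answers which part of the query is hidden behind the virtualization layer.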
The potential of a Hadoop database as a kind of “overload” locus, it seems to me, is that one takes a database optimized for querying data so Big that relational approaches alone cannot process it, and uses it as “overflow” space for data so Big that a traditional data warehouse cannot handle it. A potential side benefit is that, these days, much of the massive “overflow” data may very well be social-media information – the type of information on which Hadoop, MapReduce, and Hive cut their teeth. And, of course, however inefficient in-house Hadoop has been, here at least is one area in which IT’s Hadoop experience allows better optimization of the Hadoop side of the virtual data warehouse.
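The “overflow” pattern above can be sketched in a few lines: fill the traditional warehouse tier until it reaches capacity, spill the remainder to a cheaper Hadoop-backed tier, and let a scan span both transparently. The capacity numbers and tier names are invented purely for illustration.

```python
class TieredStore:
    """Two-tier store: a capacity-limited warehouse plus Hadoop overflow."""
    def __init__(self, warehouse_capacity):
        self.warehouse_capacity = warehouse_capacity
        self.warehouse = []  # fast, relatively expensive tier
        self.hadoop = []     # cheap, scale-out overflow tier

    def load(self, record):
        # Fill the warehouse first; anything beyond capacity overflows
        # to the Hadoop tier instead of being dropped or archived offline.
        if len(self.warehouse) < self.warehouse_capacity:
            self.warehouse.append(record)
        else:
            self.hadoop.append(record)

    def scan(self):
        # A query sees both tiers as one data set.
        return self.warehouse + self.hadoop


store = TieredStore(warehouse_capacity=2)
for record in ["r1", "r2", "r3", "r4"]:
    store.load(record)
print(len(store.warehouse), len(store.hadoop))  # 2 2
print(len(store.scan()))  # 4
```

The design choice being illustrated: overflow data remains queryable rather than being aged out, which is precisely what makes the Hadoop tier useful as “overflow” rather than mere archive.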
Likewise, I am prepared to cut Cisco some slack when it says it intends to use its scale-out UCS servers in Hadoop “clusters”. Despite the hype, it appears that no scale-out solution comes close to the cost efficiency of scale-up servers in either public or private clouds; but if you’re going to go the scale-out route, UCS servers don’t stick out as especially cost-ineffective, and they bring the benefit of Cisco’s networking strengths to their clustering.
Above all, Cisco’s solution is welcome because it adds a major new option to the information architecture. When that has happened before, savvy users have usually found a way to make the new option serve their needs better than the old set of choices did. So again, to my own surprise, I say: check out Cisco’s new Hadoop-using approach to data warehousing. I believe it’s worth a close look.