In the past, I have been quite critical both of some Cisco
forays into the server space and of users' over-use of Hadoop. I find, however,
to my own surprise, that Cisco's new Hadoop-using approach to data warehousing
is potentially very useful for Big Data warehouses. Here is a short thought piece as to why this
might be so.
First, a brief description of some of the key aspects of the
solution, as I see them. The Cisco approach
is to view both a traditional data warehouse and the rest of the Big Data
needed to provide fairly quick answers to business-critical data-scientist
questions as one “virtual warehouse”, with Cisco’s data virtualization solution
(based on Composite Software’s solution) as the veneer/umbrella. Once you view all of these piece parts as
part of a data-warehouse whole, it becomes possible to use not only lower-cost
storage for “less-used” Big Data, but also different databases, including
access to operational OLTP data stores and “mixed” query/update enterprise-app
data stores. These, however, have
traditionally handled queries only on much smaller data stores, because of their dual
purpose and the competition queries face from updates. Even master-data-management systems,
because rigidly copying everything to a central data store can be too constraining,
suffer from this type of dual-purpose limitation.
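To make the "virtual warehouse" idea concrete, here is a minimal sketch of what a data-virtualization veneer does: present one query interface while routing each logical table to whichever backing store actually holds it. All class names, store names, and tables here are my own illustrations, not Cisco's or Composite's APIs.

```python
# Hypothetical sketch of a data-virtualization layer: one logical
# catalog, several physical stores. Names are illustrative only.

class InMemoryStore:
    """Stand-in for a real backend (warehouse, Hadoop/Hive, OLTP store)."""

    def __init__(self, tables):
        self._tables = tables

    def scan(self, table):
        return iter(self._tables[table])


class VirtualWarehouse:
    """Routes a logical query to whichever backing store holds the table."""

    def __init__(self):
        # Map logical table names to their physical backing stores.
        self._catalog = {}

    def register(self, table, store):
        self._catalog[table] = store

    def query(self, table, predicate):
        # The caller never sees which physical store answered.
        store = self._catalog[table]
        return [row for row in store.scan(table) if predicate(row)]


# A traditional warehouse table and a Hadoop-side table, federated
# behind one interface.
warehouse = InMemoryStore({"sales": [{"region": "east", "amt": 10}]})
hadoop = InMemoryStore({"clickstream": [{"page": "/home", "hits": 5}]})

vw = VirtualWarehouse()
vw.register("sales", warehouse)
vw.register("clickstream", hadoop)

print(vw.query("sales", lambda r: r["amt"] > 5))
```

The point of the sketch is only the shape of the design: once every store registers with the catalog, the choice of cheaper storage for less-used Big Data becomes an administrative decision invisible to the querying user.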
The potential of a Hadoop database as a kind of "overflow"
locus, it seems to me, is that one takes a database optimized for querying data
so Big that relational approaches alone cannot process it, and uses it
as "overflow" space for data that is too Big for a traditional data
warehouse to handle. A potential
side benefit is that, these days, much of the massive “overflow” data may very
well be social-media information – the type of information on which Hadoop,
MapReduce, and Hive cut their teeth. And,
of course, however inefficient in-house Hadoop has been, here at least is one
area in which IT's Hadoop experience allows better optimization of the Hadoop
side of the virtual data warehouse.
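The overflow idea reduces to a simple tiering rule, which might be sketched as follows. The ninety-day threshold and the tier names are assumptions of mine for illustration, not anything Cisco has specified.

```python
from datetime import date, timedelta

# Illustrative "overflow" routing: recent, hot data is served by the
# traditional warehouse; older bulk data (e.g. social-media history)
# overflows to a Hadoop tier. The threshold is an assumed example.
HOT_WINDOW = timedelta(days=90)

def route(record_date, today):
    """Decide which tier should serve data from the given date."""
    if today - record_date <= HOT_WINDOW:
        return "warehouse"   # fast relational queries on hot data
    return "hadoop"          # cheaper bulk overflow storage

today = date(2013, 6, 1)
print(route(date(2013, 5, 20), today))  # recent data -> "warehouse"
print(route(date(2012, 1, 1), today))   # old data    -> "hadoop"
```

In practice the routing criterion could just as well be table size or access frequency rather than age; the design point is that the virtualization layer, not the user, applies the rule.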
Likewise, I am prepared to cut Cisco some slack when it says
it intends to use its scale-out UCS servers in Hadoop “clusters”. Despite the hype, it appears that no
scale-out solution is coming close to the cost efficiency of scale-up servers
in either public or private clouds, but if you’re going to go the scale-out
route, UCS servers don’t stick out as especially cost-ineffective, and they
have the benefit of Cisco’s networking strengths in their clustering.
Above all, Cisco’s solution is nice because it adds a major
new option to the information architecture.
When that has happened before, savvy users have usually found a way to
make it work for their needs better than the old set of choices. Again, I say to my surprise – check out Cisco’s
new Hadoop-using approach to data warehousing.
I believe it’s worth a close look.