Wednesday, October 2, 2013

Composite Software, Cisco, and the Potential of Web Data in Motion

The long-term customer benefits of the acquisition of Composite Software, one of the pre-eminent data virtualization vendors, by Cisco, long known primarily for its communications prowess, aren’t obvious at first sight – but I believe that in one area, there is indeed major potential for highly useful new technology. Specifically, I believe that Cisco is well positioned to use Composite Software to handle event-driven processing of “data in motion” over the Web.

Why should this matter to the average IT person? Let’s start with the fact that enormous amounts of data (Big Data, especially social-media data) pass between smartphones, tablets, and computers on a minute-by-minute and second-by-second basis on the Internet, effectively outside of corporate boundaries and firewalls. This data is typically user data; unlike much of corporate data, it is semi-structured (text), unstructured (graphics, audio, video, pictures), or “mixed”. In fact, the key to this data is that it is not only unusually large-chunk but also unusually variable in type: what passes over the Internet at any one moment is not only a mix of images and text, but a mix that changes from second to second.

Up to now, customers have been content with an arrangement in which much of the data eventually winds up in huge repositories in large server farms at public cloud provider facilities. In turn, enterprises dip into these repositories via Hadoop or mass downloads. The delays in data access inherent in such arrangements are seen as much less important than the improvements in social-data and Big-Data access that such an architecture provides.

Now suppose that we could add an “event processor” to “strain”, redirect, and preliminarily interpret this data well before it arrives at a repository, let alone before the remote, over-stressed repository finally delivers the data to the enterprise. It would not replace the public cloud repository, but it would provide a clear alternative for a wide swath of cases, with far superior information delivery speed.
This would be especially valuable for what I have called “sensor” data: the mass of smartphone pictures and video that reports a news event, or the satellite and GPS data that captures the locations and movement of people and packages in real time. From this, the event processor could distill and deliver alerts of risks and buying-pattern changes, daily or hourly shifts in the rhythms of commerce and customer preferences beyond those the enterprise typically sees, and opportunities available to fast responders.
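To make that concrete, here is a minimal sketch, in Python, of what such a “strain and alert” stage might do. Everything in it is hypothetical (the Record type, the kinds of data, the routing rule); no such Cisco or Composite Software API exists today, and this illustrates the idea rather than an implementation:

# Hypothetical sketch: an in-network "event processor" that strains a
# mixed stream of records, forwarding bulk data to the repository while
# emitting early alerts on the records the enterprise cares about.
from dataclasses import dataclass
from typing import Iterable, Iterator, Tuple

@dataclass
class Record:
    kind: str       # "text", "image", "video", "gps", ...
    source: str     # device or feed identifier
    payload: bytes

def strain(stream: Iterable[Record],
           interesting_kinds: set) -> Iterator[Tuple[str, Record]]:
    """Tag each record "alert" or "archive" as it flows past."""
    for rec in stream:
        if rec.kind in interesting_kinds:
            yield ("alert", rec)    # deliver to the enterprise in seconds
        else:
            yield ("archive", rec)  # continue on to the cloud repository

The point of the sketch is the placement, not the logic: the filter runs where the data is already flowing, so the alert path never waits on the repository.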
Does such an event processor exist now? No, and that’s the point. To fulfill its potential, that event processor would need to be (1) just about ubiquitous, (2) highly performant, and (3) able to analyze disparate data effectively. No event processor out there truly meets any but the second of these requirements.

"It … Could … Be … Done!”

Those old enough to remember will recognize these words from Mel Brooks’ movie Young Frankenstein, when the hero is shocked to realize that his grandfather’s work was not, in fact, as he had put it, “complete doo-doo.” My point in echoing them here is to say that the combination of Cisco and Composite Software is surprisingly close to fulfilling all of the requirements cited above.

Let’s start with “just about ubiquitous.” As regards “data in motion”, Cisco with its routers fills the bill as well as anyone. Of course, event processors on each router would need to be coordinated: one would prefer not to send spurious alerts when data flowing over an alternate route and reunited at the destination might cause us to say “oops, never mind.” However, both Cisco and Composite Software have a great deal of experience in handling multiple streams of data in parallel and in a coordinated fashion. We do not have to achieve data integrity across millions of routers; we merely need local coordination centers that adequately combine the data into a composite picture (pardon the pun), which Composite Software is well experienced in doing.
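A minimal sketch of such a coordination center, again with entirely hypothetical names and a deliberately naive completeness test, might look like this:

# Hypothetical sketch: a local coordination center that collects the
# fragments of one logical flow reported by several routers, and emits
# a single composite alert instead of one spurious alert per route.
from collections import defaultdict
from typing import Optional

class CoordinationCenter:
    def __init__(self, fragments_expected: int):
        self.pending = defaultdict(list)
        self.fragments_expected = fragments_expected

    def report(self, flow_id: str, fragment) -> Optional[list]:
        """Collect per-router fragments; emit one composite result per flow."""
        self.pending[flow_id].append(fragment)
        if len(self.pending[flow_id]) >= self.fragments_expected:
            return self.pending.pop(flow_id)  # the combined picture, once
        return None

In practice the completeness test would be far subtler (timeouts, variable fragment counts), but the structure shows why only local coordination, not global data integrity, is required.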

How about “able to analyze disparate data effectively”? Here is where Composite Software really shines, with its decade-plus of fine-tuning cross-data-type distributed data analysis. Better than most if not all conventional databases, Composite Server provides a “database veneer” that offers transparent performance optimization of distributed data access over all the data types prevalent both in the enterprise and on the Internet.

It is, indeed, the “highly performant” criterion where Composite Software plus Cisco is most questionable right now. Neither Composite Server nor Cisco’s pre-existing software was designed to handle event processing as we know it today. However, today’s event processors could conceptually be split into two parts: (a) a pre-processor that makes initial decisions that don’t require much cross-data analysis, and (b) a conventional database that uses a “cache” data store (still in almost real time) for deeper analysis before the final action is taken. Composite Server probably can handle (b), with some cross-router or cross-machine processing thrown in, while a conventional event processor could be inserted to handle (a).
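A sketch of that split, with hypothetical names and an invented escalation rule, might look like the following: part (a) touches each record once and cheaply, while part (b) holds a small rolling window for the deeper, cross-record analysis a Composite-Server-like layer would query:

# Hypothetical sketch of the two-part split: (a) a cheap pre-processor
# needing no cross-data analysis, and (b) a near-real-time cache over
# which deeper, federated analysis runs before final action is taken.
def pre_process(record) -> str:
    # (a) decide on the record alone: escalate it or just pass it through
    if record.kind in ("gps", "video"):   # hypothetical escalation rule
        return "escalate"
    return "pass"

class NearRealTimeCache:
    # (b) a rolling window of recent records for cross-record analysis
    def __init__(self, capacity: int = 10_000):
        self.items = []
        self.capacity = capacity

    def add(self, record) -> None:
        self.items.append(record)
        if len(self.items) > self.capacity:
            self.items.pop(0)  # drop the oldest; keep a recent window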

The IT Bottom Line: Making Do Until Nirvana

Is there nothing that can be done, then, except wait and hope that Composite Software and Cisco recognize the opportunity and fill in the pieces, or that some other vendor spends the time to reproduce what they already have? Actually, I think there is. It’s not the long-term solution, but it mimics to some extent a ubiquitous Web event processor.

I am talking about setting up Composite Server as a front end rather than a back end to public cloud provider databases. A simple multiplexer could “strain” and feed data to multiple data stores: multiple conventional operational databases for the constant stream of updates, as well as back-end Hadoop/MapReduce file systems and traditional databases. Composite Server would then carry out queries across these “data type specialists”, in much the same way it operates now. The main difference between this approach and what is happening now is that Composite Server would get a much smaller subset of provider data at the same time as the file system, and hence would at least deliver alerts on some key “sensor” data well ahead of the stressed-out Big-Data Hadoop data store.
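The multiplexer itself could be almost trivially simple. The sketch below is hypothetical throughout (the store names, the routing key); in a real deployment each list would be a write path to an actual operational database or Hadoop spool:

# Hypothetical sketch: a front-end multiplexer that "strains" the feed
# into type-specific stores, over which a federated query layer (the
# Composite Server role) would then run its cross-store queries.
STORES = {
    "text":  [],   # stand-in for an operational database
    "image": [],   # stand-in for another operational database
    "other": [],   # stand-in for the Hadoop/MapReduce spool
}

def multiplex(record) -> None:
    """Route each record to the store that specializes in its type."""
    STORES.get(record.kind, STORES["other"]).append(record)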

My suggested action item for IT, therefore, is to start conceptualizing such a means of handling Web “data in motion,” and possibly to set up a Composite Server testbed, leading on to implementation of an interim solution. I would also appreciate it if IT would gently indicate to Cisco that a full-fledged solution would be highly desirable. A Web “data in motion” event processor would be a Big new extension of Big Data, with Big benefits, and it seems to me that Composite Software and Cisco are best positioned to make such a solution available sooner rather than later.

It … could … be … done! Let’s … do … it … now!
