Monday, April 8, 2013

IBM Information Management’s BLU Acceleration: The Beginning of a Revolution

I have now reviewed IBM’s new Big Data effort, BLU Acceleration, and my take is this: Yes, it will deliver major performance enhancements in a wide variety of specific Big Data cases – and yes, I do view their claim of 1,000-times acceleration in some cases as credible – but the technology is not a radical departure. Rather, it marks the evolutionary beginning of a revolutionary step in database performance and scalability that will be applicable across most Big Data apps – and data-using apps in general.

What follows is my own view of BLU Acceleration, not IBM’s. Click and Clack, hosts of the automotive repair show on NPR, used to preface their shtick with “The views expressed on this show are not those of NPR …” or, basically, anyone with half a brain. Similarly, IBM may very well disagree with my key takeaways, as well as with my views on the future directions of the technology.

Still, I am sure that they would agree that a BLU Acceleration-type approach is a key element of the future direction of Big Data technology. I therefore conclude that anyone who wants to plan ahead in Big Data should at least kick the tires of solutions featuring BLU Acceleration, to understand the likely immediate and longer-term areas in which it may be applied. And if, in the process, some users choose to buy those solutions, I am sure IBM will be heartbroken – not.

The Rise of the Register

Database users are accustomed to thinking in terms of a storage hierarchy – main memory, sometimes solid-state devices, disk, sometimes tape – that allows users to get 90% of the performance of an all-main-memory system at 11% of the cost. There is, however, an even higher level of “storage”: the registers in a processor (not to mention that processor’s L1 and other caches). There, too, the same tradeoffs apply: operating on data that is already in a register is tens to a thousand times faster than loading a piece of data from main memory, breaking it into parts in order to apply the basic operations that make up a transaction, and returning it to main memory.

The key “innovation” of BLU Acceleration is to load entire pieces of data (one or multiple columns, in compressed form) into a register and apply basic operations to them, without needing to decompress them or break them into parts. The usual parallelism between registers via single-instruction, multiple-data (SIMD) techniques and cross-core parallelism adds to the performance advantage. In other words, the speed of the transaction is gated, not by the speed of main memory access, but by the speed of the register.
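To make the idea of operating on compressed column values inside a register concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not IBM’s implementation: the 8-bit dictionary codes, the eight-lane 64-bit layout, and the names pack and equal_lanes are all hypothetical, and a Python integer merely stands in for a hardware register.

```python
# Hypothetical layout: eight 8-bit dictionary codes share one 64-bit "register".
LANES = 8
MASK64 = (1 << 64) - 1
LOW7 = 0x7F7F7F7F7F7F7F7F   # low seven bits of every lane
ONES = 0x0101010101010101   # the value 1 repeated in every lane


def pack(codes):
    """Pack exactly eight 8-bit dictionary codes into one 64-bit integer."""
    if len(codes) != LANES:
        raise ValueError("this sketch expects exactly eight codes")
    word = 0
    for lane, code in enumerate(codes):
        word |= (code & 0xFF) << (8 * lane)
    return word


def equal_lanes(word, target):
    """Return the lanes whose code equals target (0..255).

    The comparison itself touches all eight lanes with a handful of
    whole-word operations and never unpacks or decompresses the codes.
    """
    x = word ^ (ONES * target)            # matching lanes become 0x00
    t = ((x & LOW7) + LOW7) | x | LOW7    # 0xFF in non-matching lanes, 0x7F in matches
    hits = ~t & MASK64                    # 0x80 in each matching lane, 0x00 elsewhere
    # Reading the flags back out lane by lane is only for display.
    return [lane for lane in range(LANES) if (hits >> (8 * lane + 7)) & 1]


# Dictionary compression: each distinct value is replaced by a small code.
dictionary = {"red": 0, "green": 1, "blue": 2}
column = ["blue", "red", "blue", "green", "red", "blue", "green", "blue"]
word = pack([dictionary[value] for value in column])

blue_rows = equal_lanes(word, dictionary["blue"])
print(blue_rows)  # [0, 2, 5, 7]
```

The point of the sketch is that the predicate “color = blue” is evaluated on the codes themselves, eight at a time; real SIMD hardware does the same kind of thing with wider registers and dedicated compare instructions.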

Now, this is not really revolutionary – we have seen similar approaches before, with bit-mapped indexing. There, data that could be represented as 0s and 1s, such as “yes/no” responses (effectively, a type of columnar storage), could be loaded into a register and basic “and” and “or” operations could be performed on it. The result? Up to 1,000 times speedup for transactions on those types of data. However, BLU Acceleration is able to do this on any type of data – as of now, so long as that data is represented in a columnar format.
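For readers who have not met bit-mapped indexing, a minimal Python sketch of the “and”/“or” trick follows. The survey columns and the helper names build_bitmap and matching_rows are hypothetical, and a Python integer stands in for the register holding the bitmap.

```python
def build_bitmap(values, target):
    """Return an integer whose bit i is 1 when values[i] == target."""
    bitmap = 0
    for i, value in enumerate(values):
        if value == target:
            bitmap |= 1 << i
    return bitmap


def matching_rows(bitmap):
    """List the row numbers whose bit is set in the bitmap."""
    rows = []
    i = 0
    while bitmap:
        if bitmap & 1:
            rows.append(i)
        bitmap >>= 1
        i += 1
    return rows


# Two hypothetical yes/no survey columns, one answer per row.
owns_car = ["yes", "no", "yes", "yes", "no", "yes"]
has_loan = ["no", "no", "yes", "no", "yes", "yes"]

car_bits = build_bitmap(owns_car, "yes")
loan_bits = build_bitmap(has_loan, "yes")

both = car_bits & loan_bits     # one "and" answers the query for every row at once
either = car_bits | loan_bits   # one "or" does the same for the union

print(matching_rows(both))      # [2, 5]
print(matching_rows(either))    # [0, 2, 3, 4, 5]
```

Each bitwise operation processes as many rows as fit in the word, which is where the large speedups on this kind of data come from.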

Exploring the Virtues of Columnar “Flat Storage”

And here we come to a fascinating implication of BLU Acceleration’s ability to do register-speed processing on columnar data: it allows a columnar-format storage and database to beat an equivalent row-oriented relational storage and database over most of today’s read-only data processing – i.e., most reporting and analytics.

As of now, pre-BLU-Acceleration, the rule of thumb for choosing between columnar and row-oriented relational technology in data warehousing is that if more than 1 or 2 columns in a row need to be read in a large-scale transaction, row-oriented performs a little better than column-oriented. This is because any speed advantage that columnar gains from better data compression is more than counterbalanced by its need to seek back and forth across a disk for each needed column (in row-oriented storage, the columns of a row are physically stored together). However, BLU Acceleration’s shift in emphasis to registers means that the key to its performance is main memory – and main memory is “flat” storage, in which columns can be loaded into the processor simultaneously without the need to seek.
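The difference between the two layouts can be sketched in a few lines of Python. The table, its column names, and the helper functions are hypothetical; the sketch only illustrates that, in memory, a columnar query touches just the columns it names and pays no seek penalty for reaching a second one.

```python
# Hypothetical sales table in row-oriented form: each row keeps all its columns together.
column_names = ("order_id", "region", "amount", "order_date")
rows = [
    (1, "east", 100, "2013-01-05"),
    (2, "west", 250, "2013-01-06"),
    (3, "east", 175, "2013-01-07"),
]


def to_columnar(rows, column_names):
    """Pivot row tuples into one contiguous list per column."""
    return {name: [row[i] for row in rows] for i, name in enumerate(column_names)}


def total_amount_by_region(columns, region):
    """Read only the two columns the query names; the others are never touched."""
    return sum(
        amount
        for row_region, amount in zip(columns["region"], columns["amount"])
        if row_region == region
    )


columns = to_columnar(rows, column_names)
east_total = total_amount_by_region(columns, "east")
print(east_total)  # 275
```

On disk, fetching the region and amount columns separately would mean two sets of seeks; in “flat” storage both lists are simply read in place.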

Moreover, one aspect of solid-state disk is that it is really “flat” storage (main-memory-type storage that is slower than main memory but stores the data permanently), sometimes with a disk-access “veneer” attached. In this case, the “veneer” may not be needed; and so, if everything can be stored in a gigabyte of main memory plus a terabyte of solid-state disk, BLU-Acceleration-type columnar beats or matches row-oriented just about every time.

This is especially true because now there is very little need for “indexing” – and so, BLU Acceleration claims to eliminate the complexities of indexing entirely (actually, it apparently does contain an index that gives each column a unique ID). Remember, the purpose of indexing originally in databases was to allow fast access to multiple pieces of data that were mixed and scrambled across a disk – “flat” storage has little need for these things.

A side-effect of eliminating indexing is yet more performance. Gone is the time-consuming optimizer decision-making about which index to use to generate the best performance, and the time-consuming effort to tune and retune the database indexing and storage to minimize sub-optimal performance. By the way, this also raises the question, which I will return to later, as to whether a BLU Acceleration database administrator is needed at all.

Now, there still remain, at present, limits to columnar use, and hence to BLU Acceleration’s advantages. IBM’s technology, it seems, has not yet added the “write to disk” capabilities required for decent update-heavy transactional performance. Also, in very high end applications requiring zettabytes of disk storage, it may well be that row-oriented relational approaches that avoid added disk seeks can compete – in some cases. However, it is my belief that in all cases except those, BLU-Acceleration-type columnar should perform better, and row-oriented relational is not needed.

And we should also note that BLU Acceleration has added one more piece of technology to tip the scales in columnar’s favor: column-based paging. In other words, to load from disk or disk-veneer SSD storage into main memory, one swaps in a “page” defined as containing one column – so that the speed of loading columns is increased.

The Implications of Distributed Direct-Memory Access

It may seem odd that IBM brought a discussion of its pureScale database clustering solution into a discussion of BLU Acceleration, but to me, there’s a fundamental logic to it that has to do with high-end scalability. Clustering has always been thought of in terms of availability, not scalability – and yet, clustering continues to be the best way to scale up beyond SMP systems. But what does that have to do with BLU Acceleration?

A fundamental advance in shared-disk cluster technology came somewhere around the early ’90s, when someone took the trouble to figure out how to load-balance across nodes. Before that, a system would simply check if an invoked application was on the node that received the invocation, and, if not, simply use a remote procedure call to invoke a defined copy of that application (or a piece of data) on another node. The load-balancing trick simply figured out which node was least used and invoked the copy of the application on that particular node. Prior to that point, clusters that added a node might see added performance equivalent to 70% of that of a standalone node. With Oracle RAC, an example of load balancing, some users reported perhaps 80% or a bit above that.
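The load-balancing trick described above can be sketched in Python as follows. This is a deliberately simplified illustration, not how any particular cluster product does it: the node names, the use of an active-request count as the measure of “least used”, and the functions pick_least_loaded and dispatch are all assumptions.

```python
def pick_least_loaded(active_requests):
    """Return the name of the node with the fewest active requests."""
    return min(active_requests, key=active_requests.get)


def dispatch(active_requests):
    """Send one invocation to the least-used node and record the added load."""
    node = pick_least_loaded(active_requests)
    active_requests[node] += 1
    return node


# Hypothetical three-node cluster with its current load per node.
active_requests = {"node-a": 12, "node-b": 3, "node-c": 7}

first = dispatch(active_requests)
print(first)  # node-b
```

The earlier approach would have sent the invocation to a fixed, predefined copy regardless of how busy that node was.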

It appears that IBM pureScale, based on the mainframe’s Parallel Sysplex architecture, takes that load-balancing trick a bit further: it performs the equivalent of a “direct memory access” to the application or data on the remote node. In other words, it bypasses any network protocols (or, if the app/data is really in the node’s main memory, storage protocols) and goes directly to the application or data as if it were in the local system’s main memory. Result: IBM is talking about users seeing greater than 90% scalability – and I find at least upper 80% scalability something that many implementations may reasonably expect.

Now, let’s go back to our “flat storage” discussion. If the remote direct-memory access really does access main memory or no-veneer solid-state disk, BLU Acceleration’s columnar approach should again best row-oriented technologies, but on a much larger scale. That is, BLU Acceleration plus a pureScale cluster should see raw Big Data performance advantages that scale as far as the individual nodes will scale and, beyond that, lose less than 10% of a node’s worth of performance for each node added – and now we’re talking thousands of processors and tens of thousands of virtual machines.
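To put rough numbers on the scalability percentages quoted above (70% before load balancing, about 80% with it, and better than 90% claimed for pureScale), here is a back-of-the-envelope Python sketch. It assumes a simple linear model in which every added node contributes the same fixed fraction of a standalone node; that model and the function name cluster_throughput are assumptions made for illustration, and real clusters rarely scale this neatly.

```python
def cluster_throughput(nodes, percent_per_added_node):
    """Throughput in units of one standalone node, under an assumed linear model.

    The first node counts as 100%; each additional node adds
    percent_per_added_node percent of a standalone node.
    """
    if nodes < 1:
        raise ValueError("a cluster needs at least one node")
    return (100 + (nodes - 1) * percent_per_added_node) / 100


scenarios = [
    ("before load balancing", 70),
    ("with load balancing", 80),
    ("with direct-memory access", 90),
]

for label, percent in scenarios:
    print(label, cluster_throughput(10, percent))
```

Under this model a ten-node cluster delivers roughly 7.3, 8.2, and 9.1 nodes’ worth of work in the three scenarios, and the gap widens as the node count grows.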

And there’s another, more futuristic implication of this approach. If one can apply this kind of “distributed direct-memory access” in a clustered situation, why not in other situations – server-farm grids, for example, or scale-out within particular geographically-contiguous parts of a cloud? There is no doubt that bypassing network and storage protocols can add yet more performance to the BLU Acceleration approach – although it appears that IBM has not yet claimed or begun to implement this type of performance improvement with the technology.

The Wild BLU Yonder

And yet, I have said that BLU Acceleration is not revolutionary; it’s the beginning of a revolution. For the fact is that most of the piece parts of this new technology with mind-blowing performance have already been out there with IBM and others for some time. In-memory databases have long probed flat-storage data processing; IBM has actually seemed until now to be late to the game in columnar databases; I have already noted how bit-mapped indexing delivered thousand-fold performance improvements in certain queries a decade ago. IBM has simply been the first to put all the pieces together, and there is nothing to prevent others from following suit eventually, if they want to.

However, it is also true that IBM appears to be the first major player to deliver on this new approach, and it has a strong hand to play in evolving BLU Acceleration. And that is where the true revolution lies: in evolving this technology to add even more major performance improvements to most if not all data processing use cases. Where might these improvements lie?

One obvious extension is decoupling BLU Acceleration from its present implementation in just two database platforms – DB2, where it delivers the above-noted data warehousing and Big Data performance advantages, and Informix, where it allows an optimizer to feed appropriate time-series analyses to a separate group of servers. This, in turn, would mean ongoing adaptation to multi-vendor-database environments.

Then, there are the directions noted above: performance improvements achieved by eliminating network and storage protocols; extensions to more cases of solid-state disk “flat storage”; addition of update/insert/delete transactional capabilities to at least deliver important performance improvements for “mixed” update/query environments like Web sites; and the usual evolution of compression technology for cramming even more columns into a register.

What about the lack of indexing? Will we see no more need for database administrators? Well, from my point of view, there will be a need for non-flat storage such as disk for at least the medium term, and therefore a need to flesh out BLU Acceleration and the like with indexing schemes and optimization/tuning for disk/tape. Then, of course, there is the need for maintaining data models, schemas, and metadata managers – the subject of an interesting separate discussion at IBM’s BLU Acceleration launch event. But the bulk of present-day administrative heavy lifting may well be on its way out; and that’s a Good Thing.

There’s another potential improvement that I think should also be considered, although it sounds as if it’s not on IBM’s immediate radar screen. When that database transaction is loaded into a register, basic assembler and/or machine-code instructions like “add” and “nor” operate on it. And yet, we are talking about fairly well-defined higher-level database operations (like joins). It seems to me that identifying these higher-level operations and adding them to the machine logic might give a pretty substantial additional performance boost for the hardware/compiler vendor that wishes to tackle the process. Before, when register performance was not critical to Big Data performance, there would have been no reason to do so; now, I believe there probably is.

The IT Bottom Line

Right now, the stated use cases are to apply DB2 or Informix with BLU Acceleration to new or existing Big Data or reporting/analytics implementations – and that’s quite a range of applications. However, as noted, I think that users in general would do well to start to familiarize themselves with this technology right now.

For one thing, I see BLU Acceleration technology as evolving in the next 2-3 years to add a major performance boost to most non-OLTP enterprise solutions, not just Big Data. For another, multi-vendor-database solutions that combine BLU Acceleration columnar technology with row-oriented relational technology (and maybe Hadoop flat-file technology) are likely to be thick on the ground in 2-3 years. So IT needs to figure out how to combine the two effectively, as well as to change its database administration accordingly. By the way, these are happy decisions to make: Lots of upside, and it’s hard to find a downside, no matter how you add BLU Acceleration.

There’s a lot more to discuss about IBM’s new Big Data solutions and its strategy. For IT users, however, I view BLU Acceleration as the biggest piece of the announcement. I can’t see IBM’s technology doing anything other than delivering major value-add in more and more business-critical use cases over the next few years, whether other vendors implement it or not.

So get out there and kick those tires. Hard.

