I have now
reviewed IBM’s new Big Data effort, BLU Acceleration, and my take is this: Yes, it will deliver major performance
enhancements in a wide variety of specific Big Data cases – and yes, I do view
their claim of 1,000-times acceleration in some cases as credible – but the
technology is not a radical, revolutionary departure. Rather, it marks the
evolutionary beginning of a revolutionary step in database performance
and scalability that will be applicable across most Big Data apps – and
data-using apps in general.
What follows
is my own view of BLU Acceleration, not IBM’s. Click and Clack, the hosts of Car Talk,
NPR’s automotive repair show, used to preface their shtick with “The views
expressed on this show are not those of NPR …” or, basically, anyone with half
a brain. Similarly, IBM may very well disagree with my key takeaways, as well
as with my views on the future directions of the technology.
Still, I am
sure that they would agree that a BLU Acceleration-type approach is a key
element of the future direction of Big Data technology. I therefore conclude
that anyone who wants to plan ahead in Big Data should at least kick the tires
of solutions featuring BLU Acceleration, to understand the likely
immediate and longer-term areas in which it may be applied. And if, in the
process, some users choose to buy those solutions, I am sure IBM will be
heartbroken – not.
The Rise of the Register
Database users
are accustomed to thinking in terms of a storage hierarchy – main memory,
sometimes solid-state devices, disk, sometimes tape – that allows users to get
90% of the performance of an all-main-memory system at 11% of the cost. There
is, however, an even higher level of “storage”:
the registers in a processor (not to mention that processor’s L1 and other
caches). The same tradeoffs apply there, too: register operations run at tens to a
thousand times the speed of the full cycle of loading a piece of data from main
memory, breaking it into parts in order to apply the basic operations that make
up a transactional operation, and returning it to main memory.
The key
“innovation” of BLU Acceleration is to load entire pieces of data (one or
multiple columns, in compressed form) into a register and apply basic
operations to them, without needing to decompress the data or break it into parts. The
usual parallelism between registers via single-instruction, multiple-data (SIMD)
techniques, plus cross-core parallelism, adds to the performance advantage. In
other words, the speed of the transaction is gated, not by the speed of main
memory access, but by the speed of the register.
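To make the idea concrete, here is a minimal sketch of “SIMD within a register” on dictionary-compressed column values – my own illustration in plain C, not IBM’s actual code or data format. Eight 8-bit column codes packed into one 64-bit word are tested against a search key in a handful of register operations, with no decompression:

```c
#include <stdint.h>
#include <stdio.h>

/* Return a mask with 0x80 set in each byte lane of 'packed' that equals
   'key'. Exact SWAR (SIMD-within-a-register) equality test: the high bit
   is masked off before the add, so no lane can carry into its neighbor. */
static uint64_t match8(uint64_t packed, uint8_t key)
{
    const uint64_t lo7 = 0x7f7f7f7f7f7f7f7fULL;
    uint64_t k = key * 0x0101010101010101ULL; /* broadcast key to all 8 lanes */
    uint64_t x = packed ^ k;                  /* matching lanes become 0x00   */
    uint64_t t = (x & lo7) + lo7;             /* high bit set if low 7 bits != 0 */
    return ~(t | x | lo7);                    /* 0x80 only in all-zero lanes  */
}

int main(void)
{
    /* eight rows of a dictionary-compressed column, one byte per row */
    uint64_t column = 0x0102030301040302ULL;
    uint64_t hits = match8(column, 0x03);     /* which rows hold code 0x03?   */
    printf("match mask: %016llx\n", (unsigned long long)hits);
    return 0;
}
```

A real implementation would vectorize this across much wider SIMD registers and whole cache lines, but the principle – evaluating predicates directly on compressed codes – is the same.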
Now, this is
not really revolutionary – we have seen similar approaches before, with
bit-mapped indexing. There, data that could be represented as 0s and 1s, such
as “yes/no” responses (effectively, a type of columnar storage), could be
loaded into a register, and basic “and” and “or” operations could be performed
on it. The result? Up to 1,000 times speedup for transactions on those types of
data. BLU Acceleration, however, is able to do this on any type of data – as of
now, so long as that data is represented in a columnar format.
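For comparison, here is what the bit-mapped case looks like – again, my own sketch rather than any vendor’s code. With one bit per row, a single 64-bit AND answers a two-predicate query for 64 rows at once:

```c
#include <stdint.h>
#include <stddef.h>

/* AND two bit-mapped "yes/no" columns, one bit per row: each 64-bit
   operation evaluates the combined predicate for 64 rows at a time. */
void bitmap_and(const uint64_t *a, const uint64_t *b,
                uint64_t *out, size_t words)
{
    for (size_t i = 0; i < words; i++)
        out[i] = a[i] & b[i];  /* e.g., "responded yes AND is a subscriber" */
}
```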
Exploring the Virtues of Columnar “Flat Storage”
And here we
come to a fascinating implication of BLU Acceleration’s ability to do
register-speed processing on columnar data: it allows a columnar-format storage
and database to beat an equivalent row-oriented relational storage and database
over most of today’s read-only data processing – i.e., most reporting and
analytics.
As of now, pre-BLU-Acceleration,
there is a rule of thumb for choosing between columnar and row-oriented
relational technology in data warehousing: if more than one or two columns in a
row need to be read in a large-scale transaction, row-oriented performs a
little better than column-oriented. This is because any speed advantage from
columnar’s greater data compression is more than counterbalanced by its need
to seek back and forth across a disk for each needed column (columns that are
physically stored together in row-oriented storage). However, BLU Acceleration’s shift in
emphasis to registers means that the key to its performance is main memory –
and main memory is “flat” storage, in which columns can be loaded into the
processor simultaneously, without the need to seek.
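The layout difference behind that rule of thumb is easy to see in miniature. Here is an illustrative sketch (my own types, nothing to do with BLU’s actual storage format): scanning one column of a row-oriented table strides through memory, while a columnar layout keeps each column contiguous:

```c
#include <stdint.h>
#include <stddef.h>

/* Row-oriented: columns interleaved, so summing one column strides
   across the whole row on every iteration. */
struct row { int32_t id; int32_t amount; char region[8]; };

int64_t sum_row_oriented(const struct row *table, size_t n)
{
    int64_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += table[i].amount;   /* touches 16 bytes to use 4 */
    return total;
}

/* Columnar: the 'amount' column is one contiguous array, ideal for
   sequential prefetch (and, compressed, for register-width operations). */
int64_t sum_columnar(const int32_t *amount, size_t n)
{
    int64_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += amount[i];
    return total;
}
```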
Moreover, one
aspect of solid-state disk is that it is really “flat” storage
(main-memory-type storage that is slower than main memory but stores the data
permanently), sometimes with a disk-access “veneer” attached. In this case, the
“veneer” may not be needed; and so, if everything can be stored in a gigabyte
of main memory plus a terabyte of solid-state disk, BLU-Acceleration-type columnar
beats or matches row-oriented just about every time.
This is
especially true because there is now very little need for “indexing” – and so,
BLU Acceleration claims to eliminate the complexities of indexing entirely
(actually, it apparently does contain an index that gives each column a unique
ID). Remember, the original purpose of indexing in databases was to allow
fast access to multiple pieces of data that were mixed and scrambled across a
disk – “flat” storage has little need for such things.
A side effect
of eliminating indexing is yet more performance. Gone is the time-consuming optimizer
decision-making about which index to use to generate the best performance, and
the time-consuming effort to tune and retune the database’s indexing and storage
to minimize sub-optimal performance. By the way, this also raises the question,
to which I will return later, of whether a BLU Acceleration database
administrator is needed at all.
Now, there
still remain, at present, limits to columnar use, and hence to BLU
Acceleration’s advantages. IBM’s technology, it seems, has not yet added the
“write to disk” capabilities required for decent update-heavy transactional
performance. Also, in very high-end applications requiring zettabytes of disk
storage, it may well be that row-oriented relational approaches that avoid
added disk seeks can compete – in some cases. In all other cases, however, it
is my belief that BLU-Acceleration-type columnar should perform better, and
row-oriented relational is not needed.
And we should
also note that BLU Acceleration has added one more piece of technology to
tip the scale in columnar’s favor: column-based paging. In other words, to
load from disk or disk-veneer SSD storage into main memory, the system swaps in a
“page” defined as containing data from a single column – so that the speed of
loading columns is increased.
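A minimal sketch of what such a page might look like – purely illustrative, since IBM has not published this level of detail here:

```c
#include <stdint.h>

/* Hypothetical column-oriented page: every value on the page comes from
   the same column, so a column scan pulls in nothing it doesn't need. */
#define PAGE_VALUES 4096

struct column_page {
    uint32_t column_id;            /* which column this page belongs to */
    uint64_t first_row;            /* row number of values[0]           */
    int32_t  values[PAGE_VALUES];  /* contiguous run of one column      */
};
```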
The Implications of Distributed Direct-Memory Access
It may seem
odd that IBM brought a discussion of its pureScale database clustering solution
into a discussion of BLU Acceleration, but to me, there’s a fundamental logic
to it that has to do with high-end scalability. Clustering has always been
thought of in terms of availability, not scalability – and yet, clustering
continues to be the best way to scale up beyond SMP systems. But what does that
have to do with BLU Acceleration?
A fundamental
advance in shared-disk cluster technology came somewhere around the early ‘90s,
when someone took the trouble to figure out how to load-balance across nodes.
Before that, a system would simply check whether an invoked application was on the
node that received the invocation and, if not, simply use a remote procedure
call to invoke a defined copy of that application (or a piece of data) on
another node. The load-balancing trick simply figured out which node was least
used and invoked the copy of the application on that particular node. Prior to
that point, clusters that added a node might see added performance equivalent to
70% of that of a standalone node. With Oracle RAC, an example of load
balancing, some users reported perhaps 80% or a bit above that.
It appears
that IBM pureScale, based on the mainframe’s Parallel Sysplex architecture,
takes that load-balancing trick a bit further: it performs the equivalent of a
“direct memory access” to the application or data on the remote node. In other
words, it bypasses any network protocols (or, if the app/data is really in the
node’s main memory, storage protocols) and goes directly to the application or
data as if it were in the local system’s main memory. The result: IBM is talking
about users seeing greater than 90% scalability – and I find scalability in at
least the upper 80s something that many implementations may reasonably expect.
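To see what those percentages mean in practice, here is a back-of-the-envelope model – my own simplification, not an IBM benchmark – in which each added node contributes a fixed fraction of a standalone node’s throughput:

```c
#include <stdio.h>

/* Effective cluster throughput, in units of one node's throughput,
   if each node beyond the first contributes the fraction 's'. */
static double cluster_throughput(int nodes, double s)
{
    return 1.0 + s * (nodes - 1);
}

int main(void)
{
    /* 16 nodes at 70% scalability vs. 90% scalability */
    printf("70%%: %.1f nodes' worth of throughput\n", cluster_throughput(16, 0.70));
    printf("90%%: %.1f nodes' worth of throughput\n", cluster_throughput(16, 0.90));
    return 0;
}
```

On a 16-node cluster, the difference between 70% and 90% scalability is the difference between roughly 11.5 and 14.5 nodes’ worth of throughput, and the gap widens with every node added.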
Now, let’s go
back to our “flat storage” discussion. If the remote direct-memory access
really does access main memory or no-veneer solid-state disk, BLU
Acceleration’s columnar approach should again best row-oriented technologies,
but on a much larger scale. That is, BLU Acceleration plus a pureScale cluster
should see raw Big Data performance advantages as high as the individual nodes
will scale, losing less than 10% of a node’s worth of performance for each node
added beyond that – and now we’re talking thousands of processors and tens of
thousands of virtual machines.
And there’s
another, more futuristic implication of this approach. If one can apply this
kind of “distributed direct-memory access” in a clustered situation, why not in
other situations – server-farm grids, for example, or scale-out within
particular geographically contiguous parts of a cloud? There is no doubt that
bypassing network and storage protocols can add yet more performance to the BLU
Acceleration approach – although it appears that IBM has not yet claimed, or
begun to implement, this type of performance improvement with the technology.
The Wild BLU Yonder
And yet, I
have said that BLU Acceleration is not revolutionary; it’s the beginning of a
revolution. For the fact is that most of the component parts of this new technology
with mind-blowing performance have been out there, at IBM and elsewhere,
for some time. In-memory databases have long explored flat-storage data
processing; IBM has actually seemed, until now, to be late to the game in
columnar databases; and I have already noted how bit-mapped indexing delivered
thousand-fold performance improvements in certain queries a decade ago. IBM has
simply been the first to put all the pieces together, and there is nothing to
prevent others from following suit eventually, if they want to.
However, it is
also true that IBM appears to be the first major player to deliver on this new
approach, and it has a strong hand to play in evolving BLU Acceleration. And
that is where the true revolution lies: in evolving this technology to add even
more major performance improvements to most if not all data processing use
cases. Where might these improvements lie?
One obvious
extension is decoupling BLU Acceleration from its present implementation in
just two database platforms – DB2, where it delivers the above-noted data
warehousing and Big Data performance advantages, and Informix, where it allows
an optimizer to feed appropriate time-series analyses to a separate group of
servers. This, in turn, would mean ongoing adaptation to multi-vendor-database
environments.
Then, there
are the directions noted above: performance improvements achieved by
eliminating network and storage protocols; extensions to more cases of
solid-state disk “flat storage”; addition of update/insert/delete transactional
capabilities to at least deliver important performance improvements for “mixed”
update/query environments like Web sites; and the usual evolution of
compression technology for cramming even more columns into a register.
What about the
lack of indexing? Will we see no more need for database administrators? Well,
from my point of view, there will be a need for non-flat storage such as disk
for at least the medium term, and therefore a need to flesh out BLU
Acceleration and the like with indexing schemes and optimization/tuning for
disk/tape. Then, of course, there is the need to maintain data models,
schemas, and metadata managers – the subject of an interesting separate
discussion at IBM’s BLU Acceleration launch event. But the bulk of present-day
administrative heavy lifting may well be on its way out; and that’s a Good
Thing.
There’s
another potential improvement that I think should also be considered, although
it sounds as if it’s not on IBM’s immediate radar screen. When that database
transaction is loaded into a register, basic assembler and/or machine-code
instructions like “add” and “nor” operate on it. And yet, we are talking about
fairly well-defined higher-level database operations (such as joins). It seems
to me that identifying these higher-level operations and adding them to the
machine logic might give a pretty substantial additional performance boost to
the hardware/compiler vendor that wishes to tackle the process. Before, when
register performance was not critical to Big Data performance, there would have
been no reason to do so; now, I believe, there probably is.
The IT Bottom Line
Right now, the
stated use cases are to apply DB2 or Informix with BLU Acceleration to new or
existing Big Data or reporting/analytics implementations – and that’s quite a
range of applications. However, as noted, I think that users in general would
do well to start familiarizing themselves with this technology right now.
For one thing,
I see BLU Acceleration technology evolving over the next 2-3 years to add a
major performance boost to most non-OLTP enterprise solutions, not just Big
Data. For another, multi-vendor-database solutions that combine BLU
Acceleration columnar technology with row-oriented relational technology (and
maybe Hadoop flat-file technology) are likely to be thick on the ground in 2-3
years. So IT needs to figure out how to combine the two effectively, as well as
how to change its database administration accordingly. By the way, these are happy
decisions to make: lots of upside, and it’s hard to find a downside, no matter
how you add BLU Acceleration.
There’s a lot
more to discuss about IBM’s new Big Data solutions and its strategy. For IT
users, however, I view BLU Acceleration as the biggest piece of the
announcement. I can’t see IBM’s technology doing anything other than delivering
major value-add in more and more business-critical use cases over the next few
years, whether other vendors implement it or not.
So get out
there and kick those tires. Hard.