Saturday, August 15, 2009

Eventual Consistency and Scale-Out Data Management

A recent blog post by Gordon Haff of Illuminata about new challenges to enterprise relational databases cited a white paper on how Amazon, in particular, is experimenting with loosening the typical relational requirements for ACID (atomicity, consistency, integrity, and durability). In the white paper, the author (Werner Vogels, CTO, Amazon) explains how recent database theory has begun to investigate delayed or “eventual” consistency (EC), and to apply its findings to the real world. Skimming the white paper, I realized that these findings did not seem to be applicable to all real-world situations – it appears that they are only useful in a particular type of scale-out architecture.

The problem that so-called eventual consistency techniques aim to solve is best explained by reviewing history. As distributed computing arrived in the 1980s, database theory attempted to figure out what to do when the data in a data store was itself distributed. Partitioning was an obvious solution: put item a on system A, and item b on system B, and then put processes 1 and 2 on system A (or, as in the case of Microsoft SQL Server in the 1990s, processes 1 and 2 on both A and B, so transaction streams can be multiplexed). As a result, the database can step in when either process references item b, and handle the reads and writes as if item b is really on system A. However, this is not a great general-case solution (although Sybase, for one, offers database partitioning for particular cases): optimum partitions for performance tend to change over time, and in many cases putting the same datum on 2 systems yields better parallelism and hence better performance.

The next solution – the main solution during the 1990s – was two-phase commit. Here, the idea was to make absolutely sure that processes 1 and 2 did not see different values in item a (or b) at any time. So, the “commit” first sent out instructions to lock data with multiple copies on multiple systems, then received indications from those systems that they were ready to update (Phase 1), then told them to update, and only when everyone had answered that the item had been updated (Phase 2) was the data made available for use again. This ensured consistency and the rest of the ACID properties; but when more than a few copies were involved, the performance overhead on queries was high. In theoretical terms, they had sacrificed “availability” of the multiple-copy item during the update for its consistency.

At this point, in the real world, data warehouses carved out a part of data processing in which there was no need for two-phase commit, because one copy of the data (the one in the data warehouse) could always be out of date. Replication simply streamed updates from the operational frequent-update system to the decision-support no-online-updates-allowed system in nightly bursts when the data warehouse was taken offline.

In the early 2000s, according to the white paper, theory took a new tack – seeing if some consistency could be sacrificed for availability. To put it another way, researchers noted that in some cases, when a multiple-copy update arrives, “a) it is OK to make the item available if some but not all item copies on each system have been updated (“eventual read/write”) or (b) it is OK if you use a previous data-item version until all copy updates have been completed. In case (a) you save most of Phase 2 of a corresponding two-phase commit, and in case (b), you save all of Phase 2 and most of Phase 1 as well. EC is therefore the collection of techniques to allow availability before consistency is re-established.

Where EC Fits

So, in what kinds of situations does EC help? First of all, these are situations where users need multiple data copies on multiple systems for lots of items, in order to scale . If data is on a single system, or partitioned on multiple systems, you could probably use optimistic locking or versioning to release write locks on updates (and thereby make the item available again) just as quickly. Likewise, two-phase commit involves little performance overhead in distributed systems where few multiple-copy items and few updates on these items are involved – so EC isn’t needed there, either.

A second limitation on the use of EC seems to be the rate of updates to a particular multiple-copy data item. Too frequent updates, and a state of perpetually delayed consistency would seem to result – in effect, no consistency at all.

Thus, EC does not appear appropriate for pure distributed OLTP (online transaction processing). It also does not fit pure decision support/data warehousing, where updates occur in mammoth bursts. It may be appropriate for EII or “data virtualization”-type cross-database updates mixed with querying, although I believe that real-world implementations do not involve large numbers of multiple-copy items (and hence two-phase commit will do). MDM (master data management) does not appear to be well suited to EC, as implementations typically involve updates funneled through one or two central sites, then replication of the updated item value to all other copies.

Well, then, where does EC fit? The answer seems to be, in scale-out multiple-copy distributed data architectures involving infrequent, predictably-timed updates to each item. For example, a large PC-server farm providing E-commerce to consumers may emphasize prompt response to a rapidly changing workload of customer orders, each customer record update being typically delayable until the time the customer takes to respond to a prompt for the next step in the process. In these cases, data mining across multiple customers can wait until a customer has finished, or can use the previous version of a particular customer’s data. It is therefore no surprise that Amazon would find EC useful.


If we could really implement EC in all cases, it would be a major boost to database performance, as well as to the pure scale-out architectures that otherwise seem to make less and less sense in this era when costs, energy/carbon wastage, and administrative complexity make such architectures less and less desirable. Sadly, I have to conclude, at least for now, that most traditional database use cases typically do not fit the EC model.

However, that is no reason that much more cannot be done in applying EC to “mixed” Web-associated transaction streams. These are, after all, a significant and increasing proportion of all transactional workloads. In these, EC could finally simulate true parallel data processing, rather than the concurrency which can slow really large-scale transaction-handling by orders of magnitude. As an old analysis-of-algorithms guy, I know that time parallelism can translate to exponential performance improvements as the amount of data processed approaches infinity; and if coordination between item copies is minimal, time parallelism is approximated. So EC may very well not be limited in its usefulness to large-scale E-commerce use cases, but may apply to many other use cases within large Web-dependent server farms – and cloud computing is an obvious example. I conclude that EC may not be appropriate for a wide range of today’s transactional needs; but cloud computing implementers, and major database vendors looking to support cloud computing, should “kick the tires” and consider implementing EC capabilities.

No comments: