Monday, July 13, 2009

Cloud Computing and Data Locality: Not So Fast


In one of my favorite sleazy fantasy novels (The Belgariad, by David Eddings), one of the characters is attempting to explain to another why reviving the dead is not a good idea. “You have the ability to simplify, to capture a simple visual picture of something complex [and change it],” the character says. “But don’t over-simplify. Dead is dead.”

In a recent white paper on cloud computing, Sun mentions, in an oh-by-the-way manner, the idea of data locality. If I understand it correctly, virtual “environments” in a cloud may have to move physically not only from server to server, but from site to site and/or from private data center to public cloud server farm and back. More precisely, the applications don’t have to move (just their “state”), and the virtual machine software and hardware doesn’t have to move (it can be replicated or emulated on the target machine); but the data may have to be moved or copied in toto, or else keep accessing the same physical data store remotely, which would violate the idea of cloud boundaries, among other problems (like security and performance). To avoid this, it is apparently primarily up to the developer to keep data locality in mind, which seems to mean avoiding moving the data where possible by keeping it on the same physical server-farm site.
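To make that trade-off concrete, here is a minimal sketch in Python of why the burden lands on the developer; the names and numbers are entirely hypothetical illustrations, not anything from Sun’s paper. Moving an application’s state between sites is cheap; dragging its data store along, or reaching back to it remotely forever after, is not.

    from dataclasses import dataclass

    @dataclass
    class Application:
        state_gb: float       # runtime state that travels with the application
        data_store_gb: float  # the data the application reads and writes

    def gigabytes_over_the_wire(app: Application, move_data: bool) -> float:
        """Data that must cross the inter-site link for each placement choice."""
        if move_data:
            # Drag the whole data store along with the application's state.
            return app.state_gb + app.data_store_gb
        # Keep the data where it is; only the state moves, but every later
        # access crosses the cloud boundary (the remote-access problem above).
        return app.state_gb

    app = Application(state_gb=2.0, data_store_gb=5000.0)
    print("move the data too:  ", gigabytes_over_the_wire(app, move_data=True), "GB")
    print("leave the data put: ", gigabytes_over_the_wire(app, move_data=False), "GB")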

Data locality will certainly be a quick fix for immediate problems of how to create the illusion of a “virtual data center.” But is it a long-term fix? I think not. The reason, I assert, is that cloud computing is an over-simplification (physically distributed data is not virtually unified data), and our efforts to patch it to approximate the “ideal cloud” will result in unnecessary complexity, cost, and legacy systems.

Consider the most obvious trend in the computing world in the last few years: the inexorable growth in storage of 40-60% per year, continuing despite the recession. The increase in storage reflects, at least partly, an increase in data-store size per application, or, if you wish, per “data center”. The increase appears to be faster than Moore’s Law, and faster than the rate of increase in communications bandwidth. If moving a business-critical application’s worth of data right now from secondary to primary site for disaster-recovery purposes takes up to an hour, it is likely that moving it two years from now will take 1½-2 hours, and so on. Unless this trend is reversed, the idea of a data center that can move or re-partition in minutes between public and private cloud (or even between Boston and San Francisco in a private cloud) is simply unrealistic.
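A back-of-the-envelope calculation shows why the gap widens. The rates below are illustrative assumptions, not measurements: 50% yearly data growth taken from the 40-60% range above, and a guessed 20% yearly improvement in effective inter-site bandwidth.

    def projected_transfer_hours(initial_hours, data_growth, bandwidth_growth, years):
        """Bulk-transfer time scales with data size and inversely with bandwidth."""
        return initial_hours * ((1 + data_growth) / (1 + bandwidth_growth)) ** years

    # One hour today, ~1.6 hours in two years, ~3.8 hours in six, under these assumptions.
    for years in (0, 2, 4, 6):
        hours = projected_transfer_hours(1.0, 0.50, 0.20, years)
        print(f"year {years}: about {hours:.1f} hours to move one application's data")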

Of course, since the unrealistic doesn’t happen, what will probably happen is that developers will create kludges, one for each application that is “cloud-ized”, to ensure that data is “pre-copied” and periodically “re-synchronized”, or that barriers are put in the way of data movement from site to site within the theoretically virtual public cloud. That’s the real danger: lots of “reinventing the wheel”, with attendant long-term unnecessary costs of administering (and developing new code on top of) non-standardized data movements and the code propelling them, database-architecture complexity, and unexpected barriers to data movement inside the public cloud.
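For the record, here is roughly what such a kludge looks like, sketched in Python with purely hypothetical names. A real one would sit on top of storage replication rather than an in-memory dictionary, and would be re-invented slightly differently for every application, which is exactly the problem.

    def pre_copy(source: dict, target: dict) -> None:
        """Bulk-copy everything once, before the application is moved."""
        target.update(source)

    def resync(source: dict, target: dict, dirty_keys: set) -> None:
        """Periodic catch-up: copy only the records written since the last pass."""
        for key in dirty_keys:
            target[key] = source[key]
        dirty_keys.clear()

    # Usage: one bulk pass, then repeated catch-up passes while the application migrates.
    primary, replica, dirty = {"row1": "v1", "row2": "v1"}, {}, set()
    pre_copy(primary, replica)
    primary["row2"] = "v2"; dirty.add("row2")   # a write lands after the pre-copy
    resync(primary, replica, dirty)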

What ought to provide a longer-term solution, I would think, is (a) a way of slicing the data so that only the stuff needed to “keep it running” is moved, which sounds like Information Lifecycle Management (ILM), since one way of doing this is to move the most recent data, the data most likely to be accessed and updated; and (b) a standardized abstraction-layer interface to the data that enforces this. In this way, we will at least have staved off data-locality problems for a few more years, without embedding kludge-type solutions in the cloud infrastructure forever.
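To illustrate what I mean by slicing plus an abstraction layer, here is a minimal sketch. The interface is my own invention for illustration; no such standard exists, which is rather the point.

    import time
    from typing import Any

    class SlicedStore:
        """Moves only the recently touched ('hot') slice of the data; the cold
        remainder stays at the original site and is read through the same interface."""

        def __init__(self, remote_store: dict, hot_window_secs: float):
            self.remote = remote_store   # stays on the original site's storage
            self.local = {}              # the slice that travels with the application
            self.hot_window = hot_window_secs

        def migrate_hot_slice(self, last_access_times: dict) -> None:
            """Copy only records accessed within the hot window (the ILM idea)."""
            cutoff = time.time() - self.hot_window
            for key, accessed_at in last_access_times.items():
                if accessed_at >= cutoff:
                    self.local[key] = self.remote[key]

        def read(self, key: str) -> Any:
            # Serve from the local (moved) slice first; fall back to the remote cold tier.
            return self.local.get(key, self.remote.get(key))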

However, I fear that such a solution will not arrive before we have created another complicated administrative nightmare. On the one hand, if data locality rules, haven’t we just created a more complicated version of SaaS (the application can’t move because the data can’t)? On the other hand, if our kludges succeed in preserving the illusion of the dynamic application/service/data-center by achieving some minimal remote data movement, how do we scale cloud server-farm sites whose data stores are steadily growing, by load-balancing hundreds of undocumented, hard-coded, differing pieces of software, each accessing a data cache that pretends to be exabytes of physically local data and actually reaches out to remote data on a cache miss?

A quick search of Google finds no one raising this particular point. Instead, the concerns relating to data locality seem to be about vendor lock-in, compliance with data security and privacy regulations, and the difficulty of moving the data for the first time. Another commentator notes the absence of standardized interfaces for cloud computing.

But I say, dead is dead, not alive by another name. If data is always local, that’s SaaS, not cloud by another name. And when you patch to cover up over-simplification, you create unnecessary complexity. Remember when simple-PC server farms were supposed to be an unalloyed joy, before the days of energy concerns and recession-fueled squeezes on the high administrative costs of distributed environments? Or when avoidance of vendor lock-in was worth the added architectural complexity, before consolidation showed that it wasn’t? I wonder, when this is all over, will IT echo Oliver Hardy and say to vendors, “Well, Stanley, here’s another fine mess you’ve gotten me into”?