Tuesday, November 3, 2009

Dave Hill, Data Protection, and the Future of IT

Full disclosure: Dave is a friend and long-time colleague. He has just written an excellent book on Data Protection; hence the following musings.

As I was reading (a rapid first scan), I tried to pin down why I liked the book so much. It certainly wasn’t the editing, since I helped with that. The topic is reasonably well covered, albeit piecemeal, by vendors, industry associations, and bloggers. And while I have always enjoyed Dave’s stories and jokes, the topic does not lend itself to elaborate stylistic flourishes.

After thinking about it some more, I came to the conclusion that it’s Dave’s methodology that I value. In my opinion, in each chapter Dave lays out a comprehensive and innovative classification of the topic at hand – data governance, information lifecycle management, data security – and then uses that classification to bring new insight to a well-covered topic. The reason I like this approach is that it allows you to use the classification as a springboard, to come to your own conclusions, to extend the classification and apply it in other areas. In short, I found myself continually translating classifications from the narrow world of storage to the broad world of “information”, and being enlightened thereby.

One area in particular that called forth this type of analysis was the topic of cloud computing and storage. If data protection, more or less, involves considerations of compliance, operational/disaster recovery, and security, how do these translate to a cloud external to the enterprise? And what is the role of IT in data protection when both physical and logical information are now outside of IT’s direct control?

But this is merely a small part of the overall question of the future of IT, if external clouds take over large chunks of enterprise software/hardware. If the cloud can do it all cheaper, because of economies of scale, what justification is there for IT to exist any longer? Or will IT become “meta-IT”, applying enterprise-specific risk management, data protection, compliance, and security to their own logical part of a remote, multi-tenant physical infrastructure?

I would suggest another way of slicing things. It is reasonable to think of a business, and hence the underlying IT, as divided into cost centers, which benefit from commodity solutions provided externally, and competitive-advantage or profit centers, for which being like everyone else is actually counterproductive. In an ideal world, where the cloud can always underprice commodity hardware and software, IT’s value-add lies where things are not yet commodities. In other words, in the long run, IT should be the “cache”, the leading edge, the driver of the computing side of competitive advantage.

What does that mean, practically? It means that the weight of IT should shift much more towards software and product development and initial use. Product-related and innovative-process-related software, and the systems to test and deploy it, are IT’s purview; the rest should be in the cloud. But this does not make IT less important; on the contrary, it makes IT more important, because IT not only focuses on competitive advantage when things are going well, but also on agile solutions that pay off in cost savings through more rapid adaptation when things are going poorly. JIT inventory management is a competitive advantage when orders are rising, but also a cost saver when orders are falling.

I realize that this future is not likely to arrive any time soon. The problem is that in today’s IT, maintenance costs crowd out new-software spending, so the CEO is convinced that IT is not competent to handle software development. But let’s face it: no one else is, either. Anyone following NPD (new product development) over the last few years realizes that software is an increasing component in an increasing number of industries. Outsourcing competitive-advantage software development is therefore increasingly like outsourcing R&D – it simply doesn’t work unless the key overall direction is in-house. Whether or not IT does infrastructure governance in the long run, it remains the best candidate to do NPD software-development governance.

So I do believe that IT has a future; but quite a different one from its present. As you can see, I have wandered far afield from Data Protection, thanks to Dave Hill’s thought-provoking book. The savvy reader of this tome will, I have no doubt, be able to come up with other, equally fascinating thoughts.

Monday, October 19, 2009

Speed vs. Agility

A recent product announcement by IBM and a series of excellent (or at least interesting) articles in Sloan Management Review have set me to musing on one unexamined assumption in most assessments: that increased process speed equals increased business agility. My initial take: this is true in most cases, but not in all, and can be misleading as a cookie-cutter strategy.

The IBM announcement centered on integration of IBM’s business-process management (BPM) capabilities, in order to achieve agility by speeding up business processes. What was notably missing was integration with IBM’s capabilities for New Product Development (NPD) – Rational and the like. However, my initial definition and application of KAIs (key agility indicators) at a business level suggests that speeding up NPD, including development of new business processes, has far more of an impact on long-term business agility than speeding up existing processes. To put it another way, increasing the Titanic’s ability to turn sharply is far more likely to avert disaster than increasing its top speed charging straight ahead – in fact, increasing its speed makes it more likely to crash into an iceberg.

A similar assumption seems to have been made in SMR’s latest issue, in the article entitled “Which Innovation Efforts Will Pay?” The message of this article appears to be that improving innovation efforts is primarily a matter of focusing more on the “healthy innovation” middle region of internally developed, modest “base hits”, with little or no effect from speeding up internal innovation processes or expanding them to include outside innovation. By contrast, the article “Does IP Strategy Have to Cripple Open Innovation?” suggests that collaborative strategies across organizational lines focused on NPD make users far more agile and businesses far better off, despite requiring as much (or more) time to implement as in-house efforts. Finally, we might cite a study in SMR suggesting that users estimating inventory-refill needs were more likely to make sub-optimal decisions when fed data daily than when fed a weekly summary. We might also cite the recent book on system dynamics by Donella Meadows, which argued that increasing the speed of a process is often accomplished by increasing its rigidity (constraining the process in order to optimize the typical case right now), making future disasters, as the system inevitably grows, less avoidable and more life-threatening.

All of this suggests that (a) people are assuming that increased process speed automatically translates to increased business agility, and (b) on the contrary, in many cases it translates to insignificant improvements or significant decreases in agility. But how do we tell when speed equals agility, and when not? What rules of thumb are there to tell us when increased speed does not positively impact business agility, and, in those cases, what should we do?

I don’t pretend to have the final answers for these questions. But I do have some initial thoughts on typical situations that lead to increased speed but decreased agility, and on how to assess and improve business investment strategies in those cases.

If it isn’t sustainable it isn’t agile. Let us suppose that we improve a business process by applying technology that speeds the process by decreasing the need for human resources. Further, suppose that the technology involves increased carbon or energy use – say, a one-for-one replacement of people-hours by computing power. Over the long run, this increased energy use will need to be dealt with, adding costs for future business-process redesign and decreasing the money available for future innovation. The obvious rejoinder is that cost savings will fund those future costs; except that today, most organizations are still digging themselves deeper into an energy hole, while operational IT costs, driven by an increased need for storage, continue to exert upward pressure and crowd out new-app development.

As the most recent SMR issue notes, the way to handle this problem is to build sustainability into every business process. If lack of sustainability decreases agility, then the converse is also true: building sustainability into the company, including both ongoing processes and NPD, increases revenues, decreases costs – and increases agility.

If it’s less open it’s less agile. In some ways, this is a tautology: if an organization changes a business process so as to preclude some key inputs in the name of speed, it will be less successful in identifying problems that call for adaptation. However, it does get at one of the subtler causes of organizational rigidity: the need to do something, anything, quickly in order to survive. A new online banking feature may make check processing much more rapid, but if customers are not listened to adequately, it may be rejected in the marketplace, or cost the business market share.

Detecting and correcting this type of problem is hard, because organizational politics points everyone to the lesson only after the process has already been implemented and gained a toehold – and because businesses may draw the wrong conclusion (e.g., that the fix is better design rather than ensuring open collaboration). The best fix is probably a strong, consistent, from-the-top emphasis on collaboration, agile processes, and not shooting the messenger.

Those are the main ways I have seen in which increased speed can actually make things worse. I want to add one more suggestion, which affects not so much situations where speed has negative consequences, but rather cases in which speed and agility can be improved more cost-effectively: upgrading the development process is better. That is, even if you are redesigning an existing business process like disaster recovery for better speed, you get a better long-term bang for your buck by also improving the agility of the process by which you create and implement the speedier solution. Not only does this have an immediate impact in making the solution itself more agile; it also bleeds over into the next project to improve a business process, or a product or service. And the best way I’ve found so far to improve development-process speed, quality, and effectiveness is an agile process.

Sunday, September 6, 2009

Open Source, Windows, Portability, and Borges

There is a wonderful short story by Jorge Luis Borges ("Pierre Menard, Author of the Quixote") that, I believe, captures the open source effort to come to terms with Windows – which in some quarters is viewed as the antithesis of the philosophy of open source. In this short story, a critic analyzes a Don Quixote written by someone three hundred years after the original – someone who has attempted to live his life so as to be able to write exactly the same words as in the original Don Quixote. The critic’s point is that even though the author is using the same words, today they mean something completely different.

In much the same way, open source has attempted to mimic Windows on “Unix-like” environments (various flavors of Unix and Linux) without triggering Microsoft’s protection of its prize operating system. To do this, they have set up efforts such as Wine and ReactOS (to provide the APIs of Windows from Win2K onwards) and Mono (to provide the .NET APIs). These efforts attempt to support the same APIs as Microsoft’s, but with no knowledge of how Microsoft created them. This is not really reverse engineering, as the aim of reverse engineering is usually to figure out how functionality was achieved. These efforts don’t care how the functionality was achieved – they just want to provide the same collection of words (the APIs and functionality).

But while the APIs are the same, the meaning of the effort has changed in the twenty-odd years since people began asking how to make moving programs from Wintel to another platform (and vice versa) as easy as possible. Then, every platform had difficulties with porting, migration, and source or binary compatibility. Now, Wintel and the mainframe, among the primary installed bases, are the platforms that are most difficult to move to or from. Moreover, the Web, or any network, as a distinct platform did not exist; today, the Web is increasingly a place in which every app and most middleware must find a way to run. So imitating Windows is no longer so much about moving Windows applications to cheaper or better platforms; it is about reducing the main remaining barrier to being able to move any app or software from any platform to any other, and into “clouds” that may hide the underlying hardware, but will still suffer when apps are platform-specific.

Now, “moving” apps and “easy” are very vague terms. My own hierarchy of ease of movement from place to place begins with real-time portability. That is, a “virtual machine” on any platform can run the app, without significant effects on app performance, robustness, and usability (i.e., the user interface allows you to do the same things). Real-time portability means the best performance for the app via load balancing and dynamic repartitioning. Java apps are pretty much there today. However, apps in other programming languages are not so lucky, nor are legacy apps.

The next step down from real-time portability is binary compatibility. The app may not work very well when moved in real time from one platform to another, but it will work, without needing changes or recompilation. That’s why forward and backward compatibility matter: they allow the same app to work on earlier or later versions of a platform. As time goes on, binary compatibility gets closer and closer to real-time portability, as platforms adapt to be able to handle similar workloads. Windows Server may not scale as well as the mainframe, but they both can handle the large majority of Unix-like workloads. It is surprising how few platforms have full binary compatibility with all the other platforms; it isn’t just Windows to the mainframe but also compatibility between different versions of Unix and Linux. So we are a ways away from binary compatibility, as well.

The next step down is source-code compatibility. This means that in order to run on another platform, you can use the same source code, but it must be recompiled. In other words, source-code but not binary compatibility seems to rule out real-time movement of apps between platforms. However, it does allow applications to generate a version for each platform, and then interoperate/load balance between those versions; so we can crudely approximate real-time portability in the real world. Now we are talking about a large proportion of apps on Unix-like environments (although not all), but Windows and mainframe apps are typically not source-code compatible with the other two environments. Still, this explains why users can move Linux apps onto the mainframe with relative ease.

There’s yet another step down: partial compatibility. This seems to come in two flavors: higher-level compatibility (that is, source-code compatibility if the app is written to a higher-level middleware interface such as .NET) and “80-20” compatibility (that is, 80% of apps are source-code incompatible in only a few, easily modified places; the other 20% are the nasty problems). Together, these two cases cover a large proportion of all apps; and it may be comforting to think that legacy apps will sunset themselves, so that eventually higher-level compatibility will become de facto source-code compatibility. However, the remaining cases include many important Windows apps and most mission- and business-critical mainframe apps. To most large enterprises, partial compatibility is not an answer. And so we come to the final step down: pure incompatibility, cured only by a massive porting/rewrite effort that has become much easier but is still not feasible for most such legacy apps.

Why does all this matter? Because we are closer to Nirvana than we realize. If we can imitate enough of Windows on Linux, we can move most Windows apps to scale-up servers when needed (Unix/Linux or mainframe). So we will have achieved source-code compatibility from Windows to Linux, Java real-time portability from Linux to Windows, source-code compatibility for most Windows apps from Windows to Linux on the mainframe, and Linux source-code compatibility and Java real-time portability from Linux to the mainframe and back. It would be nice to have portability from z/OS apps to Linux and Windows platforms; but neither large enterprises nor cloud vendors really need this – the mainframe has that strong a TCO/ROI and energy-savings story for large-scale and numerous (say, more than 20 apps) situations.

So, in an irony that Borges might appreciate, open-source efforts may indeed allow lower costs and greater openness for Windows apps – but not because free, open-source software will crowd out Windows. Rather, a decent approximation of cross-platform portability with lower per-app costs will be achieved because these efforts allow users to leverage Windows apps on other platforms, where the old proprietary vendors could never figure out how to do it. The meaning of the effort may be different than it would have been 15 years ago; but the result will be far more valuable. Or, as Borges’ critic might say, the new meaning speaks far more to people today than the old. Sometimes, Don Quixote tilting at windmills is a useful thing.

Thursday, August 27, 2009

On Julia Child

While due attention is being paid to Ted Kennedy this week, a fair amount of discussion of the movie Julie and Julia is also taking place, primarily centered around the profound impact that Julia Child has had on many people. While it is great to see Julia getting her fair share of praise, I do disagree with many commentators about Julia’s significance. In particular, I think that she was part of a broader happy trend toward higher-quality cooking in the US, and that her superb recipes or TV shows are less important to that trend than the fact that, at long last, chefs seem to have come on their own to the conclusion that the French philosophy of cooking really does work.

Here’s my story: in 1966, when I was 16, I spent a summer in France – in Paris, in Brittany, on the Loire, and in Normandy. On a previous trip, when I was 9, my main object was to get a hamburger (steak tartare, anyone?). On this trip, my parents let me go out to restaurants frequently, and typically one-, two-, and three-star ones (the Michelin Guide’s ratings of quality, for those who don’t know).

The trip was an eye-opener. The food was always clearly different, and consistently to be savored. I learned that mushrooms were not a gritty, flavorless component of Campbell’s mushroom soup; that sauces were not a variant of ketchup to drench poor-tasting food in; that, when prepared well, fish tasted better than steak; that bread was not the most important part of the meal; and that less liquid rather than more (wine instead of Coke) enhanced taste.

Above all, I learned that during a really good meal, unconsciously, instead of eating fast to satisfy hunger, I ate slowly to enjoy food’s taste. I should note that when we took my son to a three-star French restaurant at the same age, much the same thing happened, and he stopped saying “who cares about food” and started saying “I like this place, I don’t like this one.”

The reasons why French cooking was so far superior to American, in those days, I believe were these:

1. The French insisted the raw materials for food had to be absolutely fresh. Seafood in Paris was typically a half-day from the sea or less.

2. There were specific standards for ingredients. There was a certain kind of starter for French bread and certain required types for each vegetable; margarine, say, was not substituted for butter because it was cheaper.

3. The emphasis was on just enough cooking, rather than overcooking. This was especially true for fish.

4. Meals were planned so that each piece tasted good on its own and also in concert with something else. This made for a certain simplicity: instead of stews and sandwiches, you had delicately flavored fish paired with excellent sauces, plus “al dente” vegetables and sugary but light desserts.

Now, remember, I had been to what were supposed to be excellent French (and other) restaurants in New York City. The French restaurants in France, even out in the boonies, were in every case far better.

Over the next few years, as my mother watched Julia on TV and tried some of her recipes, and I used her book to cook for special occasions, I consistently found that neither home cooking nor French (or other) restaurants in the US measured up to the French ones. There were a few dishes that were exceptions: I remember a green-bean-with-garlic dish in NYC in 1980 that was worthy of a one-star restaurant in Paris. But until about the last 10 years, despite the enormous shift in American taste 1960-1990 from “HoJo American” to “foreign experimental”, American restaurants of all stripes simply never gave me that feeling of wanting to eat slowly.

I may be reading too much into the situation, but I think the turning point came when chefs finally began to adopt the French philosophies cited above. In other words, they started trying to standardize and improve the quality of ingredients; they gave due attention to making each piece of the meal flavorful, and to sauces; and they started emphasizing freshness.

Why couldn’t our attempts to do Julia match the French? I believe, because the ingredients simply weren’t good enough. Comparable cuts of beef weren’t available at stores; vegetables such as tomatoes were processed and imported from abroad, with emphasis on cheapness; it was hard to time the food to achieve “just enough” cooking; butter was salted.

So, on the one hand, I am very grateful to Julia Child for providing recipes and meal plans that were far, far better than what came before (with the possible exception of butterscotch pudding). But the sad fact is that people in the US still didn’t understand that food could be much better than that. That is, they valued foreign, but they didn’t value good foreign (and not necessarily French; I have tasted the same philosophy applied to Caribbean and Chinese, with the same superb results). Only in the last 10-15 years, as I say, have I seen a significant number of restaurants that consistently make me want to slow down and savor the taste. Only in that period have Americans been able to appreciate how nice life is when you can occasionally have an experience like that.

And there’s so much we still don’t know about. A really good gâteau breton. Mayonnaise sauce that tastes like hollandaise sauce, and hollandaise sauce with artichokes. Évian fruité, a far different kind of soft drink. Real French bread. Mushroom or artichoke purée. Sole meunière as it should be. Kir that tastes as good as the finest wine. A superb brioche with French butter and jam. Julia covers mostly Paris. The regions add much more.

So I remember Julia Child with great fondness, and salute her integrity that insisted on fidelity to French quality rather than American short-cuts. But I think that the primary credit for improving our quality of food and life belongs to later chefs, who finally brought the French philosophy to the restaurant as well as the home.

Wednesday, August 19, 2009

Is SQL Toast? Or Is Java Being Stupid Again?

A recent Techtarget posting by the SearchSOA editor picks up on the musings of Miko Matsumura of Software AG, suggesting that because most new apps in the cloud can use data in main memory, there’s no need for the enterprise-database SQL API; rather, developers should access their data via Java. OK, that’s a short summary of a more nuanced argument. But the conclusion is pretty blunt: “SQL is toast.”

I have no great love for relational databases – as I’ve argued for many years, “relational” technology is actually marketing hype about data management that mostly is not relational at all. That is, the data isn’t stored as relational theory would suggest. The one truly relational thing about relational technology is SQL: the ability to perform operations on data in an elegant, high-level, somewhat English-like mini-language.

What’s this Java alternative that Miko’s talking about? Well, Java is an object-oriented programming (OOP) language. By “object”, OOP means a collection of code and the data on which it operates. Thus, an object-oriented database is effectively chunks of data, each stored with the code to access it.
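
To make that concrete, here is a minimal sketch of “data in the object” – all names are hypothetical, an illustration of the idea rather than anyone’s actual product:

```java
// A minimal "data in the object" sketch (hypothetical names): the data
// (a balance) and the code that operates on it (debit) live together in
// each object instance, with no external database in sight.
public class CustomerAccount {
    private final String id;
    private long balanceCents; // the data, stored inside the object

    public CustomerAccount(String id, long balanceCents) {
        this.id = id;
        this.balanceCents = balanceCents;
    }

    // The only way to change the data is through the object's own code.
    public void debit(long amountCents) {
        if (amountCents > balanceCents) {
            throw new IllegalStateException("insufficient funds for " + id);
        }
        balanceCents -= amountCents;
    }

    public long getBalanceCents() {
        return balanceCents;
    }
}
```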

So this is not really about Larry Ellison/Oracle deciding the future, or the “network or developer [rather] than the underlying technology”, as Miko puts it. It’s a fundamental question: which is better, treating data as a database to be accessed by objects, or as data within objects?

Over the last fifteen years, we have seen the pluses and minuses of “data in the object”. One plus is that there is no object-relational mismatch, in which you have to fire off a SQL statement to some remote, un-Java-like database like Oracle or DB2 whenever you need to get something done. The object-relational mismatch has been estimated to add 50% to development times, mostly because developers who know Java rarely know SQL.
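
For comparison, here is the mismatch in miniature – a hedged sketch, assuming a JDBC connection and a hypothetical accounts table, of what a Java developer must do to reach the same balance when it lives in a remote relational database:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// The object-relational mismatch in miniature: to read one value, the
// developer leaves the object world and composes SQL text by hand.
// The table and column names here are hypothetical.
public class AccountReader {
    private static final String QUERY =
        "SELECT balance_cents FROM accounts WHERE id = ?";

    public long fetchBalance(Connection conn, String id) throws SQLException {
        try (PreparedStatement stmt = conn.prepareStatement(QUERY)) {
            stmt.setString(1, id);
            try (ResultSet rs = stmt.executeQuery()) {
                if (!rs.next()) {
                    throw new SQLException("no such account: " + id);
                }
                return rs.getLong("balance_cents");
            }
        }
    }
}
```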

Then there are the minuses, the reasons why people find themselves retrofitting SQL invocations to existing Java code. First of all, object-oriented programs in most cases don’t perform well in data-related transactions. Data stored separately in each object instance uses a lot of extra space, and the operations on it are not optimized. Second, in many cases, operations and the data are not standardized across object classes or applications, wasting lots of developer time. Third, OOP languages such as Java are low-level, and specifically low-level with regard to data manipulation. As a result, programming transactions on vanilla Java takes much longer than programming on one of the older 4GLs (like, say, the language that Blue Phoenix uses for some of its code migration).
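
The third minus is easy to see in code. What SQL says in a single declarative line, vanilla Java spells out by hand – a sketch with hypothetical types:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// What SQL expresses in one line --
//   SELECT region, SUM(amount_cents) FROM orders GROUP BY region
// -- vanilla Java spells out as an explicit loop over hand-built types.
public class OrderTotals {
    // Hypothetical record type, for illustration only.
    public static class Order {
        final String region;
        final long amountCents;
        Order(String region, long amountCents) {
            this.region = region;
            this.amountCents = amountCents;
        }
    }

    public static Map<String, Long> totalByRegion(List<Order> orders) {
        Map<String, Long> totals = new HashMap<String, Long>();
        for (Order o : orders) {
            Long sum = totals.get(o.region);
            totals.put(o.region, (sum == null ? 0L : sum) + o.amountCents);
        }
        return totals;
    }
}
```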

So what effect would storing all your data in main memory have on Java data-access operations? Well, the performance hit would still be there – but would be less obvious, because of the overall improvement in access speed. In other words, it might take twice as long as SQL access, but since we might typically be talking about 1000 bytes to operate on, we still see 2 microseconds instead of 1, which is a small part of response time over a network. Of course, for massive queries involving terabytes, the performance hit will still be quite noticeable.

What will not go away immediately is the ongoing waste of development time. It’s not an obvious waste of time, because the developer either doesn’t know about 4GL alternatives or is comparing Java-data programming to all the time it takes to figure out relational operations and SQL. But it’s one of the main reasons that adopting Java actually caused a decrease in programmer productivity compared to structured programming, according to user feedback I collected 15 years ago.

More fundamentally, I have to ask if the future of programming is going to be purely object-oriented or data-oriented. The rapid increase in networking speed of the Internet doesn’t make data processing speed ignorable; on the contrary, it makes it all the more important as a bottleneck. And putting all the data in main memory doesn’t solve the problem; it just makes the problem kick in at larger amounts of data – i.e., for more important applications. And then there’s all this sensor data beginning to flow across the Web …

So maybe SQL is toast. If what replaces it is something that Java can invoke that is high-level, optimizes transactions and data storage, and allows easy access to existing databases – in other words, something data-oriented, something like SQL – then I’m happy. If it’s something like storing data as objects and providing minimal, low-level APIs to manipulate that data – then we will be back to the same stupid over-application of Java that croaked development time and scalability 15 years ago.

Saturday, August 15, 2009

Eventual Consistency and Scale-Out Data Management

A recent blog post by Gordon Haff of Illuminata about new challenges to enterprise relational databases cited a white paper on how Amazon, in particular, is experimenting with loosening the typical relational requirements for ACID (atomicity, consistency, isolation, and durability). In the white paper, the author (Werner Vogels, CTO, Amazon) explains how recent database theory has begun to investigate delayed or “eventual” consistency (EC), and to apply its findings to the real world. Skimming the white paper, I realized that these findings did not seem to be applicable to all real-world situations – it appears that they are only useful in a particular type of scale-out architecture.

The problem that so-called eventual consistency techniques aim to solve is best explained by reviewing history. As distributed computing arrived in the 1980s, database theory attempted to figure out what to do when the data in a data store was itself distributed. Partitioning was an obvious solution: put item a on system A and item b on system B, and then put processes 1 and 2 on system A (or, as in the case of Microsoft SQL Server in the 1990s, processes 1 and 2 on both A and B, so transaction streams can be multiplexed). The database can then step in when either process references item b, and handle the reads and writes as if item b were local. However, this is not a great general-case solution (although Sybase, for one, offers database partitioning for particular cases): optimum partitions for performance tend to change over time, and in many cases putting the same datum on two systems yields better parallelism and hence better performance.
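
As a sketch of what that partitioning implies (all names hypothetical), note that the fixed routing rule is itself the weakness: when the optimal layout shifts, every key has to be re-mapped:

```java
// A minimal sketch of static partitioning (hypothetical names): a router
// decides which system owns each item, so a process on system A can still
// reference item b transparently. The fixed mapping is the weak point --
// when the optimal partitioning changes, keys must be re-routed wholesale.
public class PartitionRouter {
    private final String[] nodes; // e.g., {"systemA", "systemB"}

    public PartitionRouter(String... nodes) {
        this.nodes = nodes;
    }

    // Deterministic placement: the same key always maps to the same node.
    public String nodeFor(String itemKey) {
        return nodes[Math.abs(itemKey.hashCode() % nodes.length)];
    }
}
```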

The next solution – the main solution during the 1990s – was two-phase commit. Here, the idea was to make absolutely sure that processes 1 and 2 did not see different values in item a (or b) at any time. So, the “commit” first sent out instructions to lock data with multiple copies on multiple systems, then received indications from those systems that they were ready to update (Phase 1), then told them to update, and only when everyone had answered that the item had been updated (Phase 2) was the data made available for use again. This ensured consistency and the rest of the ACID properties; but when more than a few copies were involved, the performance overhead on queries was high. In theoretical terms, they had sacrificed “availability” of the multiple-copy item during the update for its consistency.
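
A bare-bones sketch of that protocol makes the cost visible: the item stays locked across two full message rounds to every copy-holder. The Participant interface below is hypothetical; real implementations add timeouts, retries, and recovery logs:

```java
import java.util.List;

// A bare-bones two-phase commit sketch. Participant is a hypothetical
// interface standing in for each system that holds a copy of the item.
public class TwoPhaseCommit {
    public interface Participant {
        boolean prepare(String itemId, byte[] newValue); // lock copy, vote
        void commit(String itemId);                      // apply the update
        void abort(String itemId);                       // release the lock
    }

    // Returns true only if every copy-holder voted yes and then committed;
    // the item is unavailable for the entire duration.
    public static boolean update(List<Participant> copyHolders,
                                 String itemId, byte[] newValue) {
        // Phase 1: every system locks its copy and signals readiness.
        for (Participant p : copyHolders) {
            if (!p.prepare(itemId, newValue)) {
                for (Participant q : copyHolders) {
                    q.abort(itemId); // assume abort is safe to call broadly
                }
                return false;
            }
        }
        // Phase 2: only after all votes are in does everyone update, and
        // only then is the item made available for use again.
        for (Participant p : copyHolders) {
            p.commit(itemId);
        }
        return true;
    }
}
```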

At this point, in the real world, data warehouses carved out a part of data processing in which there was no need for two-phase commit, because one copy of the data (the one in the data warehouse) could always be out of date. Replication simply streamed updates from the operational frequent-update system to the decision-support no-online-updates-allowed system in nightly bursts when the data warehouse was taken offline.

In the early 2000s, according to the white paper, theory took a new tack – seeing if some consistency could be sacrificed for availability. To put it another way, researchers noted that when a multiple-copy update arrives, in some cases (a) it is OK to make the item available when some but not all of the item’s copies have been updated (“eventual read/write”), or (b) it is OK to use a previous data-item version until all copy updates have been completed. In case (a) you save most of Phase 2 of a corresponding two-phase commit, and in case (b) you save all of Phase 2 and most of Phase 1 as well. EC is therefore the collection of techniques that allow availability before consistency is re-established.
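
Here is a toy, single-JVM sketch of case (b), with hypothetical names: readers always get the last fully consistent version without blocking, and consistency is re-established only when the final copy acknowledges. Real EC systems do this across machines with per-replica versioning, but the availability-first shape is the same:

```java
import java.util.concurrent.atomic.AtomicReference;

// A toy sketch of case (b): reads return the last fully consistent
// version immediately, while copy updates propagate in the background.
// All names are hypothetical; real systems track versions per replica.
public class EventuallyConsistentItem<V> {
    private final AtomicReference<V> lastConsistent;
    private final int totalCopies;
    private V pending;          // new value, not yet applied on all copies
    private int copiesUpdated;  // replicas that have acknowledged so far

    public EventuallyConsistentItem(V initialValue, int totalCopies) {
        this.lastConsistent = new AtomicReference<V>(initialValue);
        this.totalCopies = totalCopies;
    }

    // Availability first: reads never wait on in-flight replication.
    public V read() {
        return lastConsistent.get();
    }

    public synchronized void beginUpdate(V newValue) {
        pending = newValue;
        copiesUpdated = 0;
    }

    // Called as each replica acknowledges; consistency is re-established
    // only when the last copy reports in.
    public synchronized void copyAcknowledged() {
        if (pending != null && ++copiesUpdated == totalCopies) {
            lastConsistent.set(pending);
            pending = null;
        }
    }
}
```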

Where EC Fits

So, in what kinds of situations does EC help? First of all, these are situations where users need multiple data copies on multiple systems for lots of items, in order to scale. If data is on a single system, or partitioned across multiple systems, you could probably use optimistic locking or versioning to release write locks on updates (and thereby make the item available again) just as quickly. Likewise, two-phase commit involves little performance overhead in distributed systems where few multiple-copy items and few updates on these items are involved – so EC isn’t needed there, either.

A second limitation on the use of EC seems to be the rate of updates to a particular multiple-copy data item. Too frequent updates, and a state of perpetually delayed consistency would seem to result – in effect, no consistency at all.

Thus, EC does not appear appropriate for pure distributed OLTP (online transaction processing). It also does not fit pure decision support/data warehousing, where updates occur in mammoth bursts. It may be appropriate for EII or “data virtualization”-type cross-database updates mixed with querying, although I believe that real-world implementations do not involve large numbers of multiple-copy items (and hence two-phase commit will do). MDM (master data management) does not appear to be well suited to EC, as implementations typically involve updates funneled through one or two central sites, then replication of the updated item value to all other copies.

Well, then, where does EC fit? The answer seems to be, in scale-out multiple-copy distributed data architectures involving infrequent, predictably-timed updates to each item. For example, a large PC-server farm providing E-commerce to consumers may emphasize prompt response to a rapidly changing workload of customer orders, each customer record update being typically delayable until the time the customer takes to respond to a prompt for the next step in the process. In these cases, data mining across multiple customers can wait until a customer has finished, or can use the previous version of a particular customer’s data. It is therefore no surprise that Amazon would find EC useful.

Conclusions

If we could really implement EC in all cases, it would be a major boost to database performance, as well as to pure scale-out architectures, which otherwise make less and less sense in an era when costs, energy/carbon wastage, and administrative complexity weigh against them. Sadly, I have to conclude, at least for now, that most traditional database use cases simply do not fit the EC model.

However, that is no reason that much more cannot be done in applying EC to “mixed” Web-associated transaction streams. These are, after all, a significant and increasing proportion of all transactional workloads. In these, EC could finally simulate true parallel data processing, rather than the concurrency which can slow really large-scale transaction-handling by orders of magnitude. As an old analysis-of-algorithms guy, I know that time parallelism can translate to exponential performance improvements as the amount of data processed approaches infinity; and if coordination between item copies is minimal, time parallelism is approximated. So EC may very well not be limited in its usefulness to large-scale E-commerce use cases, but may apply to many other use cases within large Web-dependent server farms – and cloud computing is an obvious example. I conclude that EC may not be appropriate for a wide range of today’s transactional needs; but cloud computing implementers, and major database vendors looking to support cloud computing, should “kick the tires” and consider implementing EC capabilities.

Monday, July 13, 2009

Cloud Computing and Data Locality: Not So Fast

In one of my favorite sleazy fantasy novels (The Belgariad, David Eddings) one of the characters is attempting to explain to another why reviving the dead is not a good idea. “You have the ability to simplify, to capture a simple visual picture of something complex [and change it],” the character says. “But don’t over-simplify. Dead is dead.”

In a recent white paper on cloud computing, in an oh-by-the-way manner, Sun mentions the idea of data locality. If I understand it correctly, virtual “environments” in a cloud may have to physically move not only from server to server, but from site to site and/or from private data center to public cloud server farm and back. More exactly, the applications don’t have to move (just their “state”), and the virtual machine software and hardware doesn’t have to move (it can be replicated or emulated on the target machine); but the data may have to be moved or copied in toto, or continue to access the same physical data store remotely – which would violate the idea of cloud boundaries, among other problems (like security and performance). To avoid this, it is apparently primarily up to the developer to keep data locality in mind, which seems to mean avoiding moving the data where possible by keeping it on the same physical server-farm site.

Data locality will certainly be a quick fix for immediate problems of how to create the illusion of a “virtual data center.” But is it a long-term fix? I think not. The reason, I assert, is that cloud computing is an over-simplification – physically distributed data is not virtually unified data – and our efforts to patch it to approximate the “ideal cloud” will result in unnecessary complexity, cost, and legacy systems.

Consider the most obvious trend in the computing world in the last few years: the inexorable growth in storage of 40-60% per year, continuing despite the recession. The increase in storage reflects, at least partly, an increase in data-store size per application, or, if you wish, per “data center”. It is an increase that appears to outpace both Moore’s Law and the rate of increase in communications bandwidth. If moving a business-critical application’s worth of data from secondary to primary site for disaster-recovery purposes takes up to an hour today, moving it two years from now will likely take 1½ to 2 hours, and so on. Unless this trend is reversed, the idea of a data center that can move or re-partition in minutes between public and private cloud (or even between Boston and San Francisco in a private cloud) is simply unrealistic.

Of course, since the unrealistic doesn’t happen, what will probably happen is that developers will create kludges, one for each application that is “cloud-ized”, to ensure that data is “pre-copied” and periodically “re-synchronized”, or that barriers are put in the way of data movement from site to site within the theoretically virtual public cloud. That’s the real danger – lots of “reinventing the wheel”, with attendant long-term unnecessary costs of administering (and developing new code on top of) non-standardized data movements and the code propelling them, database-architecture complexity, and unexpected barriers to data movement inside the public cloud.

What ought to provide a longer-term solution, I would think, is (a) a way of slicing the data so that only the stuff needed to “keep it running” is moved – which sounds like Information Lifecycle Management (ILM), since one way of doing this is to move the most recent data, the data most likely to be accessed and updated – and (b) a standardized abstraction-layer interface to the data that enforces this. In this way, we will at least have staved off data-locality problems for a few more years, and we won’t have embedded kludge-type solutions in the cloud infrastructure forever.
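
To be clear about what I mean by (b), here is a hedged sketch of such an abstraction layer – every name below is hypothetical, a shape for the idea rather than a proposal for a real API:

```java
// A hypothetical abstraction-layer interface for data locality: movers ask
// for the "hot" ILM slice of an app's data instead of copying it in toto.
public interface DataLocalityLayer {

    // Identify the slice most likely to be accessed or updated --
    // for example, the most recent data.
    DataSlice hotSlice(String appId);

    // Move just that slice to the target site; the rest stays behind
    // the same interface and follows lazily, if ever.
    void migrate(DataSlice slice, String targetSite);

    // Re-synchronize whatever changed at the source after the move.
    void reconcile(String appId, String targetSite);
}

// Hypothetical handle for a bounded, movable subset of an app's data.
interface DataSlice {
    String appId();
    long sizeBytes();
}
```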

However, I fear that such a solution will not arrive before we have created another complicated administrative nightmare. On the one hand, if data locality rules, haven’t we just created a more complicated version of SaaS (the application can’t move because the data can’t)? On the other hand, if our kludges succeed in preserving the illusion of the dynamic application/service/data-center by achieving some minimal remote data movement, how do we scale cloud server-farm sites whose data stores grow steadily, by load-balancing hundreds of undocumented, hard-coded, differing pieces of software accessing data caches that pretend to be exabytes of physically local data but actually fetch remote data on a cache miss?

A quick search of Google finds no one raising this particular point. Instead, the concerns relating to data locality seem to be about vendor lock-in, compliance with data security and privacy regulations, and the difficulty of moving the data for the first time. Another commentator notes the absence of standardized interfaces for cloud computing.

But I say, dead is dead, not alive by another name. If data is always local, that’s SaaS, not cloud by another name. And when you patch to cover up over-simplification, you create unnecessary complexity. Remember when simple-PC server farms were supposed to be an unalloyed joy, before the days of energy concerns and recession-fueled squeezes on high distributed-environment administrative costs? Or when avoidance of vendor lock-in was worth the added architectural complexity, before consolidation showed that it wasn’t? I wonder, when this is all over, will IT echo Oliver Hardy and say to vendors, “Well, Stanley, here’s another fine mess you’ve gotten me into”?