Thursday, August 27, 2009

On Julia Child

While due attention is being paid to Ted Kennedy this week, a fair amount of discussion of the movie Julie & Julia is also taking place, centered mostly on the profound impact that Julia Child has had on many people. While it is great to see Julia getting her fair share of praise, I do disagree with many commentators about her significance. In particular, I think that she was part of a broader, happy trend toward higher-quality cooking in the US, and that her superb recipes and TV shows matter less to that trend than the fact that, at long last, chefs seem to have come around on their own to the conclusion that the French philosophy of cooking really does work.

Here’s my story: in 1966, when I was 16, I spent a summer in France – in Paris, in Brittany, on the Loire, and in Normandy. On a previous trip, when I was 9, my main object was to get a hamburger (steak tartare, anyone?). On this trip, my parents let me go out to restaurants frequently, typically one-, two-, and three-star ones (the Michelin Guide’s ratings of quality, for those who don’t know).

The trip was an eye-opener. The food was always clearly different, and consistently to be savored. I learned that mushrooms were not a gritty, flavorless component of Campbell’s mushroom soup; that sauces were not a variant of ketchup to drench poor-tasting food in; that, when prepared well, fish tasted better than steak; that bread was not the most important part of the meal; and that less liquid rather than more (wine instead of Coke) enhanced taste.

Above all, I learned that during a really good meal, unconsciously, instead of eating fast to satisfy hunger, I ate slowly to enjoy food’s taste. I should note that when we took my son to a three-star French restaurant at the same age, much the same thing happened, and he stopped saying “who cares about food” and started saying “I like this place, I don’t like this one.”

The reasons why French cooking was, in those days, so far superior to American were, I believe, these:

1. The French insisted the raw materials for food had to be absolutely fresh. Seafood in Paris was typically a half-day from the sea or less.

2. There were specific standards for ingredients. A certain kind of starter was required for French bread, certain varieties for vegetables, and margarine was not substituted for butter just because it was cheaper.

3. The emphasis was on just enough cooking, rather than overcooking. This was especially true for fish.

4. Meals were planned so that each piece tasted good on its own and also in concert with something else. This made for a certain simplicity: instead of stews and sandwiches, you had delicately flavored fish paired with excellent sauces, plus “al dente” vegetables and sugary but light desserts.

Now, remember, I had been to what were supposed to be excellent French (and other) restaurants in New York City. The restaurants in France, even out in the boonies, were in every case far better.

Over the next few years, as my mother watched Julia on TV and tried some of her recipes, and I used her book to cook for special occasions, I consistently found that neither home cooking nor French (or other) restaurants in the US measured up to the French ones. There were a few dishes that were exceptions: I remember a green-bean-with-garlic dish in NYC in 1980 that was worthy of a one-star restaurant in Paris. But until about the last 10 years, despite the enormous shift in American taste 1960-1990 from “HoJo American” to “foreign experimental”, American restaurants of all stripes simply never gave me that feeling of wanting to eat slowly.

I may be reading too much into the situation, but I think the turning point came when chefs finally began to adopt the French philosophies cited above. In other words, they started trying to standardize and improve the quality of ingredients; they gave due attention to making each piece of the meal flavorful, and to sauces; and they started emphasizing freshness.

Why couldn’t our attempts at cooking Julia’s recipes match the French restaurants? I believe it was because the ingredients simply weren’t good enough. Comparable cuts of beef weren’t available in stores; vegetables such as tomatoes were processed and imported from abroad, with the emphasis on cheapness; it was hard to time the food to achieve “just enough” cooking; butter was salted.

So, on the one hand, I am very grateful to Julia Child for providing recipes and meal plans that were far, far better than what came before (with the possible exception of butterscotch pudding). But the sad fact is that people in the US still didn’t understand that food could be much better than that. That is, they valued foreign, but they didn’t value good foreign (and not necessarily French; I have tasted the same philosophy applied to Caribbean and Chinese, with the same superb results). Only in the last 10-15 years, as I say, have I seen a significant number of restaurants that consistently make me want to slow down and savor the taste. Only in that period have Americans been able to appreciate how nice life is when you can occasionally have an experience like that.

And there’s so much we still don’t know about. A really good gâteau breton. Mayonnaise sauce that tastes like hollandaise sauce, and hollandaise sauce with artichokes. Évian fruité, a far different kind of soft drink. Real French bread. Mushroom or artichoke purée. Sole meunière as it should be. Kir that tastes as good as the finest wine. A superb brioche with French butter and jam. Julia covers mostly Paris. The regions add much more.

So I remember Julia Child with great fondness, and salute her integrity that insisted on fidelity to French quality rather than American short-cuts. But I think that the primary credit for improving our quality of food and life belongs to later chefs, who finally brought the French philosophy to the restaurant as well as the home.

Wednesday, August 19, 2009

Is SQL Toast? Or Is Java Being Stupid Again?

A recent Techtarget posting by the SearchSOA editor picks up on the musings of Miko Matsumura of Software AG, suggesting that because most new apps in the cloud can use data in main memory, there’s no need for the enterprise-database SQL API; rather, developers should access their data via Java. OK, that’s a short summary of a more nuanced argument. But the conclusion is pretty blunt: “SQL is toast.”

I have no great love for relational databases – as I’ve argued for many years, “relational” technology is actually marketing hype about data management that mostly is not relational at all. That is, the data isn’t stored as relational theory would suggest. The one truly relational thing about relational technology is SQL: the ability to perform operations on data in an elegant, high-level, somewhat English-like mini-language.

What’s this Java alternative that Miko’s talking about? Well, Java is an object-oriented programming (OOP) language. By “object”, OOP means a collection of code and the data on which it operates. Thus, an object-oriented database is effectively chunks of data, each stored with the code to access it.
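
To make that concrete, here is a rough Java sketch of “data in the object” – my own invented example, not anything from Miko’s argument: the balance lives inside the object, and the only way to get at it is through the object’s own methods.

```java
// Hypothetical illustration: the data (name, balance) lives inside the object,
// and the code that operates on it travels with it, rather than sitting in an
// external table reached through SQL.
public class CustomerAccount {
    private final String customerId;
    private String name;
    private long balanceCents;   // data stored with the object, not in a separate table

    public CustomerAccount(String customerId, String name, long balanceCents) {
        this.customerId = customerId;
        this.name = name;
        this.balanceCents = balanceCents;
    }

    // The "access code" stored alongside the data
    public void deposit(long cents) {
        balanceCents += cents;
    }

    public long getBalanceCents() {
        return balanceCents;
    }
}
```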

So this is not really about Larry Ellison/Oracle deciding the future, or the “network or developer [rather] than the underlying technology”, as Miko puts it. It’s a fundamental question: which is better, treating data as a database to be accessed by objects, or as data within objects?

Over the last fifteen years, we have seen the pluses and minuses of “data in the object”. One plus is that there is no object-relational mismatch, in which you have to fire off a SQL statement to some remote, un-Java-like database like Oracle or DB2 whenever you need to get something done. The object-relational mismatch has been estimated to add 50% to development times, mostly because developers who know Java rarely know SQL.
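
For illustration, here is a minimal JDBC-style sketch of that mismatch (the table and column names are invented): the developer steps out of Java, hands the database a string of SQL, and then maps the result back into Java by hand.

```java
import java.sql.*;

// Minimal sketch of the object-relational mismatch: the SQL arrives as an opaque
// string inside the Java code, and the result set has to be mapped back by hand.
// Table and column names (ACCOUNTS, BALANCE_CENTS) are invented for illustration.
public class AccountDao {
    public long findBalanceCents(Connection conn, String customerId) throws SQLException {
        String sql = "SELECT BALANCE_CENTS FROM ACCOUNTS WHERE CUSTOMER_ID = ?";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setString(1, customerId);
            try (ResultSet rs = ps.executeQuery()) {
                if (!rs.next()) {
                    throw new SQLException("No such customer: " + customerId);
                }
                return rs.getLong("BALANCE_CENTS");  // hand mapping from row to Java value
            }
        }
    }
}
```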

Then there are the minuses, the reasons why people find themselves retrofitting SQL invocations to existing Java code. First of all, object-oriented programs in most cases don’t perform well in data-related transactions. Data stored separately in each object instance uses a lot of extra space, and the operations on it are not optimized. Second, in many cases, operations and the data are not standardized across object classes or applications, wasting lots of developer time. Third, OOP languages such as Java are low-level, and specifically low-level with regard to data manipulation. As a result, programming transactions on vanilla Java takes much longer than programming on one of the older 4GLs (like, say, the language that Blue Phoenix uses for some of its code migration).
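
As a rough illustration of that last point (class and field names invented): an aggregate that SQL states in one declarative line has to be spelled out step by step, and optimized by hand, in vanilla Java.

```java
import java.util.List;

// Invented example: what "SELECT SUM(BALANCE_CENTS) FROM ACCOUNTS WHERE REGION = 'EU'"
// says in one declarative line must be spelled out step by step in plain Java.
public class BalanceReport {
    public static long totalForRegion(List<Account> accounts, String region) {
        long total = 0;
        for (Account a : accounts) {          // manual scan instead of an optimized query plan
            if (region.equals(a.getRegion())) {
                total += a.getBalanceCents();
            }
        }
        return total;
    }
}

// Plain data-holder class for the sketch
class Account {
    private final String region;
    private final long balanceCents;
    Account(String region, long balanceCents) { this.region = region; this.balanceCents = balanceCents; }
    String getRegion() { return region; }
    long getBalanceCents() { return balanceCents; }
}
```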

So what effect would storing all your data in main memory have on Java data-access operations? Well, the performance hit would still be there – but would be less obvious, because of the overall improvement in access speed. In other words, it might take twice as long as SQL access, but since we might typically be talking about 1000 bytes to operate on, we still see 2 microseconds instead of 1, which is a small part of response time over a network. Of course, for massive queries involving terabytes, the performance hit will still be quite noticeable.

What will not go away immediately is the ongoing waste of development time. It’s not an obvious waste of time, because the developer either doesn’t know about 4GL alternatives or is comparing Java-data programming to all the time it takes to figure out relational operations and SQL. But it’s one of the main reasons that adopting Java actually caused a decrease in programmer productivity compared to structured programming, according to user feedback I collected some 15 years ago.

More fundamentally, I have to ask if the future of programming is going to be purely object-oriented or data-oriented. The rapid increase in networking speed of the Internet doesn’t make data processing speed ignorable; on the contrary, it makes it all the more important as a bottleneck. And putting all the data in main memory doesn’t solve the problem; it just makes the problem kick in at larger amounts of data – i.e., for more important applications. And then there’s all this sensor data beginning to flow across the Web …

So maybe SQL is toast. If what replaces it is something that Java can invoke that is high-level, optimizes transactions and data storage, and allows easy access to existing databases – in other words, something data-oriented, something like SQL – then I’m happy. If it’s something like storing data as objects and providing minimal, low-level APIs to manipulate that data – then we will be back to the same stupid over-application of Java that croaked development time and scalability 15 years ago.
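
To be concrete about what I would count as “high-level and data-oriented,” here is a purely hypothetical sketch of such an interface – not any existing product’s API, just the shape of the thing:

```java
// Purely hypothetical interface: not a real product's API, only the shape of a
// high-level, data-oriented facility that Java code could invoke. The engine
// behind it, not every application's hand-rolled loops, would optimize storage,
// queries, and transactions.
public interface DataStore {
    // Declarative query handed to an optimizer, much as SQL is today
    <T> java.util.List<T> query(String declarativeQuery, Class<T> resultType);

    // Transactional unit of work, with concurrency and durability handled underneath
    void inTransaction(Runnable work);
}
```

The point is where the intelligence sits: below the interface, rather than re-implemented in each application.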

Saturday, August 15, 2009

Eventual Consistency and Scale-Out Data Management

A recent blog post by Gordon Haff of Illuminata about new challenges to enterprise relational databases cited a white paper on how Amazon, in particular, is experimenting with loosening the typical relational requirements for ACID (atomicity, consistency, isolation, and durability). In the white paper, the author (Werner Vogels, CTO, Amazon) explains how recent database theory has begun to investigate delayed or “eventual” consistency (EC), and to apply its findings to the real world. Skimming the white paper, I realized that these findings did not seem to be applicable to all real-world situations – it appears that they are only useful in a particular type of scale-out architecture.

The problem that so-called eventual consistency techniques aim to solve is best explained by reviewing history. As distributed computing arrived in the 1980s, database theory attempted to figure out what to do when the data in a data store was itself distributed. Partitioning was an obvious solution: put item a on system A, and item b on system B, and then put processes 1 and 2 on system A (or, as in the case of Microsoft SQL Server in the 1990s, processes 1 and 2 on both A and B, so transaction streams can be multiplexed). As a result, the database can step in when either process references item b, and handle the reads and writes as if item b is really on system A. However, this is not a great general-case solution (although Sybase, for one, offers database partitioning for particular cases): optimum partitions for performance tend to change over time, and in many cases putting the same datum on 2 systems yields better parallelism and hence better performance.
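
As a rough sketch of the partitioning idea (the hash rule and node names are my own simplification; real products use administrator-tuned range or hash schemes), the database’s job is to decide which system really owns an item and route references there:

```java
import java.util.List;

// Simplified sketch of database partitioning: each item lives on exactly one
// system, chosen here by a hash rule; a process that references an item it does
// not own gets routed to the owning system. Node names are invented.
public class PartitionRouter {
    private final List<String> nodes;   // e.g., ["systemA", "systemB"]

    public PartitionRouter(List<String> nodes) {
        this.nodes = nodes;
    }

    // The database "steps in" by deciding which system really holds the item
    public String ownerOf(String itemKey) {
        int index = Math.floorMod(itemKey.hashCode(), nodes.size());
        return nodes.get(index);
    }
}
```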

The next solution – the main solution during the 1990s – was two-phase commit. Here, the idea was to make absolutely sure that processes 1 and 2 did not see different values in item a (or b) at any time. So, the “commit” first sent out instructions to lock data with multiple copies on multiple systems, then received indications from those systems that they were ready to update (Phase 1), then told them to update, and only when everyone had answered that the item had been updated (Phase 2) was the data made available for use again. This ensured consistency and the rest of the ACID properties; but when more than a few copies were involved, the performance overhead on queries was high. In theoretical terms, they had sacrificed “availability” of the multiple-copy item during the update for its consistency.
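
Here is a bare-bones sketch of the coordinator’s side of two-phase commit (the Participant interface is invented for illustration; real implementations add logging, timeouts, and recovery): nothing becomes available again until every copy has both voted “ready” and applied the update.

```java
import java.util.List;

// Bare-bones coordinator sketch of two-phase commit. The Participant interface is
// invented for illustration. The item stays locked (unavailable) until every copy
// has finished both phases.
interface Participant {
    boolean prepare(String txId);   // Phase 1: lock the copy and vote ready / not ready
    void commit(String txId);       // Phase 2: apply the update and release the lock
    void abort(String txId);        // roll back if anyone voted no
}

public class TwoPhaseCoordinator {
    public boolean run(String txId, List<Participant> copies) {
        // Phase 1: every system holding a copy must vote "ready"
        for (Participant p : copies) {
            if (!p.prepare(txId)) {
                for (Participant q : copies) q.abort(txId);
                return false;
            }
        }
        // Phase 2: only now are the copies told to apply the update
        for (Participant p : copies) {
            p.commit(txId);
        }
        return true;   // the item becomes available again only after all copies answer
    }
}
```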

At this point, in the real world, data warehouses carved out a part of data processing in which there was no need for two-phase commit, because one copy of the data (the one in the data warehouse) could always be out of date. Replication simply streamed updates from the operational frequent-update system to the decision-support no-online-updates-allowed system in nightly bursts when the data warehouse was taken offline.

In the early 2000s, according to the white paper, theory took a new tack – seeing if some consistency could be sacrificed for availability. To put it another way, researchers noted that in some cases, when a multiple-copy update arrives, (a) it is OK to make the item available if some but not all item copies on each system have been updated (“eventual read/write”), or (b) it is OK if you use a previous data-item version until all copy updates have been completed. In case (a) you save most of Phase 2 of a corresponding two-phase commit, and in case (b) you save all of Phase 2 and most of Phase 1 as well. EC is therefore the collection of techniques that allow availability before consistency is re-established.
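
A rough sketch of case (b), with invented class names: the caller’s write returns at once, the other copies are caught up in the background, and readers keep getting the previous version until that propagation completes. Real systems add vector clocks, conflict resolution, and failure handling.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicReference;

// Rough sketch of eventual consistency, case (b): a write returns immediately and
// the other copies are updated in the background; readers are served the last fully
// propagated version in the meantime. Class names are invented for illustration.
public class EventuallyConsistentItem {
    private final AtomicReference<String> committedValue = new AtomicReference<>("v0");
    private final List<Replica> otherCopies;
    private final ExecutorService propagator = Executors.newSingleThreadExecutor();

    public EventuallyConsistentItem(List<Replica> otherCopies) {
        this.otherCopies = otherCopies;
    }

    public String read() {
        return committedValue.get();            // may be the previous, slightly stale version
    }

    public void write(String newValue) {
        propagator.submit(() -> {               // availability first: the caller does not wait
            for (Replica r : otherCopies) {
                r.apply(newValue);              // catch the other copies up
            }
            committedValue.set(newValue);       // consistency re-established "eventually"
        });
    }

    interface Replica {
        void apply(String value);
    }
}
```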

Where EC Fits

So, in what kinds of situations does EC help? First of all, these are situations where users need multiple data copies on multiple systems for lots of items, in order to scale. If data is on a single system, or partitioned on multiple systems, you could probably use optimistic locking or versioning to release write locks on updates (and thereby make the item available again) just as quickly. Likewise, two-phase commit involves little performance overhead in distributed systems where few multiple-copy items and few updates on these items are involved – so EC isn’t needed there, either.
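
For comparison, here is a minimal sketch of the versioning alternative I have in mind for the single-system or partitioned case (an invented class, not any particular product’s API): a write succeeds only if the version it started from is still current, so no long-held lock is needed and the item stays available.

```java
import java.util.concurrent.atomic.AtomicReference;

// Minimal sketch of optimistic versioning on a single system: no lock is held while
// the caller works; a write succeeds only if the version it started from is still
// current, otherwise the caller re-reads and retries. Invented class, for illustration.
public class VersionedItem {
    public record Snapshot(long version, String value) {}

    private final AtomicReference<Snapshot> current =
            new AtomicReference<>(new Snapshot(0L, "initial"));

    public Snapshot read() {
        return current.get();
    }

    // Returns false if someone else updated the item since 'expected' was read
    public boolean tryWrite(Snapshot expected, String newValue) {
        Snapshot next = new Snapshot(expected.version() + 1, newValue);
        return current.compareAndSet(expected, next);
    }
}
```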

A second limitation on the use of EC seems to be the rate of updates to a particular multiple-copy data item. Too frequent updates, and a state of perpetually delayed consistency would seem to result – in effect, no consistency at all.

Thus, EC does not appear appropriate for pure distributed OLTP (online transaction processing). It also does not fit pure decision support/data warehousing, where updates occur in mammoth bursts. It may be appropriate for EII or “data virtualization”-type cross-database updates mixed with querying, although I believe that real-world implementations do not involve large numbers of multiple-copy items (and hence two-phase commit will do). MDM (master data management) does not appear to be well suited to EC, as implementations typically involve updates funneled through one or two central sites, then replication of the updated item value to all other copies.

Well, then, where does EC fit? The answer seems to be, in scale-out multiple-copy distributed data architectures involving infrequent, predictably-timed updates to each item. For example, a large PC-server farm providing E-commerce to consumers may emphasize prompt response to a rapidly changing workload of customer orders, each customer record update being typically delayable until the time the customer takes to respond to a prompt for the next step in the process. In these cases, data mining across multiple customers can wait until a customer has finished, or can use the previous version of a particular customer’s data. It is therefore no surprise that Amazon would find EC useful.

Conclusions

If we could really implement EC in all cases, it would be a major boost to database performance, as well as to pure scale-out architectures, which otherwise make less and less sense in an era when costs, energy/carbon wastage, and administrative complexity all weigh against them. Sadly, I have to conclude, at least for now, that most traditional database use cases simply do not fit the EC model.

However, that is no reason that much more cannot be done in applying EC to “mixed” Web-associated transaction streams. These are, after all, a significant and increasing proportion of all transactional workloads. In these, EC could finally simulate true parallel data processing, rather than the concurrency control that can slow really large-scale transaction handling by orders of magnitude. As an old analysis-of-algorithms guy, I know that time parallelism can translate to exponential performance improvements as the amount of data processed approaches infinity; and if coordination between item copies is minimal, time parallelism is approximated. So EC may very well not be limited in its usefulness to large-scale E-commerce use cases, but may apply to many other use cases within large Web-dependent server farms – and cloud computing is an obvious example. I conclude that EC may not be appropriate for a wide range of today’s transactional needs; but cloud computing implementers, and major database vendors looking to support cloud computing, should “kick the tires” and consider implementing EC capabilities.