Monday, December 14, 2009

Nagging Questions About Investment Theory

At the recent IBM STG (Systems and Technology Group) analyst briefing, I heard Helene Armitage assert that a primary cause of the failure of financial-services models to detect the problems with the mortgage-backed securities that were the proximate cause of our economic-system seize-up was “firms’ failure to examine the models’ assumptions adequately.” I think this is flat wrong.

I think that the root cause of analysis failure was the mind-set behind the models themselves. I believe that analysts and firms failed to challenge the models because of a mindset that said – and continues to say – “a security is a security is a security.” More specifically, I think that firms look at investments with the notion that all investments are fundamentally alike, and can be represented in terms of return and risk, strictly defined. But as, over the last thirty years, this notion has been steadily extended from stocks to bonds, commodities, private equity, derivatives, and mortgages, no one has exhaustively examined the ways in which, beneath the surface of alpha and beta, risk and return for these investment types differ from those of stocks.

What follows are a couple of things that have bothered me about this mindset as I look at the capital asset pricing model (CAPM) and the like. This is not an argument about whether CAPM is accurate – my impression is that, accurate or not, it shares these “bothersome” assumptions with the alternative theories on offer. It simply provides a well-defined way to examine those assumptions.

The first thing that bothers me is the notion of diversification. Let’s suppose I have two equally weighted stocks with exact -1 correlation (or an index and costless shorting of that index). If I understand the theory, this is the ultimate in diversification: no matter what, I earn zero. There is no unsystematic risk; in fact (conceptually), there is no systematic risk. I never earn money; the only way I lose money is if the United States collapses.
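
For the record, the standard two-asset portfolio formulas bear this out – a minimal sketch, assuming equal weights, equal volatilities, and a short position whose expected return is exactly the negative of the long’s:

$$E[R_p] = \tfrac{1}{2}\mu + \tfrac{1}{2}(-\mu) = 0, \qquad \sigma_p^2 = w_1^2\sigma_1^2 + w_2^2\sigma_2^2 + 2w_1 w_2 \rho\,\sigma_1\sigma_2 = \tfrac{1}{4}\sigma^2 + \tfrac{1}{4}\sigma^2 - \tfrac{1}{2}\sigma^2 = 0 \quad (\rho = -1).$$

Zero expected return, zero variance: perfectly “diversified,” and perfectly pointless.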

Now suppose I invest the same amount of money in the “risk-free investment” (U.S. bond, let’s say). This contains the systematic risk I’ve eliminated; also, I always (conceptually) earn at least as much money as in index-and-short, and (practically) always earn more money than index-and-short. In other words, despite the fact that I am more fully diversified, there is no justification for me ever to do index-and-short. I have not reduced the “real” risk of losing money; in fact, after inflation, I have increased it. I have not increased my return, compared to the risk-free asset; I have reduced it. So diversification, for certain types of investments, appears to be worse than useless.

Now apply this same notion to commodities – that is, not ownership of companies producing commodities, but ownership of the commodities themselves. Economics suggests that over the long run, due to technology applied to the commodity or its substitutes, prices fall. Either the commodity is seen as a store of value (gold as money) or not. If it is a store of value, over the long run it will track inflation (like the risk-free investment). If it is not, over the long run it will do worse than inflation. So, by this logic, commodities behave much like index-and-short: in the long run, they always do as badly as, or worse than, the risk-free investment.

So why, in a recent study of the best investments to preserve an annuity after retirement, is there a recommendation to invest in commodities for “diversification”? Well, it turns out that there was a particular point at which commodities performed better than the risk-free asset or anything else, conceptually allowing 5% yearly withdrawals rather than 4%. Translation: 1% more if I assume that I have to keep my “principal” at 100% or more of its initial value. But if I know that on average I will earn 1% more from the risk-free asset than from commodities, then it really doesn’t matter if my principal (when I substitute the risk-free asset for commodities) goes down in one of those years, and stays down; I can still withdraw the same dollar amount as if commodities were in the mix – in effect, I will get a 1% additional return on lower principal, and I can start withdrawing a greater percentage of the entire investment. I realize that this is very crude math, but you get the point.

So, to summarize, my first problem with “a security is a security” is that for certain types of investments, both risk and return are almost always negative compared to the risk-free investment, and therefore diversification does not seem to work.

My second problem is with the notion that risk – defined as variance – is effectively stable over time, for a particular asset type. Take the example of mortgages. It is very easy to think that mortgage failure rates are a creature of the business cycle alone, just like company or bond default rates. In that case, mortgage-backed securities will behave just like other securities, with a large body of historical data allowing a reasonable approximation of failure rates, and well-established procedures for handling them. Because actuaries have long experience with establishing the “risks” of individual mortgages, it seems a natural assumption that the actuarial “risk of failure” of an individual mortgage translates straightforwardly into the variance-type “security is a security” risk of a mortgage-backed security.

I would suggest that a mortgage differs from the usual investment in two key ways: it is more “sticky”, and there is a power imbalance between borrower and lender. It is more “sticky” because, even when you have adjustable rates, changing the terms of the contract is a matter of changing millions of individual loan agreements, so that the outcome is more typically complete failure rather than partial payment. There is a greater imbalance in favor of the lender than in the typical investment, because the bank or other lender has economies of scale that the individual borrower lacks. The result of both differences, I would argue, is that there are incentives in the system to increase the likelihood of failure over time, across multiple business cycles, and therefore past data tends to underestimate the rates and effects of failures, and the increased failure-type risk of the next business-cycle downturn or bout of excess investment.

I think of it this way: suppose, to finance a house, an individual sold a $250,000, 5% coupon, sinking-fund bond. The security would be the bond itself. If interest rates/inflation changed, the price of the bond would change. If credit risk increased, the price of the bond would decrease. The investor who bought the bond would assume this risk, and the bond issuer – the borrower – would have a market to handle increases in credit risk normal to the business cycle. Does anyone really think that this describes the way mortgages work? And if not, why would you think that devices appropriate for bonds, like stripping, matching, and hedging, would be appropriate for mortgages? But because people did, risks were underestimated in formulating derivatives, demand for securities drove demand for mortgages, and unnecessary failures with little yield to the investor drove banks and hedge funds to their knees, with horrendous spill-over effects.

To summarize: sorry, Helene, I think that the real cause of financial-firm model failure is not a failure to examine assumptions, but a failure in the mindset that produced the models: the assumption that “a security is a security.” I question the mindset that thinks diversification handles risk for every investment type, and I question the mindset that thinks variance-type risk is stable across business cycles for every investment type. I am not the right person to suggest a better mindset. But I sure hope someone does, instead of accepting the half-solutions that seem to me to be implicit not just in Helene’s critique, but also in those of most commentators.

Parenthetical note: it seems to me that we are seeing two types of “extreme” risks in today’s world, and only two. There is the risk of country collapse, in which case it doesn’t matter if you invested your money in your mattress or in gold or in a risk-free investment or whatever; it’s all worthless anyway. Then there’s the bank-seize-up/Great Depression type of risk, which we have seen at least twice in the last 100 years. While the jury is still out, it appears that this doesn’t distort the relative risk/return of stocks, bonds, and the risk-free asset over periods of 15-20 years. So why, as investors and investment theorists, are we living in fear that it will?

Tuesday, November 3, 2009

Dave Hill, Data Protection, and the Future of IT

Full disclosure: Dave is a friend, and a long-time colleague. He has just written an excellent book on Data Protection; hence the following musings.

As I was reading (a rapid first scan), I tried to pin down why I liked the book so much. It certainly wasn’t the editing, since I helped with that. The topic is reasonably well covered, albeit piecemeal, by vendors, industry associations, and bloggers. And while I have always enjoyed Dave’s stories and jokes, the topic does not lend itself to elaborate stylistic flourishes.

After thinking about it some more, I came to the conclusion that it’s Dave’s methodology that I value. Imho, in each chapter Dave lays out a comprehensive and innovative classification of the topic at hand – data governance, information lifecycle management, data security – and then uses that classification to bring new insight into well-covered ground. The reason I like this approach is that it allows you to use the classification as a springboard: to come to your own conclusions, to extend the classification, and to apply it in other areas. In short, I found myself continually translating classifications from the narrow world of storage to the broad world of “information”, and being enlightened thereby.

One area in particular that called forth this type of analysis was the topic of cloud computing and storage. If data protection, more or less, involves considerations of compliance, operational/disaster recovery, and security, how do these translate to a cloud external to the enterprise? And what is the role of IT in data protection when both physical and logical information are now outside of IT’s direct control?

But this is merely a small part of the overall question of the future of IT, if external clouds take over large chunks of enterprise software/hardware. If the cloud can do it all cheaper, because of economies of scale, what justification is there for IT to exist any longer? Or will IT become “meta-IT”, applying enterprise-specific risk management, data protection, compliance, and security to their own logical part of a remote, multi-tenant physical infrastructure?

I would suggest another way of slicing things. It is reasonable to divide a business, and hence the underlying IT, into cost centers, which benefit from commodity solutions provided externally, and competitive-advantage or profit centers, for which being like everyone else is actually counter-productive. In an ideal world, where the cloud can always underprice commodity hardware and software, IT’s value-add lies where things are not yet commodities. In other words, in the long run, IT should be the “cache”, the leading edge, the driver of the computing side of competitive advantage.

What does that mean, practically? It means that the weight of IT should shift much more towards software and product development and initial use. IT’s product-related and innovative-process-related software and the systems to test and deploy them are IT’s purview; the rest should be in the cloud. But this does not make IT less important; on the contrary, it makes IT more important, because not only does IT focus on competitive advantage when things are going well, it also focuses on agile solutions that pay off in cost savings by more rapid adaptation when things are going poorly. JIT inventory management is a competitive advantage when orders are rising; but also a cost saver when orders are falling.

I realize that this future is not likely to arrive any time soon. The problem is that in today’s IT, maintenance costs crowd out new-software spending, so that the CEO is convinced that IT is not competent to handle software development. But let’s face it, no one else is, either. Anyone following NPD (new product development) over the last few years realizes that software is an increasing component in an increasing number of industries. Outsourcing competitive-advantage software development is therefore increasingly like outsourcing R&D – it simply doesn’t work unless the key overall direction is in-house. Whether or not IT does infrastructure governance in the long run, it is necessarily the best candidate to do NPD software-development governance.

So I do believe that IT has a future; but quite a different one from its present. As you can see, I have wandered far afield from Data Protection, thanks to Dave Hill’s thought-provoking book. The savvy reader of this tome will, I have no doubt, be able to come up with other, equally fascinating thoughts.

Monday, October 19, 2009

Speed vs. Agility

A recent product announcement by IBM and a series of excellent (or at least interesting) articles in Sloan Management Review have set me to musing on one unexamined assumption in most assessments: that increased process speed equals increased business agility. My initial take: this is true in most cases, but not in all, and can be misleading as a cookie-cutter strategy.

The IBM announcement centered around integration of their business-process management (BPM) capabilities, in order to achieve agility by speeding up business processes. What was notably missing was integration with IBM’s capabilities for New Product Development (NPD) – Rational and the like. However, my initial definition and application of KAIs (key agility indicators) at a business level suggests that speeding up NPD, including development of new business processes, has far more of an impact on long-term business agility than speeding up existing processes. To put it another way, increasing the Titanic’s ability to turn sharply is far more likely to avert disaster than increasing its top speed charging straight ahead – in fact, increasing its speed makes it more likely to crash into an iceberg.

A similar assumption seems to have been made in SMR’s latest issue, in the article entitled “Which Innovation Efforts Will Pay?” The message of this article appears to be that improving innovation efforts is primarily a matter of focusing more on the “healthy innovation” middle region of internally developed, modest “base hits”, with little or no effect from speeding up internal innovation processes or expanding them to include outside innovation. By contrast, the article “Does IP Strategy Have to Cripple Open Innovation?” suggests that collaborative strategies across organizational lines focused on NPD make users far more agile and businesses far better off, despite requiring as much (or more) time to implement as in-house efforts. And finally, we might cite a study in SMR suggesting that users estimating inventory-refill needs were more likely to make sub-optimal decisions when fed data daily than when fed a weekly summary, as well as the recent book on system dynamics by Donella Meadows, which argued that increasing the speed of a process is often achieved by increasing its rigidity (constraining the process in order to optimize the typical case right now), making future disasters, as the system inevitably grows, less avoidable and more life-threatening.

All of this suggests that (a) people are assuming that increased process speed automatically translates to increased business agility, and (b) on the contrary, in many cases it translates to insignificant improvements or significant decreases in agility. But how do we tell when speed equals agility, and when not? What rules of thumb are there to tell us when increased speed does not positively impact business agility, and, in those cases, what should we do?

I don’t pretend to have the final answers for these questions. But I do have some initial thoughts on typical situations that lead to increased speed but decreased agility, and on how to assess and improve business investment strategies in those cases.

If it isn’t sustainable it isn’t agile. Let us suppose that we improve a business process by applying technology that speeds the process by decreasing the need for human resources. Further, suppose that the technology involves increased carbon or energy use – a 1/1 replacement of people-hours by computing power, say. Over the long run, this increased energy use will need to be dealt with, adding costs for future business-process redesign and decreasing the money available for future innovation. The obvious rejoinder is that cost savings will fund those future costs; except that today, most organizations are still digging themselves deeper into an energy hole, while operational IT costs, driven by an increased need for storage, continue to exert upward pressure and crowd out new-app development.

As the most recent SMR issue notes, the way to handle this problem is to build sustainability into every business process. If lack of sustainability decreases agility, then the converse is also true: building sustainability into the company, including both ongoing processes and NPD, increases revenues, decreases costs – and increases agility.

If it’s less open it’s less agile. In some ways, this is a tautology: if an organization changes a business process so as to preclude some key inputs in the name of speed, it will be less successful in identifying problems that call for adaptation. However, it does get at one of the subtler causes of organizational rigidity: the need to do something, anything, quickly in order to survive. A new online banking feature may make check processing much more rapid, but if customers are not listened to adequately, it may be rejected in the marketplace, or cost the business market share.

Detecting and correcting this type of problem is hard, because organizational politics ensures that the lesson is drawn only after the process has already been implemented and gained a toehold – and because businesses may draw the wrong conclusion (e.g., that it’s about better design, not about ensuring open collaboration). The best fix is probably a strong, consistent, from-the-top emphasis on collaboration, agile processes, and not shooting the messenger.

Those are the main ways I have seen in which increased speed can actually make things worse. I want to add one more suggestion, which affects not so much situations where speed has negative consequences, but rather cases in which speed and agility can be improved more cost-effectively: upgrading the development process is better. That is, even if you are redesigning an existing business process like disaster recovery for better speed, you get a better long-term bang for your buck by also improving the agility of the process by which you create and implement the speedier solution. Not only does this have an immediate impact in making the solution itself more agile; it also bleeds over into the next project to improve a business process, or a product or service. And the best way I’ve found so far to improve development-process speed, quality, and effectiveness is an agile process.

Sunday, September 6, 2009

Open Source, Windows, Portability, and Borges

There is a wonderful short story by Jorge Luis Borges ("Pierre Menard, Author of the Quixote") that, I believe, captures the open source effort to come to terms with Windows – which in some quarters is viewed as the antithesis of the philosophy of open source. In this short story, a critic analyzes Don Quixote as written by someone centuries later – someone who has attempted to live his life so as to be able to write the exact same words as in the original Don Quixote. The critic’s point is that even though the author is using the same words, today they mean something completely different.

In much the same way, open source has attempted to mimic Windows on “Unix-like” environments (various flavors of Unix and Linux) without triggering Microsoft’s protection of its prize operating system. To do this, they have set up efforts such as Wine and ReactOS (to provide the APIs of Windows from Win2K onwards) and Mono (to provide the .NET APIs). These efforts attempt to support the same APIs as Microsoft’s, but with no knowledge of how Microsoft created them. This is not really reverse engineering, as the aim of reverse engineering is usually to figure out how functionality was achieved. These efforts don’t care how the functionality was achieved – they just want to provide the same collection of words (the APIs and functionality).

But while the APIs are the same, the meaning of the effort has changed in the twenty-odd years since people began asking how to make moving programs from Wintel to another platform (and vice versa) as easy as possible. Then, every platform had difficulties with porting, migration, and source or binary compatibility. Now, Wintel and the mainframe, among the primary installed bases, are the platforms that are most difficult to move to or from. Moreover, the Web, or any network, as a distinct platform did not exist; today, the Web is increasingly a place in which every app and most middleware must find a way to run. So imitating Windows is no longer so much about moving Windows applications to cheaper or better platforms; it is about reducing the main remaining barrier to being able to move any app or software from any platform to any other, and into “clouds” that may hide the underlying hardware, but will still suffer when apps are platform-specific.

Now, “moving” apps and “easy” are very vague terms. My own hierarchy of ease of movement from place to place begins with real-time portability. That is, a “virtual machine” on any platform can run the app, without significant effects on app performance, robustness, and usability (i.e., the user interface allows you to do the same things). Real-time portability means the best performance for the app via load balancing and dynamic repartitioning. Java apps are pretty much there today. However, apps in other programming languages are not so lucky, nor are legacy apps.
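
To make that top tier concrete, here is a trivial sketch – nothing vendor-specific is assumed – of why Java qualifies: the same compiled bytecode runs unchanged on any conforming JVM, on Windows, Linux, or the mainframe, so the app can be restarted or rebalanced wherever capacity exists.

```java
// Portable.java -- compile once (javac Portable.java), then run the same
// .class file on any platform's JVM: Windows, Linux on x86, or Linux on System z.
public class Portable {
    public static void main(String[] args) {
        // The program discovers its host at run time; the bytecode it runs
        // never changes from platform to platform.
        System.out.printf("Running on %s (%s), JVM %s%n",
                System.getProperty("os.name"),
                System.getProperty("os.arch"),
                System.getProperty("java.version"));
    }
}
```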

The next step down from real-time portability is binary compatibility. The app may not work very well when moved in real time from one platform to another, but it will work, without needing changes or recompilation. That’s why forward and backward compatibility matter: they allow the same app to work on earlier or later versions of a platform. As time goes on, binary compatibility gets closer and closer to real-time portability, as platforms adapt to be able to handle similar workloads. Windows Server may not scale as well as the mainframe, but they both can handle the large majority of Unix-like workloads. It is surprising how few platforms have full binary compatibility with all the other platforms; it isn’t just Windows to the mainframe but also compatibility between different versions of Unix and Linux. So we are a ways away from binary compatibility, as well.

The next step down is source-code compatibility. This means that in order to run on another platform, you can use the same source code, but it must be recompiled. In other words, source-code but not binary compatibility seems to rule out real-time movement of apps between platforms. However, it does allow applications to generate a version for each platform, and then interoperate/load balance between those versions; so we can crudely approximate real-time portability in the real world. Now we are talking about a large proportion of apps on Unix-like environments (although not all), but Windows and mainframe apps are typically not source-code compatible with the other two environments. Still, this explains why users can move Linux apps onto the mainframe with relative ease.

There’s yet another step down: partial compatibility. This seems to come in two flavors: higher-level compatibility (that is, source-code compatibility if the app is written to a higher-level middleware interface such as .NET) and “80-20” compatibility (that is, 80% of apps are source-code incompatible in only a few, easily modified places; the other 20% are the nasty problems). Together, these two cases comprise a large proportion of all apps; and it may be comforting to think that legacy apps will sunset themselves so that eventually higher-level compatibility will become de facto source-code compatibility. However, the remaining cases include many important Windows apps and most mission- and business-critical mainframe apps. To most large enterprises, partial compatibility is not an answer. And so we come to the final step down: pure incompatibility, only cured by a massive portation/rewrite effort that has become much easier but is still not feasible for most such legacy apps.

Why does all this matter? Because we are closer to Nirvana than we realize. If we can imitate enough of Windows on Linux, we can move most Windows apps to scale-up servers when needed (Unix/Linux or mainframe). So we will have achieved source-code compatibility from Windows to Linux, Java real-time portability from Linux to Windows, source-code compatibility for most Windows apps from Windows to Linux on the mainframe, and Linux source-code compatibility and Java real-time portability from Linux to the mainframe and back. It would be nice to have portability from z/OS apps to Linux and Windows platforms; but neither large enterprises nor cloud vendors really need this – the mainframe has that strong a TCO/ROI and energy-savings story for large-scale and numerous (say, more than 20 apps) situations.

So, in an irony that Borges might appreciate, open-source efforts may indeed allow lower costs and greater openness for Windows apps; but not because open source free software will crowd out Windows. Rather, a decent approximation of cross-platform portability with lower per-app costs will be achieved because these efforts allow users to leverage Windows apps on other platforms, where the old proprietary vendors could never figure out how to do it. The meaning of the effort may be different than it would have been 15 years ago; but the result will be far more valuable. Or, as Borges’ critic might say, the new meaning speaks far more to people today than the old. Sometimes, Don Quixote tilting at windmills is a useful thing.

Thursday, August 27, 2009

On Julia Child

While due attention is being paid to Ted Kennedy this week, a fair amount of discussion of the movie Julie and Julia is also taking place, primarily centered around the profound impact that Julia Child has had on many people. While it is great to see Julia getting her fair share of praise, I do disagree with many commentators about Julia’s significance. In particular, I think that she was part of a broader happy trend toward higher-quality cooking in the US, and that her superb recipes or TV shows are less important to that trend than the fact that, at long last, chefs seem to have come on their own to the conclusion that the French philosophy of cooking really does work.

Here’s my story: in 1966, when I was 16, I spent a summer in France – in Paris, in Brittany, on the Loire, and in Normandy. On a previous trip, when I was 9, my main object was to get a hamburger (steak tartare, anyone?). On this trip, my parents let me go out to restaurants frequently, and typically one-, two-, and three-star ones (the Michelin Guide’s ratings of quality, for those who don’t know).

The trip was an eye-opener. The food was always clearly different, and consistently to be savored. I learned that mushrooms were not a gritty, flavorless component of Campbell’s mushroom soup; that sauces were not a variant of ketchup to drench poor-tasting food in; that, when prepared well, fish tasted better than steak; that bread was not the most important part of the meal; and that less liquid rather than more (wine instead of Coke) enhanced taste.

Above all, I learned that during a really good meal, unconsciously, instead of eating fast to satisfy hunger, I ate slowly to enjoy food’s taste. I should note that when we took my son to a three-star French restaurant at the same age, much the same thing happened, and he stopped saying “who cares about food” and started saying “I like this place, I don’t like this one.”

The reasons why French cooking was so far superior to American, in those days, I believe were these:

1. The French insisted the raw materials for food had to be absolutely fresh. Seafood in Paris was typically a half-day from the sea or less.

2. There were specific standards for ingredients. There was a certain kind of starter for French bread, certain varieties required for particular vegetables, and margarine, say, was not substituted for butter just because it was cheaper.

3. The emphasis was on just enough cooking, rather than overcooking. This was especially true for fish.

4. Meals were planned so that each piece tasted good on its own and also in concert with something else. This made for a certain simplicity: instead of stews and sandwiches, you had delicately flavored fish paired with excellent sauces, plus “al dente” vegetables and sugary but light desserts.

Now, remember, I had been to what were supposed to be excellent French (and other) restaurants in New York City. These French restaurants, even out in the boonies, were in every case far better.

Over the next few years, as my mother watched Julia on TV and tried some of her recipes, and I used her book to cook for special occasions, I consistently found that neither home cooking nor French (or other) restaurants in the US measured up to the French ones. There were a few dishes that were exceptions: I remember a green-bean-with-garlic dish in NYC in 1980 that was worthy of a one-star restaurant in Paris. But until about the last 10 years, despite the enormous shift in American taste 1960-1990 from “HoJo American” to “foreign experimental”, American restaurants of all stripes simply never gave me that feeling of wanting to eat slowly.

I may be reading too much into the situation, but I think the turning point came when chefs finally began to adopt the French philosophies cited above. In other words, they started trying to standardize and improve the quality of ingredients; they gave due attention to making each piece of the meal flavorful, and to sauces; and they started emphasizing freshness.

Why couldn’t our attempts to do Julia match the French? I believe, because the ingredients simply weren’t good enough. Comparable cuts of beef weren’t available at stores; vegetables such as tomatoes were processed and imported from abroad, with emphasis on cheapness; it was hard to time the food to achieve “just enough” cooking; butter was salted.

So, on the one hand, I am very grateful to Julia Child for providing recipes and meal plans that were far, far better than what came before (with the possible exception of butterscotch pudding). But the sad fact is that people in the US still didn’t understand that food could be much better than that. That is, they valued foreign, but they didn’t value good foreign (and not necessarily French; I have tasted the same philosophy applied to Caribbean and Chinese, with the same superb results). Only in the last 10-15 years, as I say, have I seen a significant number of restaurants that consistently make me want to slow down and savor the taste. Only in that period have Americans been able to appreciate how nice life is when you can occasionally have an experience like that.

And there’s so much we still don’t know about. A really good gateau Bretonne. Mayonnaise sauce that tastes like hollandaise sauce, and hollandaise sauce with artichokes. Evian fruite, a far different kind of soft drink. Real French bread. Mushroom or artichoke puree. Sole meuniere as it should be. Kir that tastes as good as the finest wine. A superb brioche with French butter and jam. Julia covers mostly Paris. The regions add much more.

So I remember Julia Child with great fondness, and salute her integrity that insisted on fidelity to French quality rather than American short-cuts. But I think that the primary credit for improving our quality of food and life belongs to later chefs, who finally brought the French philosophy to the restaurant as well as the home.

Wednesday, August 19, 2009

Is SQL Toast? Or Is Java Being Stupid Again?

A recent Techtarget posting by the SearchSOA editor picks up on the musings of Miko Matsumura of Software AG, suggesting that because most new apps in the cloud can use data in main memory, there’s no need for the enterprise-database SQL API; rather, developers should access their data via Java. OK, that’s a short summary of a more nuanced argument. But the conclusion is pretty blunt: “SQL is toast.”

I have no great love for relational databases – as I’ve argued for many years, “relational” technology is actually marketing hype about data management that mostly is not relational at all. That is, the data isn’t stored as relational theory would suggest. The one truly relational thing about relational technology is SQL: the ability to perform operations on data in an elegant, high-level, somewhat English-like mini-language.

What’s this Java alternative that Miko’s talking about? Well, Java is an object-oriented programming (OOP) language. By “object”, OOP means a collection of code and the data on which it operates. Thus, an object-oriented database is effectively chunks of data, each stored with the code to access it.

So this is not really about Larry Ellison/Oracle deciding the future, or the “network or developer [rather] than the underlying technology”, as Miko puts it. It’s a fundamental question: which is better, treating data as a database to be accessed by objects, or as data within objects?

Over the last fifteen years, we have seen the pluses and minuses of “data in the object”. One plus is that there is no object-relational mismatch, in which you have to fire off a SQL statement to some remote, un-Java-like database like Oracle or DB2 whenever you need to get something done. The object-relational mismatch has been estimated to add 50% to development times, mostly because developers who know Java rarely know SQL.
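
To see what the mismatch looks like in practice, here is a minimal JDBC sketch – the connection URL, credentials, table, and column names are invented for illustration, not taken from any real system. The developer must step out of the world of Java objects, compose a SQL string, and then map the rows back into Java values by hand.

```java
import java.math.BigDecimal;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class OrderLookup {
    public static void main(String[] args) throws Exception {
        // URL, credentials, and schema are placeholders for the example.
        try (Connection c = DriverManager.getConnection(
                     "jdbc:db2://dbhost:50000/SALES", "user", "password");
             PreparedStatement ps = c.prepareStatement(
                     "SELECT order_id, total FROM orders WHERE customer_id = ?")) {
            ps.setLong(1, 42L);                      // leave Java, speak SQL...
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {                  // ...then map rows back by hand
                    long orderId = rs.getLong("order_id");
                    BigDecimal total = rs.getBigDecimal("total");
                    System.out.println(orderId + ": " + total);
                }
            }
        }
    }
}
```

None of this is hard, but it is a context switch – and it is exactly the step the “data in the object” camp wants to eliminate.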

Then there are the minuses, the reasons why people find themselves retrofitting SQL invocations to existing Java code. First of all, object-oriented programs in most cases don’t perform well in data-related transactions. Data stored separately in each object instance uses a lot of extra space, and the operations on it are not optimized. Second, in many cases, operations and the data are not standardized across object classes or applications, wasting lots of developer time. Third, OOP languages such as Java are low-level, and specifically low-level with regard to data manipulation. As a result, programming transactions on vanilla Java takes much longer than programming on one of the older 4GLs (like, say, the language that Blue Phoenix uses for some of its code migration).

So what effect would storing all your data in main memory have on Java data-access operations? Well, the performance hit would still be there – but would be less obvious, because of the overall improvement in access speed. In other words, it might take twice as long as SQL access, but since we might typically be talking about 1000 bytes to operate on, we still see 2 microseconds instead of 1, which is a small part of response time over a network. Of course, for massive queries involving terabytes, the performance hit will still be quite noticeable.

What will not go away immediately is the ongoing waste of development time. It’s not an obvious waste of time, because the developer either doesn’t know about 4GL alternatives or is comparing Java-data programming to all the time it takes to figure out relational operations and SQL. But it’s one of the main reasons that adopting Java actually caused a decrease in programmer productivity compared to structured programming, according to some user feedback I collected 15 years ago.

More fundamentally, I have to ask if the future of programming is going to be purely object-oriented or data-oriented. The rapid increase in networking speed of the Internet doesn’t make data processing speed ignorable; on the contrary, it makes it all the more important as a bottleneck. And putting all the data in main memory doesn’t solve the problem; it just makes the problem kick in at larger amounts of data – i.e., for more important applications. And then there’s all this sensor data beginning to flow across the Web …

So maybe SQL is toast. If what replaces it is something that Java can invoke that is high-level, optimizes transactions and data storage, and allows easy access to existing databases – in other words, something data-oriented, something like SQL – then I’m happy. If it’s something like storing data as objects and providing minimal, low-level APIs to manipulate that data – then we will be back to the same stupid over-application of Java that croaked development time and scalability 15 years ago.

Saturday, August 15, 2009

Eventual Consistency and Scale-Out Data Management

A recent blog post by Gordon Haff of Illuminata about new challenges to enterprise relational databases cited a white paper on how Amazon, in particular, is experimenting with loosening the typical relational requirements for ACID (atomicity, consistency, isolation, and durability). In the white paper, the author (Werner Vogels, CTO, Amazon) explains how recent database theory has begun to investigate delayed or “eventual” consistency (EC), and to apply its findings to the real world. Skimming the white paper, I realized that these findings do not seem to be applicable to all real-world situations – it appears that they are only useful in a particular type of scale-out architecture.

The problem that so-called eventual consistency techniques aim to solve is best explained by reviewing history. As distributed computing arrived in the 1980s, database theory attempted to figure out what to do when the data in a data store was itself distributed. Partitioning was an obvious solution: put item a on system A, and item b on system B, and then put processes 1 and 2 on system A (or, as in the case of Microsoft SQL Server in the 1990s, processes 1 and 2 on both A and B, so transaction streams can be multiplexed). As a result, the database can step in when either process references item b, and handle the reads and writes as if item b is really on system A. However, this is not a great general-case solution (although Sybase, for one, offers database partitioning for particular cases): optimum partitions for performance tend to change over time, and in many cases putting the same datum on 2 systems yields better parallelism and hence better performance.

The next solution – the main solution during the 1990s – was two-phase commit. Here, the idea was to make absolutely sure that processes 1 and 2 did not see different values in item a (or b) at any time. So, the “commit” first sent out instructions to lock data with multiple copies on multiple systems, then received indications from those systems that they were ready to update (Phase 1), then told them to update, and only when everyone had answered that the item had been updated (Phase 2) was the data made available for use again. This ensured consistency and the rest of the ACID properties; but when more than a few copies were involved, the performance overhead on queries was high. In theoretical terms, they had sacrificed “availability” of the multiple-copy item during the update for its consistency.
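
For readers who have not seen it spelled out, here is a minimal sketch of the coordinator’s side of two-phase commit; the Replica interface and its methods are invented for the example, not any product’s API. The point to notice is that the item stays locked on every copy until the slowest participant has answered in both phases.

```java
import java.util.List;

public class TwoPhaseCommit {

    /** Hypothetical participant holding one copy of the item (illustration only). */
    interface Replica {
        boolean prepare(String key, String newValue); // lock the copy, vote yes/no
        void commit(String key);                      // apply the update and unlock
        void rollback(String key);                    // discard the update and unlock
    }

    /** Returns true only if every copy voted yes and was then committed. */
    static boolean update(List<Replica> copies, String key, String newValue) {
        // Phase 1: ask every copy to lock the item and vote.
        for (Replica r : copies) {
            if (!r.prepare(key, newValue)) {
                // Any "no" vote aborts the update; for simplicity this sketch
                // sends rollback to every copy.
                copies.forEach(c -> c.rollback(key));
                return false;
            }
        }
        // Phase 2: only now do the copies apply the update and unlock.
        // Until the last commit() returns, readers of the item are blocked:
        // availability has been traded away for consistency.
        copies.forEach(c -> c.commit(key));
        return true;
    }
}
```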

At this point, in the real world, data warehouses carved out a part of data processing in which there was no need for two-phase commit, because one copy of the data (the one in the data warehouse) could always be out of date. Replication simply streamed updates from the operational frequent-update system to the decision-support no-online-updates-allowed system in nightly bursts when the data warehouse was taken offline.

In the early 2000s, according to the white paper, theory took a new tack – seeing if some consistency could be sacrificed for availability. To put it another way, researchers noted that in some cases, when a multiple-copy update arrives, (a) it is OK to make the item available if some but not all item copies on each system have been updated (“eventual read/write”), or (b) it is OK if you use a previous data-item version until all copy updates have been completed. In case (a) you save most of Phase 2 of a corresponding two-phase commit, and in case (b) you save all of Phase 2 and most of Phase 1 as well. EC is therefore the collection of techniques that allow availability before consistency is re-established.
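
Here is a toy model of case (b), again with invented names rather than anything Amazon actually ships: the writer returns as soon as the new version starts propagating, and readers keep seeing the last fully propagated version until every copy has acknowledged the new one.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Toy model of an eventually consistent item with N copies (illustration only). */
public class EventualItem {
    private final int copies;
    private volatile String readableValue;   // last fully propagated version
    private volatile String pendingValue;    // version still propagating
    private final AtomicInteger acks = new AtomicInteger();

    public EventualItem(int copies, String initialValue) {
        this.copies = copies;
        this.readableValue = initialValue;
    }

    /** A write returns immediately; replication to the copies happens afterward. */
    public void write(String newValue) {
        pendingValue = newValue;
        acks.set(0);
        // In a real system, replication messages to each copy would be sent here.
    }

    /** Called as each copy acknowledges the pending version. */
    public void copyAcknowledged() {
        if (acks.incrementAndGet() == copies) {
            readableValue = pendingValue;     // consistency is re-established
        }
    }

    /** Reads never block: they see the previous version until every copy agrees. */
    public String read() {
        return readableValue;
    }
}
```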

Where EC Fits

So, in what kinds of situations does EC help? First of all, these are situations where users need multiple data copies on multiple systems for lots of items, in order to scale. If data is on a single system, or partitioned on multiple systems, you could probably use optimistic locking or versioning to release write locks on updates (and thereby make the item available again) just as quickly. Likewise, two-phase commit involves little performance overhead in distributed systems where few multiple-copy items and few updates on these items are involved – so EC isn’t needed there, either.

A second limitation on the use of EC seems to be the rate of updates to a particular multiple-copy data item. Too frequent updates, and a state of perpetually delayed consistency would seem to result – in effect, no consistency at all.

Thus, EC does not appear appropriate for pure distributed OLTP (online transaction processing). It also does not fit pure decision support/data warehousing, where updates occur in mammoth bursts. It may be appropriate for EII or “data virtualization”-type cross-database updates mixed with querying, although I believe that real-world implementations do not involve large numbers of multiple-copy items (and hence two-phase commit will do). MDM (master data management) does not appear to be well suited to EC, as implementations typically involve updates funneled through one or two central sites, then replication of the updated item value to all other copies.

Well, then, where does EC fit? The answer seems to be, in scale-out multiple-copy distributed data architectures involving infrequent, predictably-timed updates to each item. For example, a large PC-server farm providing E-commerce to consumers may emphasize prompt response to a rapidly changing workload of customer orders, each customer record update being typically delayable until the time the customer takes to respond to a prompt for the next step in the process. In these cases, data mining across multiple customers can wait until a customer has finished, or can use the previous version of a particular customer’s data. It is therefore no surprise that Amazon would find EC useful.

Conclusions

If we could really implement EC in all cases, it would be a major boost to database performance, as well as to pure scale-out architectures – which otherwise make less and less sense in an era when costs, energy/carbon wastage, and administrative complexity all weigh against them. Sadly, I have to conclude, at least for now, that most traditional database use cases simply do not fit the EC model.

However, that is no reason that much more cannot be done in applying EC to “mixed” Web-associated transaction streams. These are, after all, a significant and increasing proportion of all transactional workloads. In these, EC could finally simulate true parallel data processing, rather than the concurrency which can slow really large-scale transaction-handling by orders of magnitude. As an old analysis-of-algorithms guy, I know that time parallelism can translate to exponential performance improvements as the amount of data processed approaches infinity; and if coordination between item copies is minimal, time parallelism is approximated. So EC may very well not be limited in its usefulness to large-scale E-commerce use cases, but may apply to many other use cases within large Web-dependent server farms – and cloud computing is an obvious example. I conclude that EC may not be appropriate for a wide range of today’s transactional needs; but cloud computing implementers, and major database vendors looking to support cloud computing, should “kick the tires” and consider implementing EC capabilities.

Monday, July 13, 2009

Cloud Computing and Data Locality: Not So Fast

In one of my favorite sleazy fantasy novels (The Belgariad, David Eddings) one of the characters is attempting to explain to another why reviving the dead is not a good idea. “You have the ability to simplify, to capture a simple visual picture of something complex [and change it],” the character says. “But don’t over-simplify. Dead is dead.”

In a recent white paper on cloud computing, in an oh-by-the-way manner, Sun mentions the idea of data locality. If I understand it correctly, virtual “environments” in a cloud may have to physically move not only from server to server, but from site to site and/or from private data center to public cloud server farm and back. More exactly, the applications don’t have to move (just their “state”), and the virtual machine software and hardware doesn’t have to move (it can be replicated or emulated on the target machine); but the data may have to be moved or copied in toto, or else continue to access the same physical data store remotely – which would violate the idea of cloud boundaries, among other problems (like security and performance). To avoid this, it is apparently primarily up to the developer to keep data locality in mind, which seems to mean avoiding moving the data where possible by keeping it on the same physical server-farm site.

Data locality will certainly be a quick fix for immediate problems of how to create the illusion of a “virtual data center.” But is it a long-term fix? I think not. The reason, I assert, is that cloud computing is an over-simplification – physically distributed data is not virtually unified data – and our efforts to patch it to approximate the “ideal cloud” will result in unnecessary complexity, cost, and legacy systems.

Consider the most obvious trend in the computing world in the last few years: the inexorable growth in storage of 40-60% per year, continuing despite the recession. The increase in storage reflects, at least partly, an increase in data-store size per application, or, if you wish, per “data center”. It is an increase that appears faster than Moore’s Law, and faster than the rate of increase in communications bandwidth. If moving a business-critical application’s worth of data right now from secondary to primary site for disaster-recovery purposes takes up to an hour, it is likely that moving it two years from now will take 1 ½-2 hours, and so on. Unless this trend is reversed, the idea of a data center that can move or re-partition in minutes between public and private cloud (or even between Boston and San Francisco in a private cloud) is simply unrealistic.
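
To put rough numbers on that claim – the bandwidth growth rate here is my assumption, not a measured figure – suppose the data store grows 50-60% a year while usable transfer bandwidth grows 15% a year. The move time then grows by the ratio of the two:

$$t_n = t_0\left(\frac{1.5}{1.15}\right)^{\!n} \;\Rightarrow\; t_2 \approx 1.7\,t_0 \quad(\text{or}\ \approx 1.9\,t_0\ \text{at 60\% data growth}),$$

which is where the 1 ½-2 hours comes from.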

Of course, since the unrealistic doesn’t happen, what will probably happen is that developers will create kludges, one for each application that is “cloud-ized”, to ensure that data is “pre-copied” and periodically “re-synchronized”, or that barriers are put in the way of data movement from site to site within the theoretically virtual public cloud. That’s the real danger – lots of “reinventing the wheel”, with attendant long-term unnecessary costs of administering (and developing new code on top of) non-standardized data movements and the code propelling them, database-architecture complexity, and unexpected barriers to data movement inside the public cloud.

What ought to provide a longer-term solution, I would think, is (a) a way of slicing the data so that only the stuff needed to “keep it running” is moved – which sounds like Information Lifecycle Management (ILM), since one way of doing this is to move the most recent data, the data most likely to be accessed and updated – and (b) a standardized abstraction-layer interface to the data that enforces this. In this way, we will at least have staved off data-locality problems for a few more years, and we don’t embed kludge-type solutions in the cloud infrastructure forever.
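
Purely as a hypothetical sketch of what such an abstraction layer might look like – none of these names correspond to any existing API – the idea is that the infrastructure, not each application, decides which slice of the data travels with the moving environment:

```java
import java.time.Instant;

/**
 * Hypothetical interface between a movable virtual environment and its data.
 * The infrastructure implements it once; applications never hand-code copies.
 */
public interface DataLocalityManager {
    /** Identify the "hot" slice -- e.g., data touched since the given time. */
    DataSlice hotSlice(String datasetId, Instant touchedSince);

    /** Move or pre-copy only that slice to the target site before cut-over. */
    void relocate(DataSlice slice, String targetSiteId);

    /** Serve the rest remotely until (and unless) it is ever needed locally. */
    RemoteHandle remoteHandle(String datasetId, String sourceSiteId);

    /** Opaque handles; details left to the implementation. */
    interface DataSlice {}
    interface RemoteHandle {}
}
```

Something of this shape would let ILM-style policies (“move what was touched recently, serve the rest remotely until needed”) be implemented once, instead of being re-invented kludge by kludge, application by application.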

However, I fear that such a solution will not arrive before we have created another complicated administrative nightmare. On the one hand, if data locality rules, haven’t we just created a more complicated version of SaaS (the application can’t move because the data can’t?) On the other hand, if our kludges succeed in preserving the illusion of the dynamic application/service/data-center by achieving some minimal remote data movement, how do we scale cloud server-farm sites steadily growing in data-store size by load-balancing hundreds of undocumented hard-coded differing pieces of software accessing data caches that are pretending to be exabytes of physically-local data and are actually accessing remote data during a cache miss?

A quick search of Google finds no one raising this particular point. Instead, the concerns relating to data locality seem to be about vendor lock-in, compliance with data security and privacy regulations, and the difficulty of moving the data for the first time. Another commentator notes the absence of standardized interfaces for cloud computing.

But I say, dead is dead, not alive by another name. If data is always local, that’s SaaS, not cloud by another name. And when you patch to cover up over-simplification, you create unnecessary complexity. Remember when simple-PC server farms were supposed to be an unalloyed joy, before the days of energy concerns and recession-fueled squeezes to high distributed-environment administrative IT costs? Or when avoidance of vendor lock-in was worth the added architectural complexity, before consolidation showed that it wasn’t? I wonder, when this is all over, will IT echo Oliver Hardy, and say to vendors, “Well, Stanley, here’s another fine mess you’ve gotten me into”?

Monday, June 29, 2009

The Dangers of Rewriting History: OS/2 and Cell Phones

Recently, Paul Krugman has been commenting on what he sees as the vanishing knowledge of key concepts such as Say’s Law in the economics profession, partly because it has been in the interest of a particular political faction that the history of the Depression be rewritten in order to bolster their cause. The danger of such a rewriting, according to Krugman, is that it saps the will of the US to take the necessary steps to handle another very serious recession. This has caused me to ask myself, are there corresponding dangerous rewritings of history in the computer industry?

I think there are. The outstanding example, for me, is the way my memory of what happened to OS/2 differs from that of others that I have spoken to recently.

Here’s my story of what happened to OS/2. In the late 1980s, Microsoft and IBM banded together to create a successor to DOS, then the dominant operating system in the fastest-growing computer-industry market. The main reason was users’ increasing interest in Apple’s GUI-based rival operating system. In time, the details of OS/2 were duly released.

Now, there were two interesting things about OS/2, as I found out when researching it as a programmer at Prime Computer. First, there were a large stack of APIs for various purposes, requiring many large manuals of documentation. Second, OS/2 also served as the basis for a network operating system (NOS) called LAN Manager (Microsoft’s product). So if you wanted to implement a NOS involving OS/2 PCs, you had to implement LAN Manager. But, iirc, LAN Manager required 64K of RAM memory in the client PC – and PCs were still 1-2 years from supporting 64K of RAM.

The reason this mattered is that, as I learned from talking to Prime sales folk, NOSs were in the process of shattering the low-end offerings of major computer makers. The boast of Novell at that time was that, using a basic PC as the server, it could deliver shared data and applications to any client PC faster than that PC’s own disk. So a NOS full of cheap PCs was just the thing for any doctor’s office, retail store, or other department/workgroup – much cheaper than a mini from Prime, Data General, Wang, or even IBM – and it could be composed of the PCs that members of the workgroup had already acquired for other purposes.

In turn, this meant that the market for PCs was really a dual consumer/business market involving PC LANs, in which home computers were used interchangeably with office ones. So all those applications that the PC LANs supported would have to run on DOS PCs with something like Novell NetWare, because OS/2 PCs required LAN Manager, which would not be usable for another 2 years … you get the idea. And so did the programmers of new applications, who, when they waded through the OS/2 documentation, found no clear path to a big enough market for OS/2-based apps.

So here was Microsoft, watching carefully as the bulk of DOS programmers held off on OS/2, and Apple gave Microsoft room to move by insisting on full control of their GUI’s APIs, shutting out app programmers. And in a while, there was the first version of Windows. It was not as powerful as OS/2, nor was it backed by IBM. But it supported DOS, it allowed any NOS but LAN Manager, and the app programmers went for it in droves. And OS/2 was toast.

Toast, also, were the minicomputer makers, and, eventually, many of the old mainframe companies in the BUNCH (Burroughs, Univac, NCR, Control Data, Honeywell). Toast was Apple’s hope of dominating the PC market. The sidelining of OS/2 was part of the ascendance of PC client-server networks, not just PCs, as the foundation of server farms and architectures that were applied in businesses of all scales.

What I find, talking to folks about that time, is that there seem to be two versions, different from mine, about what really happened at that time. The first I call “evil Microsoft” or “it’s all about the PC”. A good example of this version is Wikipedia’s entry on OS/2. This glosses over the period between 1988, when OS/2 was released, and 1990, when Windows was released, in order to say that (a) Windows was cheaper and supported more of what people wanted than OS/2, and (b) Microsoft arranged that it be bundled on most new PCs, ensuring its success. In this version, Microsoft seduced consumers and businesses by creating a de-facto standard, deceiving businesses in particular into thinking that the PC was superior to (the dumb terminal, Unix, Linux, the mainframe, the workstation, network computers, open source, the cell phone, and so on). And all attempts to knock the PC off its perch since OS/2 are recast as noble endeavors thwarted by evil protectionist moves by monopolist Microsoft, instead of failures to provide a good alternative that supports users’ tasks both at home and at work via a standalone and networkable platform.

The danger of this first version, imho, is that we continue to ignore the need of the average user to have control over his or her work. Passing pictures via cell phone and social networking via the Internet are not just networking operations; the user also wants to set aside his or her own data, and work on it on his or her own machine. Using “diskless” network computers at work or setting too stringent security-based limits on what can be brought home simply means that employees get around those limits, often by using their own laptops. By pretending that “evil Microsoft” has caused “the triumph of the PC”, purveyors of the first version can make us ignore that users want both effective networking to take advantage of what’s out there and full personal computing, one and inseparable.

The second version I label “it’s the marketing, not the technology.” This was put to me in its starkest form by one of my previous bosses: it didn’t matter that LAN Manager wouldn’t run on a PC, because what really killed OS/2, and kills every computer company that fails, was bad marketing of the product (a variant, by the way, is to say that it was all about the personalities: Bill Gates, Steve Ballmer, Steve Jobs, IBM). According to this version, Gates was a smart enough marketer to switch to Windows; IBM were dumb enough at marketing that they hung on to OS/2. Likewise, the minicomputer makers died because they went after IBM on the high end (a marketing move), not because PC LANs undercut them on the low end (a technology against which any marketing strategy probably would have been ineffective).

The reason I find this attitude pernicious is that I believe it has led to a serious dumbing down of computer-industry analysis and marketing in general. Neglect of technology limitations in analysis and marketing has led to devaluation of technical expertise in both analysts and marketers. For example, I am hard-pressed to find more than a few analysts with graduate degrees in computer science and/or a range of experience in software design that gives them a fundamental understanding of the role of technology in a wide array of products – I might include Richard Winter and Jonathan Eunice, among others, in the group of well-grounded commentators. It’s not that other analysts and marketers don’t have important insights to contribute, whether they come from IT, journalism, or generic marketing backgrounds; it is that the additional insights of those who understand the technologies underlying an application are systematically devalued as “just like any other analyst’s,” when those insights can in fact do a better job of assessing a product and its likelihood of success and usefulness.

Example: does anyone remember Parallan? In the early ‘90s, they were a startup betting on OS/2 LAN Manager. I was working at Yankee Group, which shared the same boss and location as a venture capital firm called Battery Ventures. Battery Ventures invested in Parallan. No one asked me about it; I could have told them about the technical problems with LAN Manager. Instead, the person who made the investment came up to me later and filled my ears with laments about how bad luck in the market had deep-sixed his investment.

The latest manifestation of this rewriting of history is the demand that analysts be highly visible, so that there’s a connection between what they say and customer sales. Visibility is about the cult of personality – many of the folks who presently affect customer sales, from my viewpoint, often fail to appreciate the role of the technology that comes from outside of their areas of expertise, or view the product almost exclusively in terms of marketing. Kudos, by the way, to analysts like Charles King, who recognize the need to bring in technical considerations in Pund-IT Review from less-visible analysts like Dave Hill. Anyway, the result of dumbing-down by the cult of visibility is less respect for analysts (and marketers), loss of infrastructure-software “context” when assessing products on the vendor and user side, and increased danger of the kind of poor technology choices that led to the demise of OS/2.

So, as we all celebrate the advent of cell phones as the successor to the PC, and hail the coming of cloud computing as the best way to save money, please ignore the small voice in the corner that says that the limitations of the technology of putting apps on the cell phone matter, and that cloud computing may cause difficulties with individual employees passing data between home and work. Oh, and be sure to blame the analyst or marketer for any failures, so the small voice in the corner will become even fainter, and history can successfully continue to be rewritten.

Storage/Database Tuning: Whither Queuing Theory?

I was listening in on a discussion of a recent TPC-H benchmark from Sun (the hardware) and ParAccel (the columnar/in-memory database) – cf. recent blog posts by Merv Adrian and Curt Monash – when a benchmarker dropped an interesting comment. It seems that ParAccel used 900-odd TB of storage to store 30 TB of data, not because of inefficient storage or to “game” the benchmark, but because disks are now so large that, in order to gain the performance benefits of streaming from multiple spindles into main memory, ParAccel had to use that amount of storage to allow parallel data streaming from disks to main memory. Thus, if I understand the benchmarker correctly, to maximize performance ParAccel had to use 900-odd 1-terabyte disks simultaneously.

What I find interesting about that comment is the indication that queuing theory still means something when it comes to database performance. According to what I was taught back in 1979, I/Os pile up in a queue when the number of outstanding requests exceeds the number of disks, and so at peak load twenty 500-MB disks can deliver much better performance than ten 1-GB disks of the same total capacity – although the extra spindles tend to cost a bit more. The last time I looked, at list price, 15 TB of 750-GB SATA drives cost $34,560, or 25% more than 15 TB of 1-TB SATA drives.
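
To make the spindle-count argument concrete, here is a minimal M/M/c queuing sketch in Python; the arrival and service rates are made-up numbers, not measurements from any benchmark. With the same total capacity and the same offered load, doubling the number of spindles collapses the average queue wait.

import math

def erlang_c(c, a):
    """Probability that an arriving I/O request must wait (Erlang C),
    for c disks (servers) and offered load a = lambda/mu."""
    if a >= c:
        return 1.0  # unstable: the queue grows without bound
    top = (a ** c / math.factorial(c)) * (c / (c - a))
    bottom = sum(a ** k / math.factorial(k) for k in range(c)) + top
    return top / bottom

def avg_wait_s(c, lam, mu):
    """Average time an I/O request spends waiting in queue (M/M/c)."""
    a = lam / mu
    return erlang_c(c, a) / (c * mu - lam)

# Made-up peak load: 1,800 random I/O requests/sec, each disk serving ~200/sec.
lam, mu = 1800.0, 200.0
for c in (10, 20):  # ten larger disks vs. twenty smaller disks of the same total capacity
    print(f"{c} spindles: average queue wait = {avg_wait_s(c, lam, mu) * 1000:.3f} ms")

With these invented rates, ten spindles leave each request waiting a few milliseconds at peak, while twenty spindles make the queue wait essentially vanish – the same effect ParAccel was chasing with hundreds of drives.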

The commenter then went on to note that, in his opinion, solid-state disk would soon make this kind of maneuver passé. I think what he’s getting at is that solid-state disk should be able to provide parallel streaming from within the “disk array”, without the need to go to multiple “drives”. This is because solid-state disk is main memory imitating disk: that is, the usual parallel stream of data from memory to processor is constrained to look like a sequential stream of data from disk to main memory. But since this is all a pretence, there is no reason that you can’t have multiple disk-memory “streams” in the same SSD, effectively splitting it into 2, 3, or more “virtual disks” (in the virtual-memory sense). It’s just that SSDs were so small in the old days, there didn’t seem to be any reason to bother.

To me, the fact that someone would consider using 900 TB of storage to achieve better performance for 30 TB of data is an indication that (a) the TPC-H benchmark is too small to reflect some of the user data-processing needs of today, and (b) memory size is reaching the point at which many of these needs can be met just with main memory. A storage study I have been doing recently suggests that even midsized firms now have total storage needs in excess of 30 TB, and in the case of medium-sized hospitals (with video-camera and MRI/CAT scan data) 700 TB or more.

To slice it finer: structured-data database sizes may be growing, but not as fast as memory sizes, so many of these workloads (old-style OLTP) can now be handled in main memory and (as a stopgap for old-style programs) on SSD. Unstructured/mixed databases, as in the hospital example, still require regular disk, but now take up so much storage that it is still possible to apply queuing theory to them by streaming I/O in parallel from data striped across hundreds of disks. Data warehouses fall somewhere in between: mostly structured, but still potentially too big for memory/SSD. But data warehouses don’t exist in a vacuum: the data warehouse is typically physically in the same location as unstructured/mixed data stores. By combining data-warehouse and unstructured-data storage and striping across disks, you can improve performance and still use up most of your disk storage – so queuing theory still pays off.

How about the next three years? Well, we know storage size is continuing to grow, perhaps at 40-50% per year despite the recession, as regulations about email and video data retention continue to push the unstructured-data “pig” through the enterprise’s data-processing “python.” We also know that Moore’s Law may be beginning to break down, so that memory size may be on a slower growth curve. And we know that the need for real-time analysis is forcing data warehouses to extend their scope to updatable data and constant incremental OLTP feeds, and to relinquish a bit of their attempt to store all key data (instead allowing in-situ querying across the data warehouse and OLTP).

So if I had to guess, I would say that queuing theory will continue to matter in data warehousing, and that fact should be reflected in any new or improved benchmark. However, SSDs will indeed begin to affect some high-end data-warehousing databases, and performance tuning via striping will become less important in those cases – that, too, should be reflected in benchmarks. It is also plain that in such a time of transition, benchmarks such as TPC-H cannot fully and immediately reflect each shift in the boundary between SSD and disk. Caveat emptor: users should begin to make finer-grained decisions about which applications belong with what kind of storage tiering.

Friday, June 12, 2009

Microsoft's LiveCam: The Value of Narcissism

Yesterday, I participated in Microsoft’s grand experiment in a “virtual summit”, by installing Microsoft LiveCam on my PC at home and then doing three briefings by videoconferencing (two user briefings lacked video, and the keynote required audio via phone). The success rate wasn’t high; in two of the three briefings, we never did succeed in getting both sides to view video, and in one briefing, the audio kept fading in and out. From some of the comments on Twitter, many of my fellow analysts were unimpressed by their experiences.

However, in the one briefing that worked, I found there was a different “feel” to the briefing. Trying to isolate the source of that “feel” – after all, I’ve seen jerky 15-fps videos on my PC before, and video presentations with audio interaction – I realized that there was one aspect to it that was unique: not only did I (and the other side) see each other; we also saw ourselves. And that’s one possibility of videoconferencing that I’ve never seen commented on (although see http://www.editlib.org/p/28537).

The vendor-analyst interaction, after all, is an alternation of statements meant to convince: the vendor, about the value of the solution; the analyst, about the value of the analysis. Each of those speaker statements is “set up” immediately previously by the speaker acting as listener. Or, to put it very broadly, in this type of interaction a good listener makes a good convincer.

So the key value of a videoconference of this type is the instant feedback on how one is coming across, as both listener and speaker. With peripheral vision, the speaker can adjust his or her style to appear more convincing to himself or herself; and the listener can adjust his or her style to emphasize interest in the points that he or she will use as a springboard to convince in the next turn as speaker. This is something I’ve found to work in violin practice as well: it lets the player move quickly to the technique and expression he or she is aiming for.

So, by all means, criticize the way the system works only intermittently and isn’t flexible enough to handle all “virtual summit” situations, the difficulty of getting it to work, and the loss of the richer information-passing of face-to-face meetings. But I have to tell you: if all of the summit had been like those 20 minutes when everything worked and both sides could see how they came across, I would actually prefer it to face-to-face meetings.

“O wad some Pow’r the giftie gie us,” said my ancestors’ countryman, the Scotsman Robbie Burns, “to see oursels as ithers see us.” The implication, most have assumed, is that we would be ashamed of our behavior. But with something like Microsoft’s LiveCam, I think the implication is that we would immediately change our behavior until we liked what we saw; and we would be the better for our narcissism.

Monday, June 8, 2009

Intel Acquires Wind River: the Grid Marries the Web?

Thinking about Intel’s announcement on Friday that it will acquire Wind River Systems, it occurs to me that this move syncs up nicely with a trend that I feel is beginning to surface: a global network of sensors of various types (call it the Grid) to complement the Web. But the connection isn’t obvious; so let me explain.

The press release from Intel emphasized Wind River’s embedded-software development and testing tools. Those are only a part of its product portfolio – its main claim to fame over the last two decades has been its proprietary real-time operating system/RTOS, VxWorks (it also has a Linux OS with real-time options). So Intel is buying not only software for development of products such as cars and airplanes that have software in them; it is buying software to support applications that must respond to large numbers of inputs (typically from sensors) in a fixed amount of time, or else catastrophe ensues. Example: a system keeps track of temperatures in a greenhouse, with ways to seal off breaches automatically; if the application fails to respond to a breach in seconds, the plants die.
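
To make the “fixed amount of time” constraint concrete, here is a minimal, hypothetical Python sketch of the greenhouse example; the sensor and actuator stubs and the 2-second deadline are all invented. A general-purpose OS can only check after the fact whether the deadline was met, which is exactly what an RTOS like VxWorks is designed to guarantee up front.

import time

DEADLINE_S = 2.0           # invented hard deadline: seal a breach within 2 seconds
SAFE_RANGE = (10.0, 35.0)  # invented safe temperature band, in Celsius

def read_temperatures():
    """Stand-in for polling the greenhouse sensors; returns {zone: temp_C}."""
    return {"zone-1": 21.5, "zone-2": 40.2}  # made-up readings

def seal_zone(zone):
    """Stand-in for the actuator that seals off a breached zone."""
    print(f"sealing {zone}")

def control_loop(cycles=3):
    for _ in range(cycles):
        start = time.monotonic()
        for zone, temp in read_temperatures().items():
            if not (SAFE_RANGE[0] <= temp <= SAFE_RANGE[1]):
                seal_zone(zone)  # must complete before the deadline
        elapsed = time.monotonic() - start
        if elapsed > DEADLINE_S:
            # On a general-purpose OS this is only an after-the-fact check;
            # an RTOS bounds the response time by design.
            print(f"deadline miss: handling took {elapsed:.2f}s")
        time.sleep(max(0.0, DEADLINE_S - elapsed))  # poll again within the deadline window

control_loop()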

Originally, in the early days of standardized Unix, RTOSs were valued for their robustness; after all, not only must they respond within a fixed time, they must also ensure that critical software never becomes unavailable. However, once the Open Software Foundation and the like had added enough robustness to Unix, RTOSs became a side current in the overall flow of computer technology, of no real use to the preponderance of computing. So why should RTOSs matter now?

What Is the Grid?
Today’s major computing vendors, IBM among the foremost, are publicizing efforts to create the Smart Grid, software added to the electrical-power “grid” in the United States that will allow users to monitor and adapt their electricity usage to minimize power consumption and cost. This is not to be confused with grid computing, which created a “one computer” veneer over disparate, distributed systems, typically to handle one type of processing. The new Smart Grid marries software to sensors and a network, with the primary task being effective response to a varying workload of a large number of sensor inputs.

But this is not the only example of global, immediate use of sensor input – GPS-based navigation is another. Nor is it the only example of massive amounts of sensor data – RFID, despite being slow to arrive, now generates reader inputs by the millions.

What’s more, it is possible to view many other interactions as following the same global, distributed model. Videos and pictures from cell phones at major news events can, in effect, be used as sensors. Inputs from sensors at auto repair shops can not only be fed into testing machines; they can be fed into global-company databases for repair optimization. The TV show CSI has popularized the notion that casino or hospital video can be archived and mined for insights into crimes and hospital procedures, respectively.

Therefore, it appears that we are trending towards a global internetwork of consumer and company sensor inputs and input usage. That global internetwork is what I am calling the Grid. And RTOSs begin to matter in the Grid, because an RTOS such as VxWorks offers a model for the computing foundations of the Grid.

The Grid, the Web, and the RTOS
The model for the Grid is fundamentally different from that of the Web (which is not to say that the two cannot be merged). It is, in fact, much more like that of an RTOS. The emphasis in the Web is on flexible access to existing information, via searches, URLs, and the like. The emphasis in the Grid is on rapid processing of massive amounts of distributed sensor input; only when that requirement has been satisfied does the Grid turn its attention to making the resulting information available globally and flexibly.

This difference, in turn, can drive differences in computer architecture and operating software. The typical server, PC, laptop, or smartphone assumes that the user has some predictable control over the initiation and scheduling of processes – with the exception of networking. Sensor-based computing is much more reactive: it is a bit like having one’s word processing continually interrupted by messages that “a new email has arrived.” Sensors must be added; ways must be found to improve the input prioritization and scheduling done by operating software; new networking standards may need to be hardwired to allow parallel handling of a wide variety of sensor-type inputs alongside the traditional Web feeds.
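
As a rough illustration of that prioritization-and-scheduling problem (my own sketch, with invented event types and priority levels, not any vendor’s design), here is what draining a mixed stream of sensor and Web inputs most-urgent-first might look like:

import heapq
import itertools

# Invented priority levels: lower number = more urgent.
PRIORITY = {"safety-sensor": 0, "grid-telemetry": 1, "web-request": 2}

class InputScheduler:
    """Drains a mixed stream of sensor and Web inputs, most urgent first."""
    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # tie-breaker preserves arrival order

    def submit(self, kind, payload):
        heapq.heappush(self._heap, (PRIORITY[kind], next(self._seq), kind, payload))

    def drain(self):
        while self._heap:
            _, _, kind, payload = heapq.heappop(self._heap)
            yield kind, payload

sched = InputScheduler()
sched.submit("web-request", "GET /dashboard")
sched.submit("grid-telemetry", {"meter": 17, "kwh": 3.2})
sched.submit("safety-sensor", {"zone": "zone-2", "temp_C": 40.2})
for kind, payload in sched.drain():
    print(kind, payload)  # safety-sensor first, web-request last

An RTOS does this in the kernel, with bounded latencies; the sketch only shows the ordering problem that sensor-heavy workloads force onto operating software.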

In other words, this is not just about improving the embedded-software development of large enterprises; this is about creating new computing approaches that may involve major elaborations of today’s hardware. And of today’s available technologies, the RTOS is among the most experienced and successful in this type of processing.

Where Intel and Wind River Fit
Certainly, software-infused products that use Intel chips and embedded software are a major use case of Intel hardware. And certainly, Wind River has a market beyond sensor-based real-time processing, in development of embedded software that does not involve sensors, such as networking software and cell-phone displays. So it is reasonable for Intel to use Wind River development and testing tools to expand into New Product Development for software-infused products like audio systems; and it is reasonable for commentators to wonder if such a move trespasses on the territory of vendors such as IBM, which has recently been making a big push in software-infused NPD.

What I am suggesting, however, is that in the long run, Wind River’s main usefulness to Intel may be in the reverse direction: providing models for implementing previously software-based sensor-handling in computing hardware. Just as many formerly software-only graphics functions have moved into graphics chips with resulting improvements in the gaming experience and videoconferencing, so it can be anticipated that moving sensor-handling functions into hardware can make significant improvements in users’ experience of the Grid.

Conclusions
If it is indeed true that a greater emphasis on sensor-based computing is arriving, how much effect does this trend have on IT? In the short run, not much. The likely effect of Intel’s acquisition of Wind River over the next year, for example, will be on users’ embedded software development, and not on providing new avenues to the Grid.

In the long run, I would anticipate that the first Grid effects from better Intel (or other) solutions would show up in an IT task like power monitoring in data centers. Imagine a standardized chip for handling distributed power sensing and local input processing across a data center, wedded to today’s power-monitoring administrative software. Extended globally across the enterprise, supplemented by data-mining tools, used to provide up-to-date data to regulatory agencies, extended to clouds to allow real-time workload shifting, supplemented by event-processing software for feeding corporate dashboards, extended to interactions with the power company for better energy rates, made visible to customers of the computing utility as part of the Smart Grid – there is a natural pathway from sensor hardware in one machine to a full Grid implementation.

And it need not take Intel beyond its processor-chip comfort zone at all.

Tuesday, May 19, 2009

Progress' Joe Alsop and SMBs

It seems as if I’m doing a lot of memorializing these days – first Sun, now Joseph Alsop, CEO of Progress Software since its founding 28 years ago. It’s strange to think that Progress started up shortly before Sun but took an entirely different direction: SMBs (small-to-medium-sized businesses) instead of large enterprises, software instead of hardware. Since then, so many database software companies that targeted large enterprises have been marginalized, destroyed, crowded out, or acquired by IBM, CA (acting, in Larry Ellison’s pithy phrase, as “the ecosystem’s needed scavenger”), and Oracle.

Let’s see: there’s IDMS, DATACOM-DB, Model 204, and ADABAS from the mainframe generation (although Cincom, with TOTAL, continues to prosper), and Ingres, Informix, and Sybase from the Unix-centered vendors. By contrast, Progress, FileMaker, iAnywhere (within Sybase), and InterSystems (if you view hospital consortiums as typically medium-scale) have lasted and have done reasonably well. Of all those SMB-focused database and development-tool companies, judged in terms of revenues, Progress (at least until recently) has been the most successful. For that, Joe Alsop certainly deserves credit.

But you don’t last that long, even in the SMB “niche”, unless you keep establishing clear and valuable differentiation in customers’ minds. Looking back over my 16 years of covering Progress and Joe, I see three points at which Progress made a key change of strategy that turned out to be right and valuable to customers.

First, in the early ‘90s, they focused on high-level database-focused programming tools on top of their database. This was not an easy thing to do; some of the pioneers, like Forte (acquired by Sun) and PowerBuilder (acquired by Sybase), had superb technology that was difficult to adapt to new architectures like the Web and low-level languages like Java. But SMBs and SMB ISVs continue to testify to me that applications developed on Progress deliver SMB TCO and ROI superior to the Big Guys.

Second, they found the SMB ISV market before most if not all of their competitors. I still remember a remarkable series of ads shown at one of their industry analyst days, featuring a small shop whose owner, moving as slowly as molasses, managed to sell one product to one customer during the day – by instantly looking up price and inventory and placing the order using a Progress-ISV-supplied customized application. That was an extreme; but it captured Progress’ understanding that the way to SMBs’ hearts was no longer just direct sales or VARs, but also a growing cadre of regional, niche-focused SMB ISVs. By the time SaaS arrived and folks realized that SMB ISVs were particularly good at it, Progress was in a perfect position to profit.

Third, they home-grew and took a leadership position in ESBs (Enterprise Service Buses). It has been a truism that SMBs lag in adoption of technology; but Progress’ ESB showed that SMBs and SMB vendors could take the lead when the product was low-maintenance and easily implemented – as opposed to the application servers large-enterprise vendors had been selling.

As a result of Joe Alsop and Progress, not to mention the mobile innovations of Terry Stepien and Sybase, the SMB market has become a very different place – one that delivers new technology to large enterprises as much as large-enterprise technology now “trickles down” to SMBs. The reason is that what was sauce for the SMB goose was also sauce for the workgroup and department in the large enterprise – if it could be a small enough investment to fly under the radar of corporate standards-enforcers. Slowly, many SMBs have grown into “small large” enterprises, and many workgroups/departments have persuaded divisions, lines of business, and even data centers in large enterprises to see the low-cost and rapid-implementation benefits of an SMB-focused product. Now, big vendors like IBM understand that they win with small and large customers by catering to the needs of regional ISVs instead of the enterprise-app suppliers like SAP and Oracle. Now, Progress does a lot of business with large enterprises, not just SMBs.

Running a company focused on SMB needs is always a high-wire act, with constant pressure on the installed base by large vendors selling “standards” and added features, lack of visibility leading customers to worry about your long-term viability (even after the SMB market did far better in the Internet bust than large-enterprise vendors like Sun!), and constant changes in the technology that bigger folk have greater resources to implement. To win in the long term, you have to be like Isaiah Berlin’s hedgehog – have one big unique idea, and keep coming up with a new one – to counter the large-vendor foxes, who win by amassing lots of smaller ideas. Many entrepreneurs have come up with one big idea in the SMB space; but Joe Alsop is among the few that have managed to identify and foster the next one, and the one after that. And he managed to do it while staying thin.

But perhaps the greatest testimony to Joe Alsop is that I do not have to see his exit from CEO-ship as part of the end of an era. With Sun, with CA as Charles Wang left, with Compuware, the bloom was clearly off the old business-model rose. Progress continues to matter, to innovate, and to be part of an increase in importance of the SMB market. In fact, this is a good opportunity to ask yourself, if you’re an IT shop, whether cloud computing means going to Google, Amazon, IBM, and the like, or the kind of SMB-ISV-focused architecture that Progress is cooking up. Joe Alsop is moving on; the SMB market lives long and prospers!

Monday, May 18, 2009

Classical Music and Economics

As someone who was serious about classical violin in my teenage years, and who has occasionally revisited the field of classical music since (thanks to my father’s passionate love for classical music of all periods), I rarely think of my profession, or indeed other fields of study, as related to classical music at all. Recently, however, I read Jeffrey Sachs’ Common Wealth and Alan Beattie’s False Economy, both fresh looks at seemingly well-established economic truths. For some reason, I then wondered whether a fresh look at classical music from an economic point of view would yield new insights.

So I did a quick Google search for recent articles on the economics of classical music. What I found was a bit disturbing: over the past 10 years, the common theme of commentators has been that classical music now depends on the continued patronage of rich donors and governments, and that the income of top classical artists is limited by their visibility to a narrow, high-income classical-music audience as mediated by record-company executives and concert-hall bookers. The reason this was unsettling is that I had heard almost exactly the same analysis 50 years ago. A few years back, oboist-turned-journalist Blair Tindall wrote an autobiography-cum-analysis, Mozart in the Jungle: Sex, Drugs, and Classical Music, in which she argued that funneling money primarily to orchestras was creating an untenable situation: too many musicians chasing too few patron and government dollars. I conclude that neither revival nor disaster has happened; instead, classical music has reached a “steady state” in which a lot of children are classical-music performers, attention ceases from college on, and for grown-ups classical music becomes a “symbol of class” that otherwise takes up less and less of the world’s attention.

From my experience, the sense of distance in the audience is palpable. Parents who attend their kids’ concerts, or business people who dress up to go to an orchestra concert, typically have no idea of what is “good” or “bad”; so they try to react as they feel they are supposed to, by feeling moved. But take away the social imperative, and they have little urge to keep attending. I, on the other hand, like to go back, because I like to argue with how the piece is performed: do I like Heifetz better in this phrase, or Joshua Bell? Rubinstein or Yo-Yo Ma? De los Angeles or Britney Spears (or Jacques Brel)? Will I ever again hear the amazing subtleties of the Brahms Piano Trios done by Stern-Istomin-Rose? Is this the time when one of them will finally play the Bach Chaconne the way it could be played – by me, if I were a performer?

How is this different from other arts, or “entertainment” in general? Consider pop music, or jazz. The economics of jazz are dreadful; but the settings for performing are generally intimate, and many in the audience dream they could be as good as the performers. Rock is more often large-audience or recorded, but connection with the audience is usually just as important as in jazz, and the doctor in House can fantasize about playing solo with the band – we would have to go back to the “retro” character Charles in M*A*S*H to find a comparable character who could imagine being a solo pianist.

Or, consider competitive sports. It is certainly true that very, very few ever make it to the big leagues; and yet the audience for them is large, and growing. The common thread, here, is that children who play and watch sports can believably imagine themselves in the place of the superstars. People learn from games, and they fantasize about them. Competition is not necessarily the most important part of sports: learning from the excellent, even if it is collecting stats or admiring the way they look, is important too.

Looking at various parts of the entertainment industry and at leisure-time consumers, this seems to be a good way of distinguishing “growing” from “mature” segments. Knitting and pottery are potentially participatory; surgery and solving math problems generally are not, although both may demand as much skill of their best practitioners.

What this says to me is that classical music, economically speaking, does not have to be a backwater. What is required is that a large number of adults can be attracted to spectating, because, as spectators, they can imagine themselves as the performers – and they can bring their own ideas to the show.

If this is true, then many of the ideas about how to “revive” classical music are subtly but dangerously wrong-headed. The economic need, it is asserted, is to economize by focusing on large, cost-effective groupings of musicians, like the orchestra or the opera – make it bigger and snazzier. But these distance a particular performer from a particular part of the audience, by creating a situation in which few of the audience understand the subtlety of the way a solo performance differs from every other repetition of a phrase, by physically and emotionally distancing performer(s) and audience(s), and by limiting audience rules of interaction and fantasy to “end of every half hour applaud.” Going out into the schools or giving free concerts of the latest classical compositions are effectively beside the point: the one simply reinforces a classical-music presence in the schools that will be lost in college anyway, and the other is focusing on moving the audience to “new” classical music rather than engaging their attention in any kind of classical music performance.

There are limits, of course. Maintaining the patronage of orchestras and opera singers is necessary to keep matters at a “steady state”. Cheering during a movement misses the vital element of softness or silence within a piece, just as laughing during a Jack Benny pause may ruin the punch line. But there should be questions asked, challenges made: “Here’s what I’m trying to do here; listen for it, and decide if you like it better;” “Who’s your favorite rock star? how would he/she do the melody here, and would it be better/worse?” “As you listen here, which part do you find most beautiful/moving? How would you change it if you sang it to yourself?” “There are two ways of playing this: deeper/sadder, and louder/angrier. Which do you feel should be the overall message of the piece? How would you imagine yourself conveying that message if you were me?”

We will know whether such an effort is successful when (whatever the copyright implications) tracks from lots of performers are being passed around because they are different from everyone else, and the listener likes to sing along. In that case, just as in rock, the performer and composer will be of equal importance, and old standards performed differently by new generations will become not only valid but expected. And the audience of consumers, continuing on after college, will become a growing market as classical music finally captures the “long tail.”


Sunday, May 3, 2009

TCO/ROI Methodology

I frequently receive questions about the TCO/ROI studies that I conduct, and in particular about the ways in which they differ from the typical studies that I see. Here’s a brief summary:

• I try to focus on narrower use cases. Frequently, this will involve a “typical” small business and/or a typical medium-sized business – 10-50 users at a single site, or 1000 users in a distributed configuration (50 sites in 50 states, 20 users at each site). I believe that this approach helps to identify situations in which a typical survey averaging over all sizes of company obscures the strengths of a solution for a particular customer need.
• I try to break down the numbers into categories that are simple and reflect the user’s point of view. I vary these categories slightly according to the type of user (IT, ISV/VAR). Thus, for example, for IT users I typically break TCO down into license costs, development/installation costs, upgrade costs, administration costs, and support/maintenance contract costs. I think that these tend to be more meaningful to users than, say, “vendor” and “operational” costs.
• In my ROI computation, I include “opportunity cost savings,” and I use a what-if revenue number based on organization size rather than attempting to determine revenues ex ante. Opportunity cost savings are estimated as the TCO savings of a solution (compared to “doing nothing”) reinvested in a project with a 30% ROI. Considering opportunity cost savings gives a more complete picture of (typically 3-year) ROI; comparing ROIs when revenues are held equal lets the user zero in on how faster implementation and better TCO translate into better profits. (A crude worked example follows this list.)
• My numbers are more strongly based on qualitative data from in-depth, open-ended user interviews. Open-ended means that the interviewee is asked to “tell a story” rather than answer “choose among” and “on a scale of” questions, thus giving the interviewee every opportunity to point out flaws in initial research assumptions. I have typically found that a few such interviews yield numbers that are as accurate as, if not more accurate than, 100-respondent surveys.
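
Here is that crude worked example: a minimal Python sketch of the 3-year computation with entirely made-up dollar figures. The 30% reinvestment rate and the three-year horizon come from the description above; the way the yearly savings are credited is my own simplification, not the exact formula used in the studies.

def three_year_roi(benefit_per_year, tco_solution, tco_do_nothing,
                   reinvestment_roi=0.30, years=3):
    """Crude ROI over `years`, folding in opportunity cost savings:
    TCO savings vs. 'doing nothing' are treated as reinvested each year
    in a project returning `reinvestment_roi` for the remaining years."""
    savings_per_year = (tco_do_nothing - tco_solution) / years
    opportunity_savings = sum(savings_per_year * reinvestment_roi * (years - y)
                              for y in range(years))
    total_benefit = benefit_per_year * years + opportunity_savings
    return (total_benefit - tco_solution) / tco_solution

# Made-up case: $400K 3-year TCO for the solution vs. $700K for doing nothing,
# and a what-if benefit of $300K/year based on organization size.
print(f"3-year ROI: {three_year_roi(300_000, 400_000, 700_000):.0%}")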

Let me, at the end of this summary, dwell for a moment on the advantages of open-ended user interviews. They allow me to focus on a narrower set of use cases without worrying as much about small survey size. By not constraining the user to a narrow set of answers, they ensure that I am getting accurate data, and all the data that I need. They allow me to fine-tune and correct the survey as I go along. They surface key facts not anticipated in the survey design. They motivate the interviewee and encourage greater honesty – everyone likes to “tell their story.” And they yield advice for other users – advice of high value to readers, which lends further credibility to the study’s conclusions.

Saturday, May 2, 2009

Moore's Law Is Dead, Long Live Tanenbaum's Law

Yesterday, I had a very interesting conversation with Mike Hoskins of Pervasive about his company’s innovative DataRush product. But this blog post isn’t about DataRush; it’s about the trends in the computer industry that I think DataRush helps reveal. Specifically, it’s about why, despite the fact that disks remain much slower than main memory, most processes, even those involving terabytes of data, are CPU-bound, not I/O-bound.

Mike suggested, iirc, that around 2006 Moore’s Law – under which, roughly every two years, the number of transistors on a chip doubled, and processor speed correspondingly increased – began to break down. As a result, software written on the assumption that ever-increasing processor speed would cover all programming sins against performance (e.g., the data lockup caused by security programs when you start up your PC) is now itself beginning to break down, as the inevitable scaling of demands on the program is no longer matched by scaling of program performance.

However, thinking about the way in which DataRush, or Vertica, achieve higher performance – in the first case by achieving higher parallelism within a process, in the second case by slicing relational data by columns of same-type data instead of rows of different-sized data – suggests to me that more is going on than just “software doesn’t scale any more.” At the very high end of the database market, which I follow, the software munching on massive amounts of data has been unable to keep up with disk I/O for the last 15 years, at least.

Thinking about CPU processing versus I/O, in turn, reminded me of Andrew Tanenbaum, the author of great textbooks on Structured Computer Organization and Computer Networks in the late 1970s and 1980s. Specifically, in one of his later works, he asserted that the speed of networks was growing faster than the speed of processors. Let me restate that as a Law: the speed of data in motion grows faster than the speed of computing on data at rest.

The implication of Tanenbaum’s Law plus the death of Moore’s Law is, I believe, that most computing will be CPU-bound for the foreseeable future. Think of it in terms of huge queries that review multiple terabytes of data. Stored data grows by something like 60% a year; if the speed of moving data grew only as fast as processor speed – and therefore more slowly than stored data – then getting a given fraction of that data off disk and into main memory would take longer each year. Instead, even today’s basic SATA interfaces are rated at multiple gigabits per second, on the order of the clock speeds of today’s microprocessors. To me, this says that disks, in aggregate, are shoving data at processors faster than the processors can process it. And the death of Moore’s Law just makes things worse.

The implications are that the fundamental barriers to scaling computing are not processor geometry, but the ability to parallelize the two key “at rest” tasks of the processor: storing the data in main memory, and operating on it. In order to catch up to storage growth and network speed growth, we have to throw as many processors as we can at a task in parallel. And that, in turn, suggests that the data-flow architecture needs to be looked at again.

The concept of today’s architecture is multiple processors running multiple processes in parallel, each process operating on a mass of (sometimes shared) data. The idea of the data-flow architecture is to split processes into unitary tasks, and then flow parallel streams of data under processors which carry out each of those tasks. The distinction here is that in one approach, the focus is in parallelizing multi-task processes that the computer carries out on a chunk of data at rest; in the other the focus is on parallelizing the same task carried out on a stream of data.

Imagine, for instance, that we are trying to find the best salesperson in the company over the last month, with a huge sales database not already prepared for the query. In today’s approach, one process would load the sales records into main memory in chunks, and for each chunk maintain a running count of sales for every salesperson in the company. Yes, the running count is to some extent parallelized; but the record processing often is not.

Now imagine that multiple processors are assigned the task of looking at each record as it arrives, with each processor keeping a running count for one salesperson. Not only are we speeding up the access to the data uploaded from disk by parallelizing that; we are also speeding up the computation of running counts beyond that of today’s architecture, by having multiple processors performing the count on multiple records at the same time. So the two key bottlenecks involving data at rest – accessing the data, and performing operations on the data – are lessened.
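
A toy Python sketch of that second approach (my own illustration of the idea, not DataRush’s actual mechanism): hash-partition a stream of made-up sales records so that each worker process owns the running counts for “its” salespeople, then merge the partial counts.

from concurrent.futures import ProcessPoolExecutor
from collections import Counter
import random

# Made-up sales records: (salesperson_id, sale_amount)
RECORDS = [(random.randrange(8), random.uniform(10, 500)) for _ in range(100_000)]
N_WORKERS = 4

def count_partition(partition):
    """Each worker keeps running totals for the salespeople routed to it."""
    totals = Counter()
    for salesperson, amount in partition:
        totals[salesperson] += amount
    return totals

if __name__ == "__main__":
    # Route each record by salesperson, so one worker owns each running count.
    partitions = [[] for _ in range(N_WORKERS)]
    for rec in RECORDS:
        partitions[rec[0] % N_WORKERS].append(rec)

    with ProcessPoolExecutor(max_workers=N_WORKERS) as pool:
        grand_total = Counter()
        for partial in pool.map(count_partition, partitions):
            grand_total.update(partial)

    best = max(grand_total, key=grand_total.get)
    print(f"best salesperson: #{best} with ${grand_total[best]:,.0f}")

The point is not the particular library, but that both bottlenecks named above are attacked at once: records stream into several workers in parallel, and the running counts are computed in parallel as the data flows past.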

Note also that the immediate response to the death of Moore’s Law is the proliferation of multi-core chips – effectively, 4-8 processors on a chip. So a simple way of imposing a data-flow architecture over today’s approach is to have the job scheduler in a symmetric multiprocessing architecture break down processes into unitary tasks, then fire up multiple cores for each task, operating on shared memory. If I understand Mike Hoskins, this is the gist of DataRush’s approach.

But I would argue that if I am correct, programmers also need to begin to think of their programs as optimizing processing of data flows. One could say that event-driven programming does something similar; but so far, that’s typically a special case, not an all-purpose methodology or tool.

Recently, to my frustration, a careless comment got me embroiled again in the question of whether Java or Ruby or whatever is a high-level language – when I strongly feel that these do poorly (if examples on Wikipedia are representative) at abstracting data-management operations and therefore are far from ideal. Not one of today’s popular dynamic, functional, or object-oriented programming languages, as far as I can tell, thinks about optimizing data flow. Is it time to merge them with LabVIEW or VEE?