In a somewhat odd dream the other night, I imagined myself
giving potloads of targeted money to some major university a la Bill Gates, and
I was choosing to spend it on basic research in computing. What, I asked myself, was worthy of basic
research that had not been probed to death already?
Obvious candidates relevant to the frontiers of the
computing market these days include artificial intelligence and speech
recognition. These, however, are imho
well-plowed fields already, having received major basic-research attention at
least since Minsky's work in the 1970s (AI) and the government's Cold War need for
translation of foreign documents in the 1960s (natural-language parsing). The result of 40-plus years of basic research has
been an achingly slow translation of these into something useful (Watson, Siri,
and their ilk), so that most of the action now is in applied rather than basic
research. So, I said in my dream, I'm
not going down that rathole.
Now, I am really not up to date on what universities are
doing in basic computer-science research, but I do get some gleanings from the
prizes awarded to Comp Sci professors at a couple of universities. So I would like to suggest two new areas
where I wonder whether basic research could really make a difference in the medium
term and allow computing products to do much better. Here they are:
1. Recursive computing; and
2. Causality in analytics.
Recursive Computing
Back in the early 1970s, “theory of algorithms and
computing” provided some very valuable insights into the computation times of
many key computer tasks, such as sorting and solving systems of linear equations. One hot topic of ongoing research was
figuring out whether tasks that can be done non-deterministically (loosely speaking,
by exploring all the possibilities in parallel) in polynomial time, that is, n raised
to some fixed power times a constant (where n is the number of data points used
by the task), can also be done deterministically, one step at a time, in polynomial
time. In math jargon, this was known as
the P(olynomial) = N(on-deterministic) P(olynomial) Problem. Note that at the time, a task that must take
exponential time was effectively undoable for all but the smallest cases.
It turned out that several useful tasks fit in the category
of those that might possibly be solvable quickly.
For example, the traveling salesman problem seeks the route through n points
that minimizes total travel time.
If and only if P=NP, the traveling salesman problem could be solved exactly
for any case in polynomial time. The last time I checked, in the 1980s, the
P=NP problem had not been solved (it still hasn't been), but “good enough” approximations
had been identified that got close enough to the right answer to be somewhat
satisfactory.
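To make concrete why those approximations matter, here is a small sketch of my own (not from any particular research program, and with made-up points): an exact traveling-salesman solver that tries every possible route, next to the simple nearest-neighbor heuristic, which is one flavor of “good enough” approximation.

```python
# A toy illustration (mine, not from the original research): brute force tries
# all (n-1)! routes and blows up quickly, while the nearest-neighbor heuristic
# runs in roughly O(n**2) time and usually lands close to the best answer.
from itertools import permutations
import math
import random

def route_length(points, order):
    """Total length of a closed tour visiting points in the given order."""
    return sum(math.dist(points[order[i]], points[order[(i + 1) % len(order)]])
               for i in range(len(order)))

def tsp_brute_force(points):
    """Exact answer: check every possible route starting from point 0."""
    n = len(points)
    best = min(permutations(range(1, n)),
               key=lambda rest: route_length(points, (0,) + rest))
    return route_length(points, (0,) + best)

def tsp_nearest_neighbor(points):
    """'Good enough' heuristic: always go to the closest unvisited point."""
    unvisited = set(range(1, len(points)))
    tour, current = [0], 0
    while unvisited:
        current = min(unvisited, key=lambda j: math.dist(points[current], points[j]))
        unvisited.remove(current)
        tour.append(current)
    return route_length(points, tuple(tour))

if __name__ == "__main__":
    random.seed(1)
    pts = [(random.random(), random.random()) for _ in range(9)]
    print("exact:    ", round(tsp_brute_force(pts), 3))
    print("heuristic:", round(tsp_nearest_neighbor(pts), 3))
```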
Recursion is, briefly, the solution of a problem of size n
by combining the solutions of the same problem at smaller sizes, say, n-1 or
n/2. For example, one can solve a
sorting problem of size n by sorting two lists of size n/2 and then merging the
two sorted lists, comparing them piece by piece. Each of those lists can, in turn, be sorted by
splitting it into two lists of size n/4, and so on down to lists of size 1 or 2. If all of this is done sequentially, the
time is O(n log n). If it is done in
parallel, however, with n processors, the time can be brought down to O(log n). That's a big speedup through parallelism,
but it's not the kind of parallelism the P=NP question is about.
In practical terms, you simply can't pack processors arranged in a tree
structure next to each other without the time to talk from one
processor to another becoming longer and longer. I estimated the crossover point, beyond which
adding processors is no longer of use, at somewhere around 10**6 (a million) to 10**9 (a
billion) processors.
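Here is a minimal sequential sketch of that divide-and-conquer sort (my own illustration, not anyone's production code): split the list in half, sort each half recursively, and merge the two sorted halves piece by piece, which gives the O(n log n) behavior described above.

```python
# A minimal recursive merge sort: solve the problem at size n by solving it
# twice at size n/2 and then combining (merging) the results.
def merge_sort(items):
    if len(items) <= 1:                 # lists of size 0 or 1 are already sorted
        return items
    mid = len(items) // 2
    left = merge_sort(items[:mid])      # same problem at size n/2 ...
    right = merge_sort(items[mid:])     # ... solved twice, then combined
    return merge(left, right)

def merge(left, right):
    """Combine two sorted lists by comparing them piece by piece."""
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged

print(merge_sort([5, 3, 8, 1, 9, 2]))   # [1, 2, 3, 5, 8, 9]
```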
In the years since the 1980s, this kind of analysis seemed
irrelevant to speeding up computing.
Other techniques, such as pipelining and massively parallel (but not
recursive) arrays of PCs, seemed to offer better ways to gain performance. But two recent developments suggest to me
that it may be time to dust off and modernize recursion research:
1. Fast Data depends on Apache Spark, and the model of Apache Spark is roughly one processor per partition of the data stream, applied to a humongous flat-memory data storage architecture (a cluster of PCs). In other words, we can achieve real-time transaction processing and initial analytics by completely parallel application of thousands to millions of local PCs, followed by recombination of the results (see the sketch after this list). There seems a good case to be made that “divide and conquer” here will yield higher performance than mindless pipelining.
2. Quantum computing has apparently proved its ability to handle basic computer operations and solve certain problems. As I understand it, quantum computing data storage (via qubits) is completely parallel, and is not bounded by distance (what Einstein apparently referred to as “spooky [simultaneous] action [by two entangled quantum objects] at a distance [that apparently could be quite large]”). Or, to put it another way, in quantum computing it is, loosely speaking, as if P=NP.
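As a toy illustration of the partition-then-combine pattern I have in mind for item 1 (my own sketch, not actual Spark code; the chunk size and the "analytics" are made up), here is a version using Python's multiprocessing pool: each worker handles one partition of the data, and the partial results are recombined at the end.

```python
# A hypothetical partition-then-combine sketch: split the incoming data into
# chunks, let a pool of workers process each chunk in parallel, then recombine
# the partial results. This is the divide-and-conquer shape of the Spark model.
from multiprocessing import Pool

def local_analytics(chunk):
    """Stand-in for the per-partition work one node would do."""
    return {"count": len(chunk), "total": sum(chunk)}

def combine(partials):
    """Recombine the partial results from every partition."""
    return {"count": sum(p["count"] for p in partials),
            "total": sum(p["total"] for p in partials)}

if __name__ == "__main__":
    stream = list(range(1_000_000))
    chunks = [stream[i:i + 100_000] for i in range(0, len(stream), 100_000)]
    with Pool() as pool:
        partials = pool.map(local_analytics, chunks)   # one worker per partition
    print(combine(partials))   # {'count': 1000000, 'total': 499999500000}
```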
Whether this means recursion will be useful again, I don’t
know. But it seems to me worth the
effort.
Causality in Analytics
One of the more embarrassing failures of statistics in
recent years was in handling the tobacco controversy. It seemed plain from the data that tobacco
smoke was causing cancer, both first-hand and second-hand, but the best statistics
could apparently do was to establish a correlation, which could mean that
tobacco smoke caused cancer, or that a genetic tendency to cancer caused one to
smoke, or that lung cancer and tobacco use increased steadily because of other
factors entirely. It was only when the
biological effects of tobacco smoke on the lungs were traced that a clear causal path
could be projected. In effect,
statistics could say nothing useful about causality without a separate
scientific explanation.
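To make the correlation-versus-causation point concrete, here is a toy simulation of my own (the variables are invented, and it is not tied to any real tobacco data): a hidden common cause drives both an "exposure" series and a "disease" series, which then correlate strongly even though neither one causes the other.

```python
# A toy confounder simulation (made-up variables): a hidden factor drives both
# observed series, so they correlate strongly with zero direct causal link.
# Requires Python 3.10+ for statistics.correlation.
import random
import statistics

random.seed(42)
n = 10_000
hidden = [random.gauss(0, 1) for _ in range(n)]           # unobserved common cause
exposure = [h + random.gauss(0, 0.5) for h in hidden]     # driven by the hidden factor
disease  = [h + random.gauss(0, 0.5) for h in hidden]     # also driven by the hidden factor

r = statistics.correlation(exposure, disease)
print(f"correlation = {r:.2f}")   # roughly 0.8, despite no causal arrow between them
```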
A recent jaunt through Wikipedia in search of “causality”
confirmed my concerns about the present state of statistics’ ability to
identify causality. There were plenty of
philosophical attempts to say what causality was, but there was no clear
statistical method mentioned that allowed early identification of causality. Moreover, there seemed to be no clear way of
establishing anything between correlation/linear regression and pure
causality.
If any modern area would seem to offer a promise of
something better, it would be business analytics. After all, the name of the game today in most
cases is understanding the customer, in aggregate and individually. That understanding also seeks to foster a
long-term relationship with key customers.
And therefore, distinguishing between the customer who spends a lot
but leaves once dissatisfied and the customer who spends less but is more
likely to stay for the long term (as a recent Sloan Management Review article
pointed out, iirc) can be mission-critical.
The reason basic research would seem likely to yield new
insights into causality is that one key component of doing better is “domain
knowledge”. Thus, Amazon recently noted
that I was interested in climate change, and then proceeded to recommend not
one but two books by climate change deniers.
Had the analytics been able to use something like IBM’s Watson, they
might have deduced that I was interested in climate change because I was highly
concerned about it, not because I was a paranoid conspiracy theorist who
thought climate change was a plot to rob me of my hard-earned money. And basic research that could establish
better causal models should also be able to enhance domain knowledge, so that
analytics can establish the degree of confidence in causality that
is appropriate in a particular case and avoid data-mining bias (the fact that trying
out too many models increases the chances of choosing an untrue one).
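And as a small illustration of the data-mining bias I mean (again my own toy example, not drawn from any real analytics product): test enough candidate predictors that are pure noise against a pure-noise outcome, and a predictable fraction of them will look "significant" anyway.

```python
# A toy multiple-comparisons demo: 200 candidate predictors that are pure noise,
# tested against a pure-noise outcome. At the usual 5% threshold, roughly 10 of
# them will look "significant" purely by chance.
# Requires Python 3.10+ for statistics.correlation.
import random
import statistics

random.seed(0)
n_rows, n_predictors = 100, 200
outcome = [random.gauss(0, 1) for _ in range(n_rows)]

false_hits = 0
for _ in range(n_predictors):
    predictor = [random.gauss(0, 1) for _ in range(n_rows)]
    r = statistics.correlation(predictor, outcome)
    # crude significance check: |r| above ~2/sqrt(n) is roughly the 5% cutoff
    if abs(r) > 2 / n_rows ** 0.5:
        false_hits += 1

print(f"{false_hits} of {n_predictors} useless predictors look 'significant'")
```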
Envoi
I expect that neither of these basic research topics will
actually ever be plumbed. Still, that’s my
holiday wish list. And now I can stop
worrying about that dream.