Monday, June 29, 2009

The Dangers of Rewriting History: OS/2 and Cell Phones

Recently, Paul Krugman has been commenting on what he sees as the vanishing knowledge of key concepts such as Say’s Law in the economics profession, partly because it has been in the interest of a particular political faction that the history of the Depression be rewritten in order to bolster their cause. The danger of such a rewriting, according to Krugman, is that it saps the will of the US to take the necessary steps to handle another very serious recession. This has caused me to ask myself, are there corresponding dangerous rewritings of history in the computer industry?

I think there are. The outstanding example, for me, is the way my memory of what happened to OS/2 differs from that of others that I have spoken to recently.

Here’s my story of what happened to OS/2. In the late 1980s, Microsoft and IBM banded together to create a successor to DOS, then the dominant operating system in the fastest-growing computer-industry market. The main reason was users’ increasing interest in Apple’s rival GUI-based operating system. In time, the details of OS/2 were duly released.

Now, there were two interesting things about OS/2, as I found out when researching it as a programmer at Prime Computer. First, there was a large stack of APIs for various purposes, requiring many large manuals of documentation. Second, OS/2 also served as the basis for a network operating system (NOS) called LAN Manager (Microsoft’s product). So if you wanted to implement a NOS involving OS/2 PCs, you had to use LAN Manager. But, iirc, LAN Manager required 64K of RAM in the client PC – and PCs were still 1-2 years away from supporting that much.

The reason this mattered is that, as I learned from talking to Prime sales folk, NOSs were in the process of shattering the low-end offerings of major computer makers. The boast of Novell at that time was that, using a basic PC as the server, it could deliver shared data and applications to any client PC faster than that PC’s own disk could. So a NOS full of cheap PCs was just the thing for any doctor’s office, retail store, or other department/workgroup, much cheaper than a mini from Prime, Data General, Wang, or even IBM – and it could be composed of the PCs that members of the workgroup had already acquired for other purposes.

In turn, this meant that the market for PCs was really a dual consumer/business market involving PC LANs, in which home computers were used interchangeably with office ones. So all those applications that the PC LANs supported would have to run on DOS PCs with something like Novell NetWare, because OS/2 PCs required LAN Manager, which would not be usable for another 2 years … you get the idea. And so did the programmers of new applications, who, when they waded through the OS/2 documentation, found no clear path to a big enough market for OS/2-based apps.

So here was Microsoft, watching carefully as the bulk of DOS programmers held off on OS/2, while Apple gave Microsoft room to move by insisting on full control of its GUI’s APIs, shutting out app programmers. And before long, there was Windows. It was not as powerful as OS/2, nor was it backed by IBM. But it supported DOS, it worked with any NOS rather than requiring LAN Manager, and the app programmers went for it in droves. And OS/2 was toast.

Toast, also, were the minicomputer makers, and, eventually, many of the old mainframe companies in the BUNCH (Burroughs, Univac, NCR, Control Data, Honeywell). Toast was Apple’s hope of dominating the PC market. The sidelining of OS/2 was part of the ascendance of PC client-server networks, not just PCs, as the foundation of server farms and architectures that were applied in businesses of all scales.

What I find, talking to folks about that time, is that there seem to be two versions, different from mine, of what really happened. The first I call “evil Microsoft” or “it’s all about the PC”. A good example of this version is Wikipedia’s entry on OS/2. This glosses over the period between 1988, when OS/2 was released, and 1990, when Windows 3.0 was released, in order to say that (a) Windows was cheaper and supported more of what people wanted than OS/2, and (b) Microsoft arranged that it be bundled on most new PCs, ensuring its success. In this version, Microsoft seduced consumers and businesses by creating a de facto standard, deceiving businesses in particular into thinking that the PC was superior to (the dumb terminal, Unix, Linux, the mainframe, the workstation, network computers, open source, the cell phone, and so on). And all attempts to knock the PC off its perch since OS/2 are recast as noble endeavors thwarted by evil protectionist moves by monopolist Microsoft, rather than failures to provide a good alternative that supports users’ tasks both at home and at work via a standalone and networkable platform.

The danger of this first version, imho, is that we continue to ignore the need of the average user to have control over his or her work. Passing pictures via cell phone and social networking via the Internet are not just networking operations; the user also wants to set aside his or her own data, and work on it on his or her own machine. Using “diskless” network computers at work or setting too stringent security-based limits on what can be brought home simply means that employees get around those limits, often by using their own laptops. By pretending that “evil Microsoft” has caused “the triumph of the PC”, purveyors of the first version can make us ignore that users want both effective networking to take advantage of what’s out there and full personal computing, one and inseparable.

The second version I label “it’s the marketing, not the technology.” This was put to me in its starkest form by one of my previous bosses: it didn’t matter that LAN Manager wouldn’t run on a PC, because what really killed OS/2, and kills every computer company that fails, was bad marketing of the product (a variant, by the way, is to say that it was all about the personalities: Bill Gates, Steve Ballmer, Steve Jobs, IBM). According to this version, Gates was a smart enough marketer to switch to Windows; IBM were dumb enough at marketing that they hung on to OS/2. Likewise, the minicomputer makers died because they went after IBM on the high end (a marketing move), not because PC LANs undercut them on the low end (a technology against which any marketing strategy probably would have been ineffective).

The reason I find this attitude pernicious is that I believe it has led to a serious dumbing down of computer-industry analysis and marketing in general. Neglect of technology limitations in analysis and marketing has led to the devaluation of technical expertise in both analysts and marketers. For example, I am hard-pressed to find more than a few analysts with graduate degrees in computer science and/or a range of experience in software design that gives them a fundamental understanding of the role of technology in a wide array of products – I might include Richard Winter and Jonathan Eunice, among others, in the group of well-grounded commentators. It’s not that other analysts and marketers don’t have important insights to contribute, whether they come from IT, journalism, or generic marketing backgrounds; it is that the additional insights of those who understand what technologies underlie an application are systematically devalued as coming from “just another analyst,” when those insights can in fact yield a better assessment of a product and its likelihood of success or usefulness.

Example: does anyone remember Parallan? In the early ‘90s, they were a startup betting on OS/2 LAN Manager. I was working at Yankee Group, which shared the same boss and location as a venture capital firm called Battery Ventures. Battery Ventures invested in Parallan. No one asked me about it; I could have told them about the technical problems with LAN Manager. Instead, the person who made the investment came up to me later and filled my ears with laments about how bad luck in the market had deep-sixed his investment.

The latest manifestation of this rewriting of history is the demand that analysts be highly visible, so that there’s a connection between what they say and customer sales. Visibility is about the cult of personality – many of the folks who presently affect customer sales, from my viewpoint, often fail to appreciate the role of the technology that comes from outside of their areas of expertise, or view the product almost exclusively in terms of marketing. Kudos, by the way, to analysts like Charles King, who recognize the need to bring in technical considerations in Pund-IT Review from less-visible analysts like Dave Hill. Anyway, the result of dumbing-down by the cult of visibility is less respect for analysts (and marketers), loss of infrastructure-software “context” when assessing products on the vendor and user side, and increased danger of the kind of poor technology choices that led to the demise of OS/2.

So, as we all celebrate the advent of cell phones as the successor to the PC, and hail the coming of cloud computing as the best way to save money, please ignore the small voice in the corner that says that the limitations of the technology of putting apps on the cell phone matter, and that cloud computing may cause difficulties with individual employees passing data between home and work. Oh, and be sure to blame the analyst or marketer for any failures, so the small voice in the corner will become even fainter, and history can successfully continue to be rewritten.

Storage/Database Tuning: Whither Queuing Theory?

I was listening in on a discussion of a recent TPC-H benchmark run on Sun hardware with ParAccel’s columnar/in-memory-technology database (cf. recent blog posts by Merv Adrian and Curt Monash), when a benchmarker dropped an interesting comment. It seems that ParAccel used 900-odd TB of storage to store 30 TB of data, not because of inefficient storage or to “game” the benchmark, but because disks are now so large that, in order to gain the performance benefits of streaming from multiple spindles into main memory, ParAccel had to use that amount of storage to allow parallel data streaming from disks to main memory. Thus, if I understand what the benchmarker said, in order to maximize performance, ParAccel had to use 900-odd 1-terabyte disks simultaneously.
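
To make the spindle arithmetic concrete, here is a back-of-the-envelope sketch in Python; the ~100 MB/s sustained per-spindle read rate is my assumption, not a figure from the benchmark, and striping is assumed to be perfectly parallel.

    # Back-of-the-envelope: why spread 30 TB over ~900 spindles?
    # The 100 MB/s per-spindle sequential rate is an assumed round number.
    DATA_TB = 30
    MB_PER_TB = 1_000_000
    PER_SPINDLE_MBPS = 100  # assumed sustained sequential read per disk

    for spindles in (30, 100, 300, 900):
        aggregate_mbps = spindles * PER_SPINDLE_MBPS
        scan_minutes = DATA_TB * MB_PER_TB / aggregate_mbps / 60
        print(f"{spindles:4d} spindles: ~{aggregate_mbps / 1000:5.1f} GB/s aggregate, "
              f"full 30 TB scan in ~{scan_minutes:.0f} min")

Capacity-wise, 30 disks would hold the data; on these assumed numbers, the other 870 spindles are there purely to multiply the number of parallel streams, cutting a full scan from hours to minutes.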

What I find interesting about that comment is the indication that queuing theory still means something when it comes to database performance. According to what I was taught back in 1979, I/Os pile up in a queue when the number of requests is greater than the number of disks, and so at peak load, 20 500-MB disks can deliver a lot better performance than 10 1-GB disks – although they tend to cost a bit more. The last time I looked, at list price 15 TB of 750-GB SATA drives cost $34,560, or 25% more than 15 TB of 1-TB SATA drives.
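
Here is a minimal M/M/c (Erlang C) sketch of that queuing effect; the arrival and per-disk service rates below are invented for illustration, not measurements.

    # More spindles at the same total capacity means far less queuing at peak load.
    from math import factorial

    def erlang_c(servers: int, offered_load: float) -> float:
        """Probability that an arriving request has to queue (Erlang C)."""
        a = offered_load
        top = (a ** servers / factorial(servers)) * (servers / (servers - a))
        return top / (sum(a ** k / factorial(k) for k in range(servers)) + top)

    arrival_rate = 1500.0    # I/O requests per second at peak (assumed)
    per_disk_rate = 200.0    # I/Os per second one spindle can service (assumed)
    offered_load = arrival_rate / per_disk_rate

    for disks in (10, 20):
        p_queue = erlang_c(disks, offered_load)
        mean_wait_ms = 1000 * p_queue / (disks * per_disk_rate - arrival_rate)
        print(f"{disks} disks: P(queue) = {p_queue:.2f}, mean wait {mean_wait_ms:.2f} ms")

At the same offered load, doubling the spindle count collapses both the probability of queuing and the mean wait, which is the effect that 1979-era rule of thumb captures.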

The commenter then went on to note that, in his opinion, solid-state disk would soon make this kind of maneuver passé. I think what he’s getting at is that solid-state disk should be able to provide parallel streaming from within the “disk array”, without the need to go to multiple “drives”. This is because solid-state disk is main memory imitating disk: that is, the usual parallel stream of data from memory to processor is constrained to look like a sequential stream of data from disk to main memory. But since this is all a pretence, there is no reason that you can’t have multiple disk-memory “streams” in the same SSD, effectively splitting it into 2, 3, or more “virtual disks” (in the virtual-memory sense). It’s just that SSDs were so small in the old days, there didn’t seem to be any reason to bother.
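
A toy sketch of that virtual-disk idea follows; the class and its round-robin service are my own construction for illustration, not any vendor’s SSD design.

    # Toy model: one SSD carved into several "virtual disks", each with its
    # own independent request stream, since there is no mechanical arm to
    # serialize access. Purely illustrative.
    from collections import deque

    class VirtualizedSSD:
        def __init__(self, total_blocks: int, virtual_disks: int):
            self.blocks_per_vdisk = total_blocks // virtual_disks
            self.queues = [deque() for _ in range(virtual_disks)]  # one queue per virtual disk

        def submit(self, vdisk: int, logical_block: int) -> None:
            """Queue a read on one virtual disk; the others are unaffected."""
            self.queues[vdisk].append(vdisk * self.blocks_per_vdisk + logical_block)

        def service_one_round(self) -> list[int]:
            """Each virtual disk services one request per round, in parallel."""
            return [q.popleft() for q in self.queues if q]

    ssd = VirtualizedSSD(total_blocks=1_000_000, virtual_disks=4)
    ssd.submit(0, 10); ssd.submit(1, 10); ssd.submit(2, 10)
    print(ssd.service_one_round())   # three requests served in a single round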

To me, the fact that someone would consider using 900 TB of storage to achieve better performance for 30 TB of data is an indication that (a) the TPC-H benchmark is too small to reflect some of the user data-processing needs of today, and (b) memory size is reaching the point at which many of these needs can be met just with main memory. A storage study I have been doing recently suggests that even midsized firms now have total storage needs in excess of 30 TB, and in the case of medium-sized hospitals (with video-camera and MRI/CAT scan data) 700 TB or more.

To slice it finer: structured-data database sizes may be growing, but not as fast as memory sizes, so many of these (old-style OLTP databases) can now be handled in main memory and (as a stopgap for old-style programs) SSD. Unstructured/mixed databases, as in the hospital example, still require regular disk, but now take up so much storage that it is still possible to apply queuing theory to them by streaming I/O in parallel from data striped across hundreds of disks. Data warehouses fall somewhere in between: mostly structured, but still potentially too big for memory/SSD. But data warehouses don’t exist in a vacuum: the data warehouse is typically physically in the same location as unstructured/mixed data stores. By combining data warehouse and unstructured-data storage and striping across disks, you can improve performance and still use up most of your disk storage – so queuing theory still pays off.
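
As a reminder of what striping buys in that combined store, here is a minimal RAID-0-style mapping sketch; the stripe size and disk count are assumptions for illustration.

    # Minimal striping sketch: spread a logical block range across many disks
    # so that a long sequential scan touches many spindles at once.
    STRIPE_BLOCKS = 256   # blocks per stripe unit (assumed)
    N_DISKS = 100         # disks in the striped set (assumed)

    def block_to_disk(logical_block: int) -> tuple[int, int]:
        """Map a logical block to (disk index, block offset on that disk)."""
        stripe_unit = logical_block // STRIPE_BLOCKS
        disk = stripe_unit % N_DISKS
        offset = (stripe_unit // N_DISKS) * STRIPE_BLOCKS + logical_block % STRIPE_BLOCKS
        return disk, offset

    # A long sequential scan touches every disk in the set:
    touched = {block_to_disk(b)[0] for b in range(0, 100 * STRIPE_BLOCKS, STRIPE_BLOCKS)}
    print(len(touched), "disks participate in the scan")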

How about the next three years? Well, we know storage size is continuing to grow, perhaps at 40-50%, despite the recession, as regulations about email and video data retention continue to push the unstructured-data “pig” through the enterprise’s data-processing “python.” We also know that Moore’s Law may be beginning to break down, so that memory size may be on a slower growth curve. And we know that the need for real-time analysis is forcing data warehouses to extend their scope to updatable data and constant incremental OLTP feeds, and to relinquish a bit of their attempt to store all key data (instead, allowing in-situ querying across the data warehouse and OLTP).

So if I had to guess, I would say that queuing theory will continue to matter in data warehousing, and that fact should be reflected in any new or improved benchmark. However, SSDs will indeed begin to impact some high-end data-warehousing databases, and performance-tuning via striping will become less important in those circumstances – that, too, should be reflected in benchmarks. Still, it is plain that in such a time of transition, benchmarks such as TPC-H cannot fully and immediately reflect each shift in the boundary between SSD and disk. Caveat emptor: users should begin to make finer-grained decisions about which applications belong with what kind of storage tiering.

Friday, June 12, 2009

Microsoft's LiveCam: The Value of Narcissism

Yesterday, I participated in Microsoft’s grand experiment in a “virtual summit”, by installing Microsoft LiveCam on my PC at home and then doing three briefings by videoconferencing (two user briefings lacked video, and the keynote required audio via phone). The success rate wasn’t high; in two of the three briefings, we never did succeed in getting both sides to view video, and in one briefing, the audio kept fading in and out. From some of the comments on Twitter, many of my fellow analysts were unimpressed by their experiences.

However, in the one briefing that worked, I found there was a different “feel” to the briefing. Trying to isolate the source of that “feel” – after all, I’ve seen jerky 15-fps videos on my PC before, and video presentations with audio interaction – I realized that there was one aspect to it that was unique: not only did I (and the other side) see each other; we also saw ourselves. And that’s one possibility of videoconferencing that I’ve never seen commented on (although see http://www.editlib.org/p/28537).

The vendor-analyst interaction, after all, is an alternation of statements meant to convince: the vendor, about the value of the solution; the analyst, about the value of the analysis. Each of those statements is “set up” immediately beforehand by the speaker’s turn as listener. Or, to put it very broadly, in this type of interaction a good listener makes a good convincer.

So the key value of a videoconference of this type is the instant feedback about how one is coming across as both listener and speaker. With peripheral vision, the speaker can adjust his or her style until it looks convincing even to the speaker; and the listener can adjust his or her style so as to signal interest in the points that will serve as a springboard to convince in the next turn as speaker. This is something I’ve found to work in violin practice as well: watching yourself lets you move quickly to playing with the technique and expression you are aiming for.

So, by all means, criticize the way the system works only intermittently and isn’t flexible enough to handle all “virtual summit” situations, the difficulties in getting it to work, and the loss of the richer information-passing of face-to-face meetings. But I have to tell you, if all of the summit had been like those brief 20 minutes when everything worked and both sides could see how they came across, I would actually prefer it to face-to-face meetings.

“O wad some Pow’r the giftie gie us,” said my ancestors’ countryman, Scotsman Robbie Burns, “to see oursels as ithers see us.” The implication, most have assumed, is that we would be ashamed of our behavior. But with something like Microsoft’s LiveCam, I think the implication is that we would immediately change our behavior until we liked what we saw; and we would be the better for our narcissism.

Monday, June 8, 2009

Intel Acquires Wind River: the Grid Marries the Web?

Thinking about Intel’s announcement on Friday that it will acquire Wind River Systems, it occurs to me that this move syncs up nicely with a trend that I feel is beginning to surface: a global network of sensors of various types (call it the Grid) to complement the Web. But the connection isn’t obvious; so let me explain.

The press release from Intel emphasized Wind River’s embedded-software development and testing tools. Those are only a part of its product portfolio – its main claim to fame over the last two decades has been its proprietary real-time operating system/RTOS, VxWorks (it also has a Linux OS with real-time options). So Intel is buying not only software for development of products such as cars and airplanes that have software in them; it is buying software to support applications that must respond to large numbers of inputs (typically from sensors) in a fixed amount of time, or else catastrophe ensues. Example: a system keeps track of temperatures in a greenhouse, with ways to seal off breaches automatically; if the application fails to respond to a breach in seconds, the plants die.
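
To make the fixed-response-time constraint concrete, here is a sketch of a hard-deadline control loop in the spirit of the greenhouse example; it is not VxWorks code, and the sensor/actuator stubs, the 2-second budget, and the polling scheme are all assumptions.

    import time

    DEADLINE_SECONDS = 2.0                # breach must be sealed within this window (assumed)
    CYCLE_SECONDS = DEADLINE_SECONDS / 2  # poll often enough that detection plus sealing fits the window

    def read_breach_sensors() -> bool:
        """Placeholder: poll the greenhouse breach sensors."""
        return False

    def seal_breach() -> None:
        """Placeholder: actuate the seals."""
        pass

    def control_loop() -> None:
        while True:
            cycle_start = time.monotonic()
            if read_breach_sensors():
                seal_breach()
            overrun = time.monotonic() - cycle_start - CYCLE_SECONDS
            if overrun > 0:
                # this is the catastrophe case an RTOS is built to prevent
                raise RuntimeError(f"missed cycle by {overrun:.3f}s; plants at risk")
            time.sleep(max(0.0, CYCLE_SECONDS - (time.monotonic() - cycle_start)))

The point of the sketch is the shape of the guarantee: sensing, acting, and bookkeeping must all fit inside a fixed window, every cycle, or the system has failed.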

Originally, in the early development of standardized Unix, RTOSes were valued for their robustness; after all, not only do they have to respond in a fixed time, they also have to make sure the software never becomes unavailable. However, once the Open Software Foundation and the like had added enough robustness to Unix, RTOSes became a side current in the overall trend of computer technology, of no real use to the preponderance of computing. So why should RTOSes matter now?

What Is the Grid?
Today’s major computing vendors, IBM among the foremost, are publicizing efforts to create the Smart Grid, software added to the electrical-power “grid” in the United States that will allow users to monitor and adapt their electricity usage to minimize power consumption and cost. This is not to be confused with grid computing, which created a “one computer” veneer over disparate, distributed systems, typically to handle one type of processing. The new Smart Grid marries software to sensors and a network, with the primary task being effective response to a varying workload of a large number of sensor inputs.

But the Smart Grid is not the only example of global, immediate use of sensor input – GPS-based navigation is another. Nor is it the only example of massive amounts of sensor data – RFID, despite being slow to arrive, now produces reader inputs by the millions.

What’s more, it is possible to view many other interactions as following the same global, distributed model. Videos and pictures from cell phones at major news events can, in effect, be used as sensors. Inputs from sensors at auto repair shops can not only be fed into testing machines; they can be fed into global-company databases for repair optimization. The TV show CSI has popularized the notion that casino or hospital video can be archived and mined for insights into crimes and hospital procedures, respectively.

Therefore, it appears that we are trending towards a global internetwork of consumer and company sensor inputs and input usage. That global internetwork is what I am calling the Grid. And RTOSes begin to matter in the Grid, because an RTOS such as VxWorks offers a model for the computing foundations of the Grid.

The Grid, the Web, and the RTOS
The model for the Grid is fundamentally different from that of the Web (which is not to say that the two cannot be merged). It is, in fact, much more like that of an RTOS. The emphasis in the Web is on flexible access to existing information, via searches, URLs, and the like. The emphasis in the Grid is on rapid processing of massive amounts of distributed sensor input, and only when that requirement has been satisfied does the Grid turn its attention to making the resulting information available globally and flexibly.

This difference, in turn, can drive differences in computer architecture and operating software. The typical server, PC, laptop, or smartphone assumes that the user has some predictable control over the initiation and scheduling of processes – with the exception of networking. Sensor-based computing is much more reactive: it is a bit like having one’s word processing continually interrupted by messages that “a new email has arrived”. Sensors must be added; ways must be found to improve the input prioritization and scheduling tasks of operating software; new networking standards may need to be hardwired to allow parallel handling of a wide variety of sensor-type inputs plus the traditional Web feeds.
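
A minimal sketch of that reactive, priority-driven model is below; the event names and priority numbers are invented for illustration, and a real RTOS would do this preemptively in the kernel rather than in a cooperative loop.

    # Sensor events arrive unpredictably and are dispatched by priority,
    # not by a user-driven schedule. Purely illustrative.
    import heapq

    class SensorEventLoop:
        def __init__(self):
            self._queue = []   # heap of (priority, sequence, event); lower number = more urgent
            self._seq = 0

        def post(self, priority: int, event: str) -> None:
            """Called from interrupt/driver context when a sensor fires."""
            heapq.heappush(self._queue, (priority, self._seq, event))
            self._seq += 1

        def run_once(self) -> None:
            """Handle the most urgent pending event, if any."""
            if self._queue:
                priority, _, event = heapq.heappop(self._queue)
                print(f"handling p{priority}: {event}")

    loop = SensorEventLoop()
    loop.post(5, "RFID reader batch")
    loop.post(1, "power-feed overload sensor")   # most urgent
    loop.post(3, "GPS position update")
    for _ in range(3):
        loop.run_once()   # overload handled first, then GPS, then RFID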

In other words, this is not just about improving the embedded-software development of large enterprises; this is about creating new computing approaches that may involve major elaborations of today’s hardware. And of today’s available technologies, the RTOS is among the most experienced and successful in this type of processing.

Where Intel and Wind River Fit
Certainly, software-infused products that use Intel chips and embedded software are a major use case of Intel hardware. And certainly, Wind River has a market beyond sensor-based real-time processing, in development of embedded software that does not involve sensors, such as networking software and cell-phone displays. So it is reasonable for Intel to use Wind River development and testing tools to expand into New Product Development for software-infused products like audio systems; and it is reasonable for commentators to wonder if such a move trespasses on the territory of vendors such as IBM, which has recently been making a big push in software-infused NPD.

What I am suggesting, however, is that in the long run, Wind River’s main usefulness to Intel may be in the reverse direction: providing models for implementing previously software-based sensor-handling in computing hardware. Just as many formerly software-only graphics functions have moved into graphics chips with resulting improvements in the gaming experience and videoconferencing, so it can be anticipated that moving sensor-handling functions into hardware can make significant improvements in users’ experience of the Grid.

Conclusions
If it is indeed true that a greater emphasis on sensor-based computing is arriving, how much effect does this trend have on IT? In the short run, not much. The likely effect of Intel’s acquisition of Wind River over the next year, for example, will be on users’ embedded software development, and not on providing new avenues to the Grid.

In the long run, I would anticipate that the first Grid effects from better Intel (or other) solutions would show up in an IT task like power monitoring in data centers. Imagine a standardized chip for handling distributed power sensing and local input processing across a data center, wedded to today’s power-monitoring administrative software. Extended globally across the enterprise, supplemented by data-mining tools, used to provide up-to-date data to regulatory agencies, extended to clouds to allow real-time workload shifting, supplemented by event-processing software for feeding corporate dashboards, extended to interactions with the power company for better energy rates, made visible to customers of the computing utility as part of the Smart Grid – there is a natural pathway from sensor hardware in one machine to a full Grid implementation.
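
For the first step on that pathway, here is a hypothetical sketch of rack-level power readings being processed locally and rolled up for monitoring software; every name and threshold in it is invented for illustration.

    # Local input processing per rack, with a summary handed to the
    # administrative/monitoring layer. Hypothetical names and limits.
    from statistics import mean

    class RackPowerMonitor:
        def __init__(self, rack_id: str, watt_limit: float):
            self.rack_id = rack_id
            self.watt_limit = watt_limit
            self.readings: list[float] = []

        def ingest(self, watts: float) -> None:
            """What the on-board sensor chip would do with each reading."""
            self.readings.append(watts)

        def rollup(self) -> dict:
            """Summary for dashboards, regulators, or workload shifting."""
            return {
                "rack": self.rack_id,
                "avg_watts": round(mean(self.readings), 1),
                "over_limit": max(self.readings) > self.watt_limit,
            }

    monitor = RackPowerMonitor("rack-07", watt_limit=8000)
    for watts in (6200, 6400, 8600):
        monitor.ingest(watts)
    print(monitor.rollup())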

And it need not take Intel beyond its processor-chip comfort zone at all.