Monday, November 29, 2010

Another Requiem: Attachmate Acquires Novell

And so, another founder of the computing industry as we know it today officially bites the dust. A few days ago, Attachmate announced that it was acquiring Novell – and the biggest of the PC LAN companies will be no more.

I have more fond memories of Novell than I do of Progress, Sun, or any of the other companies that have seen their luster fade over the last decade. Maybe it was the original facility in Provo, with its Star Trek curving corridors and Moby Jack as haute cuisine, just down the mountain from Robert Redford’s Sundance. Maybe it was the way that when they sent us a copy of NetWare, they wrapped it in peanuts instead of bubble wrap, giving us a great wiring party. Or maybe it was Ray Noorda himself, with his nicknames (Pearly Gates and Ballmer the Embalmer) and his insights (I give him credit for the notion of coopetition).

But if Novell were just quirky memories, it wouldn’t be worth the reminiscence. I firmly believe that Novell, more than any other company, ushered out the era of IBM and the Seven Dwarves, and ushered in the world of the PC and the Internet.

Everyone has his or her version of those days. I was at Prime at the time, and there was a hot competition going on between IBM at the high end and DEC, Wang, Data General, and Prime at the “low end”. Even with the advent of the PC, it looked as if IBM or DEC might dominate the new form factor; Compaq was not a real competitor until the late 1980s.

And then along came the PC LAN companies: 3Com, Banyan, Novell. While IBM and the rest focused on high-end sales, and Sun and Apollo locked up workstations, the minicomputer makers’ low ends were being stealthily undercut by PC LANs, especially by the likes of Novell. The logic was simple: the local dentist, realtor, or retailer bought a PC for personal use, brought it to the business, and then realized that it was child’s play – and less than $1K – to buy LAN software to hook the PCs in the office together. It meant incredibly cheap scalability, and when I was at Prime it gutted the low end of our business, squeezing the mini makers from above (IBM) and below (Novell).

There was never a time when Novell could breathe easily. At first, there were Banyan and 3Com; later, the mini makers tried their hand at PC LANs; then came the Microsoft partnership with IBM to push OS/2 LAN Manager; and finally, in the early 1990s, Microsoft took dead aim at Novell and at last managed to knock them off their perch. However, until the end, NetWare had two simple ideas to differentiate it, well executed by the “Magic 8” (the programmers doing fundamental NetWare design, including above all Drew Major): the idea that to every client PC, the NetWare file system should look like just another drive, and the idea that frequently accessed files should be stored in main memory on the server PC, so that, as Novell boasted, you could get a file faster from NetWare than you could from your own PC’s hard drive.
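For readers who never saw NetWare, here is a minimal sketch of that second idea in modern terms: a server-side cache that keeps recently read files in RAM so repeat requests never touch the disk. The class and sizes are illustrative assumptions only, not NetWare’s actual (far more sophisticated) design.

```python
from collections import OrderedDict

class ServerFileCache:
    """Toy server-side file cache: hold recently read files in main memory."""
    def __init__(self, capacity_bytes=8 * 1024 * 1024):
        self.capacity = capacity_bytes
        self.used = 0
        self.cache = OrderedDict()          # path -> file contents

    def read(self, path):
        if path in self.cache:              # hit: served from RAM, no disk seek,
            self.cache.move_to_end(path)    # which is why it could beat a local drive
            return self.cache[path]
        with open(path, "rb") as f:         # miss: one disk read, then keep it in RAM
            data = f.read()
        self.cache[path] = data
        self.used += len(data)
        while self.used > self.capacity:    # evict least recently used files
            _, evicted = self.cache.popitem(last=False)
            self.used -= len(evicted)
        return data
```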

Until the mid 1990s, analysts embarrassed themselves by predicting rapid loss of market share to the latest competitor. Every year, surveys showed that purchasing managers were planning to replace their NetWare with LAN Server, with LAN Manager, with VINES; and at the end of the year, the surveys would show that NetWare had increased its hold, with market share in the high 70s. Why? Because what drove the market was purchases below the purchasing manager’s radar screen (less than the $10K that departments were supposed to report upstairs). One DEC employee told me an illustrative story: while DEC was trying to mandate in-house purchase of its PC LAN software, the techies at DEC were expanding their use of NetWare by leaps and bounds, avoiding official notice by “tunneling” NetWare communications as part of the regular DEC network. The powers that be finally noticed what was going on because the tunneled communications became the bulk of all communications across the DEC network.

In the early 1990s, Microsoft finally figured out what to do about this. Shortly after casting off OS/2 and LAN Manager, Microsoft developed its own, even more basic, PC LAN software that at first simply allowed sharing across a couple of “peer” PCs. Using this as a beachhead, Microsoft steadily developed Windows’ LAN capabilities, entirely wrapped in the Windows PC OS, so that it cost practically nothing to buy both the PC and the LAN. This placed Novell in an untenable position, because what was now driving the market was applications developed on top of the PC and LAN OS, and NetWare had never paid sufficient attention to LAN application development; it was easy for Microsoft to turn Windows apps into Windows plus LAN apps, while it was very hard for Novell to do so.

Nevertheless, Novell’s core market made do with third-party Windows apps that could also run on NetWare, until the final phase of the tragedy: Windows 2000. You see, PC LANs always had the limitation that they were local. The only way that PC LAN OSs could overcome the limitations of geography was to provide real-time updates to resource and user data stored in multiple, geographically separate “directories”: in effect, to carry out scalable multi-copy updates on data. Banyan had a pretty good solution for this, but Microsoft created an even better one in Windows 2000, well before Novell’s solution; and after that, as the world shifted its attention to the Internet, Novell was not even near anyone’s short list for distributed computing.

Over the last decade, Novell has not lacked good solutions: its own directory product, administrative and security software, virtualization software, and most recently what I view as a very nice approach to porting Windows apps to Linux and mainframes. Still, a succession of CEOs failed to turn the company around, and, in the ultimate irony, Attachmate, itself with strengths and a long history in remote PC software, has decided to take on Novell’s assets.

I think that the best summing up of Novell’s ultimate strategic mistake was the remark of one of its CEOs shortly after assuming command: “Everyone thinks about Microsoft as the biggest fish in the ocean. It is the ocean.” In other words, Novell would have done better by treating Microsoft as the vendor of the environment that Novell had to support, and aiming to service that market, rather than trying to out-feature Microsoft. But everyone else made that mistake; why should Novell have been any different?

We are left not only with Novell’s past contributions to computing, but also with the contributions of its alumni. Some fostered the SMB market with products like the Pervasive database; some were drivers of the UNIX standards push and later the TP monitors that led to today’s app servers. One created the Burton Group, a more technically-oriented analyst firm that permanently improved the quality of the analyst industry.

And we are also left with an enthusiasm that could not be contained by traditional firms, and that moved on to UNIX, to the Web, to open source. The one time, in the late 1980s, I went to Novell’s user group meeting, it was clearly a bit different. After one of the presentations, a LAN servicer rose to ask a question. “So-and-so, LANs Are My Life”, he identified himself. That was the name of his firm: LANs Are My Life, Inc. It’s not a bad epitaph for a computer company: we made a product so good that for some people – not just Novell employees – it was our life. Rest in peace, Novell.

Sunday, October 10, 2010

BI For The Masses: 3 Solutions That Will Never Happen

I was reading a Business Intelligence (BI) white paper feed – I can’t remember which – when I happened across one whose title was, more or less, “Data Visualization: Is This the Way To Attract the Common End User?” And I thought, boy, here we go again.

You see, the idea that just a little better user interface will finally get Joe and Jane (no, not you, Mr. Clabby) to use databases dates back at least 26 years. I know, because I had an argument with my boss at CCA, Dan Ries iirc (a very smart fellow), about it. He was sure that with a fill-out-the-form approach, any line employee could do his or her own ad-hoc queries and reporting. Based on my own experiences as a naïve end user, I felt we were very far from being able to give the average end user an interface that he or she would be able or motivated to use. Here we are, 26-plus years later, and all through those years, someone would pipe up and say, in the immortal words of Bullwinkle, “This time for sure!” And every time, it hasn’t happened.

I divide the blame for this equally between vendor marketing and IT buying. Database and BI vendors, first and foremost, look to extend the ability of specific targets within the business to gain insights. That requires ever more sophisticated statistical and relationship-identifying tools. The vendor looking to design a “common-person” user interface retrofits the interface to these tools. In other words, the vendor acts as if it is selling to a business-expert market, not a consumer market.

Meanwhile, IT buyers looking to justify the expense of BI try to extend its use to upper-level executives and business processes, not demand that it extend the interface approach of popular consumer apps to using data, or that it give the line supervisor who uses it at home a leg up at work. And yet, that is precisely how Word, Excel, maybe PowerPoint, and Google search wound up being far more frequently used than SQL or OLAP.

I have been saying things like this for the last 26 years, and somehow, the problem never gets solved. At this point, I am convinced that no one is really listening. So, for my own amusement, I give you three ideas – ideas proven in the real world, but never implemented in a vendor product – that if I were a user I would really like, and that I think would come as close as anything can to achieving “BI for the masses.”

Idea Number 1: Google Exploratory Data Analysis

I’m reading through someone’s blog when they mention “graphical analysis.” What the hey? There’s a pointer to another blog, where they make a lot of unproven assertions about graphical analysis. Time for Google: a search on graphical analysis results in a lot of extraneous stuff, some of it very interesting, plus Wikipedia and a vendor who is selling this stuff. Wikipedia is off-topic, but carefully reading the article shows that there are a couple of articles that might be on point. One of them gives me some of the social-networking theory behind graphical analysis, but not the products or the market. Back to Google, forward to a couple of analyst market figures. They sound iffy, so I go to a vendor site and get their financials to cross-check. Not much in there, but enough that I can guesstimate. Back to Google, change the search to “graphical BI.” Bingo, another vendor with much more market information and ways to cross-check the first vendor’s claims. Which products have been left out? An analyst report lists the two vendors, but in a different market, and also lists their competitors. Let’s take a sample competitor: what’s their response to “graphical analysis” or graphical BI? Nothing, but they seem to feel that statistical analysis is their best competitive weapon. Does statistical analysis cover graphical analysis? The names SAS and SPSS keep coming up in my Google searches. It doesn’t seem as if their user manuals even mention the word “graph”. What are the potential use cases? Computation of shortest path. Well, only if you’re driving somewhere. Still, if it’s made easy for me … Is this really easier than Mapquest? Let’s try a multi-step trip. Oog. It definitely could be easier than Mapquest. Can I try out this product? All right, I’ve got the free trial version loaded, let’s try the multi-step trip. You know, this could do better for a sales trip than my company’s path optimization stuff, because I can tweak it for my personal needs. Combine with Google Maps, stir … wouldn’t it be nice if there was a Wikimaps, so that people could warn us about all these little construction obstructions and missing signs? Anyway, I’ve just given myself an extra half-hour on the trip to spend on one more call, without having to clear it.

Two points about this. First, Google is superb at free-association exploratory analysis of documents. You search for something, you alter the search because of facts you’ve found, you use the results to find other useful facts about it, you change the topic of the search to cross-check, you dig down into specific examples to verify, you even go completely off-topic and then come back. The result is far richer, far more useful to the “common end user” and his or her organization, and far more fun than just doing a query on graphical data in the company data warehouse.

Second, Google is lousy at exploratory data analysis, because it is “data dumb”: It can find metadata and individual pieces of data, but it can’t detect patterns in the data, so you have to do it yourself. If you are searching for “graphical analysis” across vendor web sites, Google can’t figure out that it would be nice to know that 9 of 10 vendors in the market don’t mention “graph” on their web sites, or that no vendors offer free trial downloads.

The answer to this seems straightforward enough: add “guess-type” data analysis capabilities to Google. And, by the way, if you’re at work, make the first port of call your company’s data-warehouse data store, full of data you can’t get anywhere else. You’re looking for the low-priced product for graphical analysis? Hmm, your company offers three types through a deal with the vendor, but none is the low-cost one. I wonder what effect that has had on sales? Your company did a recent price cut; sure enough, it hasn’t had a big effect. Except in China: does that have to do with the recent exchange rate manipulations, and the fact that you sell via a Chinese firm instead of on your own? It might indeed, since Google tells you the manipulations started 3 weeks ago, just when the price cut happened.

You get the idea? Note that the search/analysis engine guessed that you wanted your company’s data called out, and that you wanted sales broken down by geography and in a monthly time series. Moreover, this is exploratory data analysis, which means that you get to see both the summary report/statistics and individual pieces of raw data – to see if your theories about what’s going on make sense.
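To make the “guess” concrete, here is a minimal sketch in Python/pandas of what such an engine might do behind the scenes: default to a by-geography, by-month breakdown of company sales, and pull out any region the search mentions. The table and column names (date, region, revenue) are invented for illustration.

```python
import pandas as pd

def guess_breakdown(sales: pd.DataFrame, search_terms: set) -> pd.DataFrame:
    """Guess a useful first cut of company sales data for an exploratory search.

    `sales` is assumed to have 'date', 'region', and 'revenue' columns.
    """
    df = sales.copy()
    df["month"] = df["date"].dt.to_period("M")
    # Default guess: a monthly time series, broken down by geography.
    cut = df.groupby(["region", "month"])["revenue"].sum().unstack("month")
    # If the search mentions a specific region (say, China), surface it first.
    mentioned = [r for r in cut.index if str(r).lower() in search_terms]
    return cut.loc[mentioned] if mentioned else cut

# Did the price cut three weeks ago move anything outside China?
# sales = pd.read_sql("SELECT ...", warehouse_connection)   # company data store
# print(guess_breakdown(sales, {"china", "price", "cut"}))
```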

In Google exploratory data analysis, the search engine and your exploration drive the data analysis; the tools available don’t. It’s a fundamental mind shift, and one that explains why Excel became popular and in-house on-demand reporting schemes didn’t, or why Google search was accepted and SQL wasn’t. One’s about the features; the other’s about the consumer’s needs.

Oh, by the way, once this takes off, you can start using information about user searches to drive adding really useful data to the data warehouse.

Idea Number 2: The Do The Right Thing Key

Back in the late 1980s, I loved the idea behind the Spike Lee movie title so much that I designed an email system around it. Here’s how it works:

You know how when you are doing a “replace all” in Word, you have to specify an exact character string, and then Word mindlessly replaces all occurrences, even if some should be capitalized and some not, or even if you just want whole words to be changed and not character strings within words? Well, think about it. If you type a whole word, 90% of the time you want only words to be replaced, and capitals to be added at the start of sentences. If you type a string that is only part of a word, 90% of the time you want all occurrences of that string replaced, and capitals when and only when that string occurs at the start of a sentence. So take that Word “replace” window, and add a Do the Right Thing key (really, a point and click option) at the end. If it’s not right, the user can just Undo and take the long route.
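Here is a minimal sketch of that replace heuristic in Python. The whole-word test and the capitalize-at-sentence-start rule are my guesses at the “90% case” described above, not a reconstruction of anything Word actually does.

```python
import re

def do_the_right_thing_replace(text: str, find: str, repl: str) -> str:
    """Replace-all that guesses intent: a whole-word search term only matches
    whole words, and a replacement at the start of a sentence gets a capital."""
    whole_word = re.fullmatch(r"\w+", find) is not None
    pattern = rf"\b{re.escape(find)}\b" if whole_word else re.escape(find)

    def substitute(m: re.Match) -> str:
        before = text[:m.start()].rstrip()
        at_sentence_start = before == "" or before.endswith((".", "!", "?"))
        return repl.capitalize() if at_sentence_start else repl

    return re.sub(pattern, substitute, text, flags=re.IGNORECASE)

# "cat" is a whole word, so "category" is untouched and sentence starts get a capital:
# prints "Dog naps. my dog likes category theory."
print(do_the_right_thing_replace("cat naps. my cat likes category theory.",
                                 "cat", "dog"))
```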

The Do The Right Thing key is a macro; but it’s a smart macro. You don’t need to create it, and it makes some reasonable guesses about what you want to do, rather than you having to specify what it should do exactly. I found when I designed my email system that every menu, and every submenu or screen, would benefit from having a Do The Right Thing key. It’s that powerful an idea.

How does that apply to BI? Suppose you are trying to track down a sudden drop in sales one week in North America. You could dive down, layer by layer, until you found that stores in Manitoba all saw a big drop that week. Or, you could press the Break in the Pattern key, which would round up all breaks in patterns of sales, and dig down not only to Manitoba but also to big offsetting changes in sales in Vancouver and Toronto, with appropriate highlighting. Nine times out of ten, that will be the right information; the tenth time, you’ll find some other information that may prove to be just as valuable. Now do the same type of thing for every querying or reporting screen …
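A minimal sketch of what might sit behind such a key: scan every store’s weekly sales for weeks that deviate sharply from that store’s own recent history, so offsetting changes (Manitoba down, Vancouver and Toronto up) surface together. The column names here are invented for illustration.

```python
import pandas as pd

def breaks_in_pattern(weekly_sales: pd.DataFrame, threshold: float = 3.0) -> pd.DataFrame:
    """Flag store/week rows whose sales deviate from that store's own recent
    history by more than `threshold` standard deviations.

    `weekly_sales` is assumed to have 'store', 'week', and 'sales' columns.
    """
    df = weekly_sales.sort_values(["store", "week"]).copy()
    grouped = df.groupby("store")["sales"]
    rolling_mean = grouped.transform(lambda s: s.rolling(8, min_periods=4).mean())
    rolling_std = grouped.transform(lambda s: s.rolling(8, min_periods=4).std())
    df["deviation"] = (df["sales"] - rolling_mean) / rolling_std
    return df[df["deviation"].abs() > threshold]

# breaks = breaks_in_pattern(warehouse_extract)   # one key press, every break surfaced
```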

The idea behind the Do The Right Thing key is actually very similar to that behind Google Exploratory Data Analysis. In both cases, you are really considering what the end user would probably want to do first, and only then finding a BI tool that will do that. The Do The Right Thing key is a bit more buttoned-up: you’re probably carrying out a task that the business wants you to do. Still, it’s way better than “do it this way or else.”

Idea Number 3: Build Your Own Data Store

Back in the days before Microsoft Access, there was a funny little database company called FileMaker. It had the odd idea that people who wanted to create their own contact lists, their own lists of the stocks they owned and their values, their own grades or assets and expenses, should be able to do so, in just the format they wanted. As Oracle steadily cut away at other competitors in the top end of the database market, FileMaker kept gaining individual customers who would bring FileMaker into their local offices and use it for little projects. To this day, it is still pretty much unique in its ability to let users quickly whip up small-sized, custom data stores to drive, say, class registrations at a college.

To my mind, FileMaker never quite took the idea far enough. You see, FileMaker was competing against folks like Borland in the days when the cutting edge was allowing two-way links between, let’s say, students and teachers (a student has multiple teachers, and teachers have multiple students). But what people really want, often, is “serial hierarchy”. You start out with a list of all your teachers; the student is the top level, the teachers and class location/time/topic the next level. But you next want to see if there’s an alternate class; now the topic is the top level, the time at the next level, the students (you, and if the class is full) at a third level. If the number of data items is too small to require aggregation, statistics, etc., you can eyeball the raw data to get your answers. And you don’t need to learn a new application (Outlook, Microsoft Money, Excel) for each new personal database need.
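Here is a minimal sketch of “serial hierarchy”: the same flat records re-nested under whichever field ordering the question of the moment calls for. The field names are invented stand-ins for whatever a FileMaker-style tool would let you define.

```python
def hierarchy(records, *levels):
    """Re-nest the same flat records under an arbitrary ordering of fields."""
    if not levels:
        return records
    tree = {}
    for rec in records:
        tree.setdefault(rec[levels[0]], []).append(rec)
    return {key: hierarchy(group, *levels[1:]) for key, group in tree.items()}

classes = [
    {"student": "Ann", "teacher": "Ms. Lee", "topic": "Biology", "time": "Mon 9am"},
    {"student": "Ann", "teacher": "Mr. Ito", "topic": "Algebra", "time": "Tue 1pm"},
    {"student": "Bob", "teacher": "Mr. Ito", "topic": "Algebra", "time": "Tue 1pm"},
]

# Today: student at the top, teachers and class details below.
by_student = hierarchy(classes, "student", "teacher")
# Tomorrow: topic at the top, time next, enrolled students (is it full?) below.
by_topic = hierarchy(classes, "topic", "time")
```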

The reason this fits BI is that, often, the next step after getting your personal answers is to merge them with company data. You’ve figured out your budget, now do “what if”: does this fit with the company budget? You’ve identified your own sales targets, so how do these match up against those supplied by the company? You download company data into your own personal workspace, and use your own simple analysis tools to see how your plans mesh with the company’s. You only get as complex a user interface as you need.
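And a minimal sketch of that merge step, again with invented names and numbers: pull the company’s figures down into the same personal workspace and eyeball the “what if”.

```python
import pandas as pd

# Your own targets, kept in whatever small personal data store you built.
my_targets = pd.DataFrame({"region": ["East", "West"], "my_target": [120, 90]})

# The company's targets, downloaded from the corporate warehouse.
company = pd.DataFrame({"region": ["East", "West"], "company_target": [100, 110]})

what_if = my_targets.merge(company, on="region")
what_if["gap"] = what_if["my_target"] - what_if["company_target"]
print(what_if)   # eyeball where your plan and the company's plan diverge
```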

Conclusions

I hope you enjoyed these ideas, because, dollars to doughnuts, they’ll never happen. It’s been 25 years, and the crippled desktop/folder metaphor and its slightly less crippled cousin, the document/link browser metaphor, still dominate user interfaces. It’s been fifteen years, and only now is Composite Software’s Robert Eve getting marketing traction by pointing out that trying to put all the company’s data in a data warehouse is a fool’s errand. It’s been almost 35 years, and still no one seems to have noticed that seeing a full page of a document you are composing on a screen makes your writing better. At least, after 20 years, Google Gmail finally showed that it was a good idea to group a message and its replies. What a revelation!

No, what users should really be wary of is vendors who claim they do indeed do any of the ideas listed above. This is a bit like vendors claiming that requirements management software is an agile development tool. No; it’s a retrofitted, slightly less sclerotic tool instead of something designed from the ground up to serve the developer, not the process.

But if you dig down, and the vendor really does walk the walk, grab the BI tool. And then let me know the millennium has finally arrived. Preferably not after another 26 years.

Thursday, October 7, 2010

IBM's Sustainability Initiative: Outstanding, and Out of Date

IBM’s launch of its new sustainability initiative on October 1 prompted the following thoughts: This is among the best-targeted, best-thought-out initiatives I have ever seen from IBM. It surprises me by dealing with all the recent reservations I have had about IBM’s green IT strategy. It’s all that I could have reasonably asked IBM to do. And it’s not enough.

Key Details of the Initiative
We can skip IBM’s assertion that the world is more instrumented and interconnected, and systems are more intelligent, so that we can make smarter decisions; it’s the effect of IBM’s specific solutions on carbon emissions that really matters. What is new – at least compared to a couple of years ago – is a focus on end-to-end solutions, and on solutions that are driven by extensive measurement. Also new is a particular focus on building efficiency, although IBM’s applications of sustainability technology extend far beyond that.

The details make it clear that IBM has carefully thought through what it means to instrument an organization and use that information to drive reductions in energy – which is the major initial thrust of any emission-reduction strategy. Without going too much into particular elements of the initiative, we can note that IBM considers the role of asset management, ensures visibility of energy management at the local/department level, includes trend analysis, aims to improve space utilization, seeks to switch to renewable energy where available, and optimizes HVAC for current weather predictions. Moreover, it partners with others in a Green Sigma coalition that delivers building, smart grid, and monitoring solutions across a wide range of industries, as well as in the government sector. And it does consider the political aspects of the effort. As I said, it’s very well targeted and very well thought out.
Finally, we may note that IBM has “walked the walk”, or “eaten its own dog food”, if you prefer, in sustainability. Its citation of “having avoided carbon emissions by an amount equal to 50% of our 1990 emissions” is particularly impressive.

The Effects
Fairly or unfairly, carbon emission reductions focus on reducing carbon emissions within enterprises, and emissions from the products that companies create. Just about everything controllable that generates emissions is typically used, administered, or produced by a company – buildings, factories, offices, energy, heating and cooling, transportation (cars), entertainment, and, of course, computing. Buildings, as IBM notes, are a large part of that emissions generation, and, unlike cars and airplanes, can relatively easily achieve much greater energy efficiency, with a much shorter payback period. That means that a full implementation of building energy improvement across the world would lead to at least a 10% decrease in the rate of human emissions (note the word “rate”; I will explain later). It’s hard to imagine an IBM strategy with much greater immediate impact.

The IBM emphasis on measurement is, in fact, likely to have far more impact in the long run. The fact is that we are not completely sure how to break down human-caused carbon emissions by business process or by use. Therefore, our attempts to reduce them are blunt instruments, often hitting unintended targets or squashing flies. Full company instrumentation, as well as full product instrumentation, would allow major improvements in carbon-emission-reduction efficiency and effectiveness, not just in buildings or data centers but across the board.

These IBM announcements paint a picture of major gains leading, very optimistically, to 30% improvements in energy efficiency and increases in renewable energy over the next 10 years – beyond the targets of most of today’s nations seeking to achieve a “moderate-cost” ultimate global warming of 2 degrees centigrade, in their best-case scenarios. In effect, initiatives like IBM’s plus global government efforts could reduce the rate of human emissions (again, note the word “rate”; I will explain later) beyond existing targets. Meanwhile, Lester Brown has noted that from 2008 to 2009, measurable US human carbon emissions from fossil fuels went down 9 percent.

This should be good news. But I find that it isn’t. It’s just slightly less bad news.

Everybody Suffers
Everyone trying to do something about global warming has been operating under a set of conservative scientific projections that, for the most part, correspond to the state of the science in 2007. As far as I can tell, here’s what’s happened since, in a very brief form:

1. Sea rise projections have doubled, to 5 feet of rise in 80 years. In fact, more rapid than expected land ice loss means that 15 feet of rise may be more likely, with even more after that.

2. Scientists have determined that “feedback loops” – for example, the loss of ice that reflects sunlight back into space and keeps ocean heat down, a loss which in turn increases global temperature – are in fact “augmenting feedbacks”, meaning that they will contribute additional global warming even if we decrease emissions to near zero right now.

3. Carbon in the atmosphere is apparently still headed toward the “worst case” scenario of 1100 ppm. That, in turn, apparently means that the “moderate effect” scenario underlying all present global plans for mitigation of climate change at moderate cost (450 ppm) will in all likelihood not be achieved. Each doubling of ppm leads to roughly a 3.5 degrees centigrade (6 degrees Fahrenheit) average rise in temperature (in many cases, more like 10 degrees Fahrenheit in summer), and the starting level was about 280 ppm, so we are talking about a 12 degrees Fahrenheit rise from reaching 1100 ppm (a quick arithmetic check appears after this list), with follow-on effects and costs that are linear up to 700-800 ppm and difficult to calculate but almost certainly accelerating beyond that.

4. There is growing consensus that technologies to somehow sequester atmospheric carbon or carbon emissions in the ground, if feasible, will not be operative for 5-10 years, not at full effectiveness until 5-10 years after that, and not able to take us back to 450 ppm for many years after that – and not able to end the continuing effects of global warming for many years after that, if ever.
Oh, by the way, that 9% reduction in emissions in the US? Three problems. First, it happened under conditions in which GNP was mostly going down. As we return to moderate or fast growth, that reduction goes to zero. Second, aside from the recession, most of the reductions achieved so far come from low-cost-to-implement technologies. That means that achieving the next 9%, and the next 9% after that, becomes more costly and politically harder to implement. Third, at least some of the reductions come from outsourcing jobs, and therefore plant and equipment, to faster-growing economies with lower costs. Even where IBM is applying energy efficiencies to these sites, the follow-on jobs outside of IBM are typically less energy-efficient. The result is a decrease in the worldwide effect of US emission cuts. As noted above, the pace of worldwide atmospheric carbon dioxide rise continued unabated through 2008 and 2009. Reducing the rate of human emissions isn’t good enough; you have to reduce the absolute amount of human emissions, of human-caused emissions (like reduced reflection of sunlight by ice), and of follow-on emissions (like those from melting permafrost, which in the Arctic holds massive amounts of carbon and methane).
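Here is the arithmetic check promised above, using only the figures quoted in point 3 (the 6 degrees Fahrenheit per doubling and the 280 ppm starting level are the post’s figures, not my own estimates):

```python
import math

DEGREES_F_PER_DOUBLING = 6.0   # ~3.5 degrees centigrade per doubling of CO2 ppm
BASELINE_PPM = 280.0           # pre-industrial starting level cited above
WORST_CASE_PPM = 1100.0        # the "worst case" scenario cited above

doublings = math.log2(WORST_CASE_PPM / BASELINE_PPM)   # ~1.97 doublings
rise_f = doublings * DEGREES_F_PER_DOUBLING            # ~11.8 degrees Fahrenheit

print(f"{doublings:.2f} doublings -> roughly {rise_f:.0f} degrees F of warming")
```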

That leaves adaptation to what some scientists call climate disruption. What does that mean?

Adaptation may mean adapting to a rise in sea level of 15 feet in the next 60 years and an even larger rise in the 60 years after that. Adaptation means adapting to disasters that are 3-8 times more damaging and costly than they are now, on average (a very rough calculation, based on the scientific estimate that a 3 degrees centigrade temperature rise doubles the frequency of category 4-5 hurricanes; the reason is that the atmosphere involved in disasters such as hurricanes and tornados can store and release more energy and water as temperature rises). Adaptation means adjusting to the loss of food and water related to ecosystems that cannot move north or south, blocked by human-paved cities and towns. Adaptation means moving to lower-cost areas, or constantly revising heating and cooling systems in the same area, as the amount of cooling and heating needed in an area changes drastically. Adaptation means moving food sources in response to changing climates that make some areas better for growing food, others worse. Adaptation may mean moving 1/6 of the world’s population off the one-third of the world’s cultivable land that will become desert. In other words, much of this adaptation will affect all of us, and the costs of carrying it out will fall to some extent on all of us, no matter how rich. And we’re talking about the adaptation that, according to recent posts, appears to be already baked into the system. Moreover, if we continue to be ineffectual at reducing emissions, each decade will bring additional adaptation costs on top of what we are bound to pay already.

Adaptation will mean significant additional costs to everyone – because climate disruption brings costs to everyone in their personal lives. It is hard to find a place on the globe that will not be further affected by floods, hurricanes, sea-level rise, wildfires, desertification, heat that makes some places effectively unlivable, drought, permafrost collapse, or loss of food supplies. Spending to avoid those things for one’s own personal home will rise sharply – well beyond the costs of “mitigating” further climate disruption by low-cost or even expensive carbon-emission reductions.

What Does IBM Need To Do?
Obviously, IBM can’t do much about this by itself; but I would suggest two further steps.

First, it is time to make physical infrastructure agile. As the climate in each place continually changes, the feasible or optimum places for head offices, data centers, and residences endlessly change. It is time to design workplaces and homes that can be inexpensively transferred from physical location to physical location. Moving continually is not a pleasant existence to contemplate; but virtual infrastructure is probably the least-cost solution.

Second, it is time to accept limits. The pretense that we do not need to reduce emissions in absolute, overall terms, because technology, economics, or sheer willpower will save us – a pretense we have practiced since our first warnings in the 1970s – is failing badly. Instead of talking in terms of improving energy efficiency, IBM needs to start talking in terms of absolute carbon emissions reduction every year, for itself, for its customers, and for use of its products, no matter what the business’ growth rate is.

One more minor point: because climate will be changing continually, adjusting HVAC for upcoming weather forecasts, which only go five days out, is not enough. When a place that has seen four days of 100 degree weather every summer suddenly sees almost 3 months of it, no short-term HVAC adjustment will handle continual brownouts adequately. IBM needs to add climate forecasts to the mix.

Politics, Alas
I mention this only reluctantly, and in the certain knowledge that for some, this will devalue everything I have said. But there is every indication, unfortunately, that without effective cooperation from governments, the sustainability goal that IBM seeks, and avoidance of harms beyond what I have described here, are not achievable.

Therefore, IBM membership in an organization (the US Chamber of Commerce) that actively and preferentially funnels money to candidates and legislators that deny there is a scientific consensus about global warming and its serious effects undercuts IBM’s credibility in its sustainability initiative and causes serious damage to IBM’s brand. Sam Palmisano as Chairman of the Board of a company (Exxon Mobil) that continues to fund some “climate skeptic” financial supporters (the Heritage Foundation, at the least) and preferentially funnels money to candidates and legislators that deny the scientific consensus does likewise.

Summary
IBM deserves enormous credit for putting in place, today, comprehensive and effective efforts to tackle the climate disruption crisis as it was understood three years ago. But those efforts are three years out of date. IBM needs to use them as the starting point for creating new solutions within the next year, solutions aimed at a far bigger task: tackling the climate disruption crisis as it is now.

Monday, September 13, 2010

An Agile Development Rant

OK, I just accessed a recent Forrester report on agile development tools, courtesy of IBM, and boy, am I steamed. What were they thinking?
Let me list the ways in which I appear to be in strong disagreement with their methods and conclusions:
1. Let’s look at how Forrester decided organizations were doing agile development. Oh, “lean” development is agile, is it? No consideration of the arguments last year that lean NPD can have a counter-effect to agile development? … Interesting, “test-driven development” without anything else is agile. Hmm, no justification … Wow, Six Sigma is agile. Just by concentrating on quality and insisting that bugs get fixed at earlier stages, magically you are agile … Oh, here’s a good one. If your methodology isn’t formal, you aren’t agile. I bet all those folks who have been inventing agile processes over the last few years would agree with that one – not. … Oh, wow again, iterative is not the slightest bit agile, nor is spiral. But Six Sigma is. If I were a spiral methodology user, I would love to be in the same category as waterfall. The writers of the Agile Manifesto would be glad to hear about that one …
2. Next weirdness: “development and delivery leaders must implement a “just-in-time” planning loop that connects business sponsors to project teams at frequent intervals to pull demand and, if necessary, reprioritize existing project tasks based on the latest information available.” Gee, it seems to me this stands agile on its head. Agile was supposed to use regular input from business users to drive changing the product in the middle of the process. Now we’re in the land of “just in time”, “sponsors”, and “if necessary [!-ed.], reprioritizing existing tasks”. Note how subtly we have shifted from “developer and user collaborate to evolve a product that better meets the user’s needs” to “if you want this produced fast, sign off on everything, because otherwise we’ll have to rejigger our schedule and things may get delayed -- and forget about changing the features.”
3. Oh, here’s a great quote: “Measurement and software development have historically been poor bedfellows; heated debates abound about the value of measuring x or y on development projects. Agile changes this with a clear focus on progress, quality, and status metrics.” On what planet are these people living? For the past two years I have heard nothing but dueling complaints between IT managers, who grumble that developers ignore their traditional cost/time-based metrics, and agile proponents, who point out correctly that “time to value” is a better gauge of project success, and that attempting to fit agile into the straitjacket of rigid “finish the spec” measurements prevents agile development from varying the product to succeed better. Let’s be clear: all the evidence I have seen, including evidence from a survey I did at the beginning of 2009 while at Aberdeen Group, indicates that those organizations that focused on progress, status, or quality metrics (typically, those that did not do agile development) did far worse on average in development speed, development cost, and software quality than those that did not focus on such metrics.
4. Here’s an idea that shows a lack of knowledge of history: “historic ALM (application lifecycle management) vendors.” ALM means that you extend application development into the phase after online rollout, delivering bug fixes and upgrades via the same process. By that criterion, there are no historic ALM vendors, because there was no significant market for linking development and application management until recently. Believe me, I spent the late 90s and early 00s trying to suggest to vendors that they link development and application monitoring tools, and getting nowhere.
5. Forrester favors vendors with a strong focus on “agile/lean development.” Here we are again with the “lean.” Tell me, just where in the Agile Manifesto is there any positive mention of “product backlog”?
6. I can’t see any sign anywhere that Forrester, in their vendor assessments, has asked the simple question: is a “practitioner tool” that can be or is integrated with the vendor’s toolset likely to increase or decrease agility? For example, many formal/engineering tools are known to constrain the ability to change the product in the middle of the process. The old Rational design toolsets were famous for it – the approach was reasonable for huge aerospace projects. Their descendants live on in the RUP, and one of the big virtues of Team Concert was that it initially did not attempt to impose those tools on the agile developer. So, are these new “task management” tools that IBM is adding built from the ground up for agile developers, or are they old Rational tools bolted on? Or did Forrester just assume that all “task management” tools are pretty much equally agile? If they don’t even acknowledge the question, it sure looks that way.
7. Yup, here’s their set of criteria. The main ones are management, running a project, analytics, and life-cycle support. Any assessment of the agility of the tool in there? How about collaboration support, ability to change schedules in midstream, connection of analytics to market data to drive product changes, and a process for feedback of production-software problems and user needs to the developer to allow incremental adaptation? How about even a mention that any of these are folded into the criteria?
8. There seems to be confusion about the differences between collaborative, open-source, and agile development creeping in here. Collaborative and open-source development can be used in concert with agile development; but both can be used with development processes that are the opposite of agile – or what do you think the original offshoring movement that required rigid, clear spec definition as well as frequent global communication was about? I like CollabNet a lot; but, having followed them since the beginning of the century, I can attest that in the early years of the decade, there was no mention of agile development in their offerings or strategies. Ditto IBM; Eclipse is open-source, and in its early years there was no mention of agile development.
You know, I was aware that Forrester was losing historical perspective when it let stellar analysts like Merv Adrian go. But I never imagined this. I really think that IBM, who is now pointing to the Forrester report, should think again about citing this to show how good Team Concert is. To see why I’m still steamed, let me close with one quote from the Agile Manifesto: “We … value: individuals and interactions over processes and tools; working software over comprehensive documentation; customer collaboration over contract negotiation; responding to change over following a plan.” I could be wrong, but it seems to me that the whole tone of the Forrester Report is exactly the opposite. Heck of a job, Forrie.

Tuesday, August 31, 2010

1010data: Operating Well in a Parallel Universe?

Recently, I received a blurb from a company named 1010data, claiming that its personnel had been doing columnar databases for more than 30 years. As someone who was integrally involved at a technical level in the big era of database theory development (1975-1988), when everything from relational to versioning to distributed to inverted-list technology (the precursor to much of today’s columnar technology) first saw the light, I was initially somewhat skeptical. This wariness was heightened by marketing claims that its data warehousing performance was better not only than that of relational databases but also than that of competitors’ columnar databases, even though 1010data does little in the way of indexing; and that this performance improvement applied not only to ad-hoc queries with little discernible pattern, but also to many repetitive queries for which index-style optimization was apparently the logical thing to do.

1010data’s marketing is not clear as to why this should be so; but after talking to them, and reading their technical white paper, I have come up with a theory as to why it might be so. The theory goes like this: 1010data is not living in the same universe.

That sounds drastic. What I mean by this is, while the great mass of database theory and practice went one way, 1010data went another, back in the 1980s, and by now, in many cases, they really aren’t talking the same language. So what follows is an attempt to recast 1010data’s technology in terms familiar to me. Here’s the way I see it:

Since the 1980s, people have been wrestling with the problem of read and write locks on data. The idea is that if you decide to update a datum while another person is attempting to read it, each of you will see a different value, or the other person can’t predict which value he/she will see. To avoid this, the updater can block all other access via a write lock – which in turn slows down the other person drastically; or the “query from hell” can block updaters via a read lock on all data. In a data warehouse, updates are held and then rushed through at certain times (end of day/week) in order to avoid locking problems. Columnar databases also sometimes provide what is called “versioning”, in which previous values of a datum are kept around, so that the updater can operate on one value while the reader can operate on another.

1010data provides a data warehouse/business intelligence solution as a remote service – the “database as a service” variant of SaaS/public cloud. However, 1010data’s solution does not start by worrying about locking. Instead, it worries about how to provide each end user with a consistent “slice of time” database of his/her own. It appears to do this as follows: all data is divided up into what they call “master” tables (as in “master data management” of customer and supplier records), which are smaller, and time-associated/time-series “transactional” tables, which are the really large tables.

Master tables are more rarely changed, and therefore a full copy of the table after each update (really, a “burst” of updates) can be stored on disk, and loaded into main memory if needed by an end user, with little storage and processing overhead. This isn’t feasible for the transactional tables; but 1010data sees old versions of these as integral parts of the time series, not as superseded data, so the amount of “excess” data “appended” to a table, if maximum session length for an end user is a day, is small in all realistic circumstances. As a result, two versions of a transactional table amount to a pointer to a common ancestor plus a small “append”. That is, the storage overhead of the additional versioning data is small compared to some other columnar technologies, and not that much more than that of row-oriented relational databases.
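Here is a minimal sketch of how I picture that scheme (my reconstruction for illustration, not 1010data’s code): a new version of a transactional table is just a reference to the previous version plus a small append, so a reader keeps a consistent snapshot without any locking.

```python
class TableVersion:
    """Immutable snapshot of a transactional table: shared ancestor plus a small append."""

    def __init__(self, parent=None, appended_rows=()):
        self.parent = parent                      # previous version: shared, never copied
        self.appended_rows = list(appended_rows)

    def rows(self):
        """Iterate the whole table as this snapshot sees it, oldest rows first."""
        if self.parent is not None:
            yield from self.parent.rows()
        yield from self.appended_rows

    def with_append(self, new_rows):
        """A burst of updates produces a new version; readers of older versions never block."""
        return TableVersion(parent=self, appended_rows=new_rows)

# v1 = TableVersion(appended_rows=monday_trades)
# v2 = v1.with_append(tuesday_trades)   # a query started against v1 still sees only v1
```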

Now the other shoe drops, because, in my very rough approximation, versioning entire tables instead of particular bits of data allows you to keep those bits of data pretty much sequential on disk – hence the lack of need for indexing. It is as if each burst of updates comes with an online reorganization that restores the sequentiality of the resulting table version, so that reads during queries involve almost no seek time. The storage overhead means that more data must be loaded from disk; but that’s more than compensated for by eliminating the need to jerk from one end of the disk to the other in order to inhale all needed data.

So here’s my take: 1010data’s claim to better performance, as well as to competitive scalability, is credible. Since we live in a universe in which indexing to minimize disk seek time plus minimizing added storage to minimize disk accesses in the first place allows us to push against the limits of locking constraints, we are properly appreciative of the ability of columnar technology to provide additional storage savings and bit-mapped indexing to store more data in memory. Since 1010data lives in a universe in which locking never happens and data is stored pretty sequentially, it can happily forget indexes and squander a little disk storage and still perform better.

1010data Loves Sushi
At this point, I could say that I have summarized 1010data’s technical value-add, and move on to considering best uses. However, to do that would be to ignore another way that 1010data does not operate in the same universe: it loves raw data. It would prefer to operate on data before any detection of errors and inconsistencies, as it views these problems as important data in their own right.

As a strong proponent of improving the quality of data provided to the end user, I might be expected to disagree strongly. However, as a proponent of “data usefulness”, I feel that the potential drawbacks of 1010data’s approach are counterbalanced by some significant advantages in the real world.

In the first place, 1010data is not doctrinaire about ETL (Extract, Transform, Load) technology. Rather, 1010data allows you to apply ETL at system implementation time or simply start with an existing “sanitized” data warehouse (although it is philosophically opposed to these approaches), or apply transforms online, at the time of a query. It’s nice that skipping the transform step when you start up the data warehouse will speed implementation. It’s also nice that you can have the choice of going raw or staying baked.

In the second place, data quality is not the only place where the usefulness of data can be decreased. Another key consideration is the ability of a wide array of end users to employ the warehoused data to perform more in-depth analysis. 1010data offers a user interface using the Excel spreadsheet metaphor and supporting column/time-oriented analysis (as well as an Excel add-in), thus providing better rolling/ad-hoc time-series analysis to a wider class of business users familiar with Excel. Of course, someone else may come along and develop such a flexible interface, although 1010data would seem to have a lead as of now; but in the meanwhile, the wider scope and additional analytic capabilities of 1010data appear to compensate for any problems with operating on incorrect data – and especially when there are 1010data features to ensure that analyses take into account possible incorrectness.

Caveat
To me, some of the continuing advantages of 1010data’s approach depend fundamentally on the idea that users of large transactional tables require ad-hoc historical analysis. To put it another way, if users really don’t need to keep historical data around for more than an hour in their databases, and require frequent updates/additions for “real-time analysis” (or online transaction processing), then tables will require frequent reorganizing and will include a lot of storage-wasting historical data, so that 1010data’s performance advantages will decrease or vanish.

However, there will always be ad-hoc, in-depth queryers, and these are pretty likely to be interested in historical analysis. So while 1010data may or may not be the be-all, end-all data-warehousing database for all verticals forever, it is very likely to offer distinct advantages for particular end users, and therefore should always be a valuable complement to a data warehouse that handles vanilla querying on a “no such thing as yesterday” basis.

Conclusion
Not being in the mainstream of database technology does not mean irrelevance; not being in the same database universe can mean that you solve the same problems better. It appears that taking the road less travelled has allowed 1010data to come up with a new and very possibly improved solution to data warehousing, just as inverted list resurfaced in the last few years to provide new and better technology in columnar databases. And it is not improbable that 1010data can continue to maintain any performance and ad-hoc analysis advantages in the next few years.

Of course, proof of these assertions in the real world is an ongoing process. I would recommend that BI/data warehousing users in large enterprises in all verticals kick the tires of 1010data – as noted, testbed implementation is pretty swift – and then performance test it and take a crack at the really tough analyst wish lists. To misquote Santayana, those who do not analyze history are condemned to repeat it – and that’s not good for the bottom line.

Monday, August 23, 2010

IBM Acquired SPSS, Intel Acquires McAfee: More Problems Than Meet the Eye?

As Han Solo noted repeatedly in Star Wars – often mistakenly – I’ve got a bad feeling about this.

Last year, IBM acquired SPSS. Since then, IBM has touted the excellence of SPSS’ statistical capabilities, and its fit with the Cognos BI software. Intel has just announced that it will acquire McAfee. Intel touts the strength of McAfee’s security offerings, and the fit with Intel’s software strategy. I don’t quarrel with the fit, nor with the strengths that are cited. But it seems to me that both IBM and Intel may – repeat, may – be overlooking problems with their acquisitions that will limit the value-add to the customer of the acquisition.

Let’s start with IBM and SPSS. Back in the 1970s, when I was a graduate student, SPSS was the answer for university-based statistical research. Never mind the punched cards; SPSS provided the de-facto standard software for the statistical analysis typical in those days, such as regression and t tests. Since then, it has beefed up its “what if” predictive analysis, among other things, and provided extensive support for business-type statistical research, such as surveys. So what’s not to like?

Well, I was surprised to learn, by hearsay, that among psychology grad students, SPSS was viewed as not supporting (or not providing an easy way to do) some of the advanced statistical functions that researchers wanted, such as scatter plots, compared to SAS or Stata. This piqued my curiosity, so I tried to get onto SPSS’ web site (www.spss.com) on a Sunday afternoon to do some research on the matter. After several waits of five minutes or so for a web page to display, I gave up.

Now, this may not seem like a big deal. However, selling is about consumer psychology, and so good psychology research tools really do matter to a savvy large-scale enterprise. If SPSS really does have some deficits in advanced psychology statistical tools, then it ought at least to support the consumer by providing rapid web-site access, and it or IBM ought to show some signs of upgrading the in-depth psychology research capabilities that were, for a long time, SPSS’ “brand.” But if there were any signs of “new statistical capabilities brought to SPSS by IBM” or “upgrades to SPSS’ non-parametric statistics in version 19”, they were not obvious to me from IBM’s web site.

And, following that line of conjecture, I would be quite unconcerned, if I were SAS or Stata, that IBM had chosen to acquire SPSS. On the contrary, I might be pleased that IBM had given them lead time to strengthen and update their own statistical capabilities, so that whatever happened to SPSS sales, researchers would continue to require SAS as well as SPSS. It is even conceivable that SPSS will make IBM less of a one-stop BI shop than before, because it may open the door to further non-SPSS sales, if SPSS falls further behind in advanced psych-stat tools – or continues to annoy the inquisitive customer with 5-minute web-site wait times.

Interestingly, my concern about McAfee also falls under the heading of “annoying the customer.” Most of those who use PCs are familiar with the rivalry between Symantec’s Norton and McAfee in combating PC viruses and the like. For my part, my experience (and that of many of the tests by PC World) was that, despite significant differences, both did their job relatively well, and that one could not lose by staying with either, or by switching from the one to the other.

That changed, about 2-3 years ago. Like many others, I chose not to move to Vista, but stayed with XP. At about this time, I began to take a major hit in performance and startup time. Even after I ruthlessly eliminated all startup entries except McAfee (which refused to stay eliminated), startup took in the 3-5 minute range, performance in the first few minutes after the desktop displayed was practically nil, and performance after that (especially over the Web) was about half what it should have been. Meanwhile, when I switched to the free Comcast version of McAfee, stopping their automatic raiding of my credit card for annual renewals was like playing Whack-a-Mole, and newer versions increasingly interrupted processing at all times either to request confirmations of operations or to carry out unsolicited scans that slowed performance to a crawl in the middle of work hours.

Well, you say, business as usual. Except that Comcast switched to Norton last year, and, as I have downloaded the new security software to each of five new/old XP/Win7 PCs/laptops, the difference has been dramatic in each case. No more prompts demanding response; no more major overhead from scans; startup clearly faster, and faster still once I removed stray startup entries via Norton; performance on the Web and off the Web close to performance without security software. And PC World assures me that there is still no major difference in security between Norton and McAfee.

Perhaps I am particularly unlucky. Perhaps Intel, as it attempts to incorporate McAfee’s security into firmware and hardware, will fix the performance problems and eliminate the constant nudge-nudge wink-wink of McAfee’s response-demanding reminders. It’s just that, as far as I can see from the press release, Intel does not even mention “We will use Intel technology to speed up your security software.” Is this a good sign? Not by me, it isn’t.

So I conjecture, again, that Intel’s acquisition of McAfee may be great news – for Symantec. What happens in consumer tends to bleed over into business, so problems with consumer performance may very well affect business users’ experience of their firewalls, as well; in which case, this would give Symantec the chance to make hay while Intel shines a light on McAfee’s performance, and to cement its market to such a point that Intel will find it difficult to push a one-stop security-plus-hardware shop on its customers.

Of course, none of my evidence is definitive, and many other things may affect the outcome. However, if I were a business customer, I would be quite concerned that, along with the value-add of the acquisitions of SPSS by IBM and McAfee by Intel, may come a significant value-subtract.

Monday, July 26, 2010

The Questionable State of Editing

Yes, I think I have earned the right to critique editing. I have been producing writing on the computer to be edited for at least 20 years; and I have been editing others’ work for more than 15 years. In my youth, I thought I might be a writer; so, in script, by typewriter, and within word processors (one of which I helped design) I have been editing myself and seeing others’ comments on my writing for more than 50 years.

However, I don’t agree with the usual criticisms or defenses of editors that I hear. I don’t fully agree with the dependence on Strunk & White; I do feel it has improved writing. I don’t believe that writers are typically superior to editors in their judgments; I don’t feel that editors automatically improve written material. I don’t feel that the greater ease of writing and the automation of spelling and grammar checking afforded by word processing software have decreased quality or “blandified” the results; I do feel that the ability to produce more writing in the same amount of time has increased the perception by the reader that some of it is repetitive.
Let me summarize what I do believe:

1. Editors frequently fail to see that the editor’s job is both to make an argument or narrative clear and attractive to the reader and to preserve the distinctive voice of the writer.
2. Both writers and editors typically do not fully appreciate the importance of the general layout of the written material on a given page.
3. Editors tend to wrongly value completeness above conciseness, because they fail to consider the effect of cumulative particular changes on the whole piece.

The Editor’s Job
Here’s a story from my days at Aberdeen Group: I proposed to start a piece on IBM’s AS/400 product with the sentence: “Integration is In; and the IBM AS/400 has it.” IBM marketing people loved it; but the Aberdeen editor refused to allow it. He asserted that I would have a preposition with no object – clearly a violation of style guides – and that it was incorrect to say that the AS/400 ‘had integration’ – rather, it had features that enabled integration of customer applications – and that the semicolon should be replaced by a comma – clearly not by an em-dash! When informed that the newspapers in those days were full of lists of people and products that were In (meaning in fashion) or Out, he noted that European readers would not understand. In the end, as I remember, we ‘compromised’ with the sentence: “Integration is in fashion, and the IBM AS/400 has the features to enable effective use of integration by its customers.”

Now, the point of this story is not to pick on this particular editor, who was doing his best to ensure a common tone and higher minimum writing quality for Aberdeen publications. Moreover, I am not saying that the common sin of all editors is to squelch the individuality of the writer. On the contrary, I have seen many editors who, aside from enforcing the style guide, leave the use of passive voice or lengthy paragraphs pretty much intact.

My point here is that it is important that editors do both: make the piece communicate clearly to the reader and give the reader a sense that one distinctive person – and that person is the writer, not the editor – is talking to them. The editor’s criticisms were, by and large, correct – but I am pretty sure that the piece would have been better as is.

I think that what underlies this problem is a “consensus” view of readers – or, as Dale Carnegie used to say, the one negative thing we hear tends to outweigh all the positive things. It is true that some readers would have noted the lack of a preposition; but most would have enjoyed the In-joke. It is true that European readers would not have understood the allusion; but that would have made them more interested in finding out what the product was. It is true that the sentence was not clear as to how “integration” affected customers; but those that did not know already would find the sentence arresting enough to make them more curious about finding out – and the rest of the piece, not to mention the rest of the first paragraph, would have been ample explanation. Still, I think that the objections of the few outweighed in the editor’s mind the greater positive reaction of the many.
Now consider the same editing process applied to a whole piece. Most colloquialisms are replaced by words understandable by the densest reader; indications of speech patterns are replaced by rigid subject-verb-object constructions with few subordinate or follow-on clauses; each sentence is clear from immediate context. The result is a piece that is as much in the editor’s voice as in the writer’s and therefore is not as distinctive to the reader as either. In addition, the impact of the piece – its punch, if you will – is less; unnecessarily so.

By contrast, imagine the other editing style (“letting the thousand flowers bloom” by doing minimal damage to the writer’s voice) applied to a whole piece. Yes, portions of the piece are arresting; but the reader simply cannot follow what the writer is saying, often because the writer him/herself has not really thought out his/her argument. Letting passive voice go in all cases because it is “the usual practice in technical papers” or because it sounds more colloquial; treating ellipses as sacred when they are often slipshod memories of catch phrases; passing over long sections and long paragraphs completely – the result of these editorial tactics is often happiness on the part of the writer, but not on the part of the reader.

It may seem that I am asking the editor to walk a narrow path. No. I am asking editors to put themselves in the place of the writer; to ask what the writer is attempting to accomplish here, and then to make the minimal changes needed to accomplish it, while keeping those changes true to the distinctive style of the writer, not to a one-size-fits-all template.

The Importance of the Visual
Once, when I was at Prime Computer, I was writing sections of a software design document that really needed architectural diagrams. In those days, the editing software available to employees did not have a PowerPoint or Visio to create such things, so the alternative was what’s called “line art.” You would use the characters on the keyboard to approximate the lines, circles, and arrows of a diagram; and because the characters were fixed-width, with no kerning to throw off the alignment, the results were quite viewable.
I got so good at line art that I started using it once per page of text. When I did, I discovered that “mini-graphics” improved all manner of writing. The visual was much more concrete to the reader; and because each graphic was attached to almost a page’s worth of writing, the text explained the graphic and vice versa – the two were in balance. Above all, visually, the graphics nicely broke up page after page of dense technical writing. Effectively, the document was more readable.

At an earlier time, I worked for a company creating the only word processing software I know of that typically displayed a page at a time. The result, for the writer, was to make him or her very conscious of the boredom the reader felt on seeing a page that was just one long paragraph. Readability improved, because the writer took care to insert bullets, numbered lists, and one-sentence paragraphs to “jazz up” the visual appearance of the text.

Since then, I have found relatively few editors that really appreciated the importance of the visual appearance of a page. Yes, most now require that a paragraph have no more than four sentences; but then they typically forbid one-sentence paragraphs. Many dislike bullets on sight; others demand that graphics be large, so that the text of a page is mostly crowded out. Tables are strictly rationed. The result, after editing, is page after page of 3-4-sentence, complex paragraphs, interspersed with a few pages of graphics that either are too vacuous and “marketing-lite,” or else visualize the data, not the key idea. The whole is then dressed up with “cut-outs”, to reinforce the idea that the text is not worth reading. And this is in writing that is intended to market a product or an idea; where that isn’t so, in histories or novels, the graphics are removed or isolated, bullets are verboten, tables are déclassé, and explanatory verbiage mushrooms to fill the gaps – making the problem even worse.

Now, granted, some of the problem is that today’s software just is not very good at supporting mini-graphic creation or odd page layouts – although it’s pretty good at bullets and tables. But I believe that the main problem is that writers know that editors will typically not accept their experiments; so they don’t bother.
To my mind, the solution is actually pretty simple: the editor should imagine a particular page, not with words, but with shadowed blocks where the words and graphics are. What looks more visually interesting, 3 large blocks of text on a page or 2 large blocks and 3 indented lines for a numbered list? What looks better, 2 large blocks of figures plus a thin strip of text at the bottom, or 3 larger blocks of text with a small block for a graphic interspersed? I know what I think; and the reaction I get from readers who are not conditioned to expect a certain layout is almost invariably positive: better visual layout, well balanced with the text, adds punch and clarity, no matter what the content.

The Forest and the Trees
When I was writing for Aberdeen Group, I had to write literally hundreds of white papers 8-12 pages in length that vendors would consider buying. As a result, literally hundreds of times, two or more reviewers on the vendor’s side would edit my drafts. And as a result, I learned to write short, choppy sentences everywhere in the first draft.

Why did I do that? Because the natural tendency of the editors was to perceive what was left out, and to request additions. And since the editor had strong control over whether the piece was published or bought, there was strong pressure on me, and on every writer, to accommodate additions, while still somehow preserving the overall flow of the argument and keeping the visual layout and organization simple. That could only be done by cramming more and more clauses within each sentence. By anticipating that editors were going to do this, I was able to avoid massive reorganizations of the white paper that would have increased the likelihood of white paper rejection. However, the results of several rounds of editing were not usually very pretty.

Here’s an example. The initial version of one paragraph about Oracle OLAP Option was:

Within the general class of analytic tools, OLAP differentiates itself by its ability to capture data in sophisticated “multidimensional” ways, as compared to relational data, and to perform “what-if” analyses on that data. Therefore, OLAP is typically used less for standard reporting and straightforward querying tasks, and more for sophisticated analysis.

After the thirtieth (!) editing pass, it became:

Businesses are increasingly finding that traditional query and reporting tools are not enough. Traditional query and reporting tools are sometimes not performant or scalable enough to support all of users’ complex ad-hoc queries, not efficient for analysis beyond two or three “dimensions”, and not functional enough for sophisticated trend analysis or forecasting. For example, a CEO of a Value-Added Reseller (VAR) might have observed a significant downturn in business sales during the most recent month. Using a query tool, the CEO would typically look at revenues for that month for all regions, possibly via a fairly complicated query. Such a query could take quite a while to return results from a very large data set. By contrast, a good OLAP solution enables rapid, in-depth analysis and forecasting involving difficult-to-anticipate ad-hoc queries for both immediate tactical and longer-term strategic decision-making. OLAP also differentiates itself by its ability to analyze data across more than two or three dimensions.

Actually, I’ve left out the fact that I had to subdivide the above into 8 sentences, 3 paragraphs, and 2 sections. And chop out later material to make room. And reorganize all the sections to accommodate the additional section. And that was only one paragraph …

To an editor, it looks like the piece is becoming better. There’s more information in there. There’s more reinforcement via examples. Because the argument is laid out more fully, it seems more credible. There’s just one problem: because of the added length, the most important point you’re making is less clear. By planting new trees, you have obscured the overall view of the forest.

This relates to my view of the effects of new writing technology. Compared to typewriters and writing by hand, I love word processing. No more desperate avoidance of typos; no more carriage returns; no more slow recording of thoughts; no more rigid organization determined by the first draft. And the fact that you can turn out acres more text is fine by me: I find that what I lose in careful scrutiny of each word I gain in the ability to instantly see and correct for gaps in the argument or the narrative. No, I don’t think that the average product of the word processor is worse.

However, the editing can make it worse, or reinforce its bad tendencies. Handwritten and even typewritten works from the old days are denser, more concise, because every word counts. The experimental, the elegant, stands out more clearly. By contrast, lazy word-processor writing can seem much more the same, with the same old stylistic quirks repeated ad nauseam. And the tendency that I have cited above, the tendency of editors to value “completeness” over “conciseness”, makes those features of the new word-processor-driven writing worse.

To me, the solution is for editors to recognize, deep in their bones, that there is a tradeoff here. The way to see if you have gone too far is to go to the very first paragraph of the piece, where “why you the reader should care” should be laid out very clearly. Then go to the offending section, and ask, is it clear how the edited version fits with the argument laid out in the first paragraph? Can I read the first sentences of each paragraph in this section and see an overall argument that makes one key point within that first paragraph? If not, is the added information really worth the cost in clarity?

Conclusion
In case you haven’t noticed, in this piece I have been practicing what I preach (as I feel appropriate). There’s colloquial speech in this post, plus paragraphs mostly limited to 2-4 sentences. There’s a numbered list on the very first page; there’s variation, and experimentation; I have limited my “completeness” in the interests of clarity. It seems to work for me.

But if you’re not convinced, here’s another example. The first paragraph of a draft I recently submitted for editing runs as follows:

Roger Zelazny, in his novel “Lord of Light”, had one of his characters comment: “None sing hymns to air; but oh, to be without it!” Run-the-business applications like xxx are an organization’s air supply: without effective performance management, response-time delays result in angry customers and unproductive employees, cause outages from trial-and-error fixes, and signal problems that will lead to outages – making xxx effectively unavailable for long periods, choking off the sales and production cycle, and causing the organization to seize up as if it were gasping for breath.

Now here’s the same thing, after editing (that I had to clean up a bit):

The use of BI (business intelligence) software for speeding the organization’s financial reporting and planning/budgeting has an unusually direct impact on an enterprise’s bottom line. Closing the books rapidly means that the CEO can immediately react to changes in the business environment, and avoids money wasted on failing investments. Faster planning, and budgeting according to a “what-if” plan, allows deeper investigation of alternative strategies, ensuring that the final strategy will be more effective. Planning that can take into account monthly and quarterly results and implement a new spending plan within a week of a period’s end is the mark of an agile, cost-effective organization. However, complex organization charts for growing organizations doing deeper analysis mean that BI planning/budgeting/consolidation software must perform and scale better with each version.

Which makes the reader care more about reading the rest of the piece? Which makes the main point of the piece – you need good performance management software – clearer? Which grabs the eye even before you’ve really started reading?

You be the judge. 

Thursday, June 24, 2010

Groucho Marx Syndrome Lives

Over the years, I have discovered a psychological trait that I call Groucho Marx Syndrome. Here’s the way it goes:

In the movie Duck Soup, Prime Minister Groucho Marx has been insulted by the leader of a neighboring country, and has slapped him in return, putting both countries on the brink of war. He is told that the leader will come to apologize, and agrees to meet him. Then he thinks out loud (from my memory of the movie), “I will hold out my hand in friendship, and we’ll shake. But suppose he doesn’t? That would be a fine thing, wouldn’t it, me holding out my hand and his refusing to shake it? What kind of cad would do such a thing? Me, holding out my hand and him refusing to shake it? How dare he! How dare he refuse to shake the hand of friendship!” And just then the leader walks up and Groucho shouts, “So, you refuse the hand of friendship? You cad!” and slaps the leader again. War ensues.

The point of Groucho Marx Syndrome is that certain people are prone to convince themselves that other people not only will do something bad to them, but have already done so – even when those other people haven’t, and won’t, do any such thing. Over the years, I have seen many worriers lapse into instances of Groucho Marx Syndrome – say, when they anticipate being fired and are hostile to the boss who they believe has already arranged to fire them, thereby causing the firing. But there are also people who are in the habit of convincing themselves that, behind their backs, most other people are spending their time undercutting them -- even to the point of saying to others “I know you made sure my application was turned down” when the application has not yet actually been decided on. For those people, behavior like that of Groucho Marx is habitual, is a Syndrome.

Over the years, I haven’t seen as much evidence of Groucho Marx Syndrome in public figures. Today, however, I was delighted and distressed to come across what seems to me to be an excellent example of this psychological problem. Here’s the relevant quote from Paul Krugman’s blog:

“… one of Germany’s Wise Guys Men has lashed out at me in Handelsblatt over my criticism of Axel Weber:

Wolfgang Franz, who heads the German government’s economic advisory panel known as the Wise Men, tore into Krugman — and the US — in an op-ed in the German business daily Wednesday, titled “How about some facts, Mr. Krugman?”

“Where did the financial crisis begin? Which central bank conducted monetary policy that was too loose? Which country went down the wrong path of social policy by encouraging low income households to take on mortgage loans that they can never pay back? Who in the year 2000 weakened regulations limiting investment bank leverage ratios, let Lehman Brothers collapse in 2008 and thereby tipped world financial markets into chaos?” he wrote.

Now, what may not be obvious from this quote is that Paul Krugman is not a representative of the United States; far from it. Moreover, he was in some respects a prescient critic of some of the actions that Mr. Franz cites (in the case of encouraging low income households to take on mortgage loans that they can never pay back, it appears that Mr. Franz has his facts wrong, as Dr. Krugman notes). By contrast, I can find no obvious record in Mr. Franz’s publications of such anticipatory criticism, and a CNBC interview on Aug. 23, 2008 finds Mr. Weber not only apparently oblivious to the problems cited by Mr. Franz but also completely wrong about the medium-term and long-term outlook for the European economy. But that isn’t the real point here.

What Mr. Franz appears to be saying is, “The United States government is about to come in and tell us how to be prudent economically. How dare they? Aren’t they the folks who caused this mess? Aren’t they the folks who preached at the whole world that they knew better than everyone else? And now they’re coming here and telling us, who were ever so conservative economically, that we are doing it all wrong. What gall! They show up and tell us we’re running things wrong! Well, we won’t stand for it! We’re going to tell them off, right now!”

The rest of Mr. Franz’s editorial, translated by Bing, as I read it, appears to show some of the same style of thinking. Continue debt-backed investment and delay spending cutbacks? But everyone knows the US is just going to delay cutting back until too late; so they’re already too late, so we should cut back now. What about the continuing effects of German cutbacks on European policy and thereby on the rest of the world? Clearly we’re talking about our effects on the US. Our trade with them is minuscule; so there won’t be any effects, so there haven’t been any effects, so we have refuted that argument.

Just to reiterate:

(1) Paul Krugman is not the US government.
(2) By and large, the people running the US government’s economic policies today are different from, and/or carry a different message than, those who ran them in 1980-2008.
(3) Mr. Weber has responsibility for the European as well as the German economy, and will have more responsibility for Europe in the near future as head of the ECB, which is apparently what Dr. Krugman is addressing.
(4) Today’s US government has not, as far as I can tell, officially agreed with Paul Krugman’s criticisms or recommendations yet.

With all undue respect, Mr. Franz appears to show strong signs of Groucho Marx Syndrome. The consequences of this Syndrome, between two people in ordinary life, are unnecessary ruptures and fights that benefit no one, least of all the person with the Syndrome. But, as in Duck Soup, some of these people represent nations. Let us hope that Groucho Marx Syndrome is not widespread among the actors in the present economic crisis.

Or economic war could ensue.

Thursday, June 10, 2010

The Last 50 Years: Not What I Expected

For some reason, I was remembering today the predictions of the early 1990s that “all the great ideas have been found” already, and that little remained to be discovered. I asked myself: compared to 50 years ago, when I first encountered various scientific and social-science disciplines, how far have we come? What had I expected, and what did we get? What was new and made me rethink things? What had made a big splash but was still not really revolutionary enough? Here’s my list.
Physics. I hoped for fusion as a new power source by now. I hoped for new containment of antimatter as a rocket fuel. I hoped for a full explanation of quantum phenomena. I can’t say any of these things has arrived. Compared to them, greater knowledge of subatomic particles and the beginnings of the Big Bang, better Grand Unified Theories, and the like seem like small change. Maybe in the next 50 years …
Environment. Of all the areas of scientific inquiry, this I believe has brought forth the most astonishing new ideas. Despite the threat of climate change, I am happy I have lived to learn the subtlety and variety of the chemical cycle between earth, sky, and life, now and in the past. In the 1960s, none of this was apparent; there was no concept of ecosystem, and the effects of climate on everything were barely beginning to be understood. I don’t care if we still can’t predict long-term weather well; what has been discovered is fascinating.
Biology. I give this mixed reviews. It seemed apparent to everyone in the 1960s that so much about the human body remained to be discovered. Now we have sequenced the genome and are beginning to identify artificial proteins, and yet the effects on disease treatments and our knowledge of the human brain are still slow to arrive. Granted, knowledge of our DNA ancestry is a great new window into prehistory and anthropology; nevertheless, this belongs more to applications than to fundamental new ideas.
Economics. Since my first economics class in college, I have been hoping for insights into how economies evolve with technological change. Yes, the theory of productivity has advanced; but not to the point where policy-makers can exert much control over it. The one area where I feel a surprising number of good new ideas have come forth is international economics, where I believe there is a far greater understanding of factors such as exchange rates, an understanding that does not prevent horrible world-wide crashes but gives a smart policy maker the chance to avert them. We appear to know much more about gold as a standard, about fixed exchange rates, and about central-bank policy than we did back in the 1960s, or even the 1980s. Alas, the intervention of politics and the interests of the rich in economic research has prevented much more from being done.
Mathematics. I confess to being spoiled by the efforts of giants such as Gödel, before I was born, in discovering the concept of unknowability. While chaos theory and networking theory have been fascinating, I think that the work of the 1970s in applied mathematics, exploring undecidability and “effective” uncomputability (NP-complete algorithms), not to mention “problem solution lower bounds”, is surprisingly profound and should rank among the top new ideas in mathematics in the last 50 years – certainly not something I would have expected, since computers were barely visible to me in 1960. Of course, since I have spent my life since then in efforts related to computer science, I may be prejudiced; but I still believe that those who learn about unknowability grapple with a shift in perception almost as significant as that faced by those looking at Einstein’s theory of relativity in the early 1900s.
Computer Science. As Sesame Street would say, one of these things is not like the others … but I throw it in only to discard it. Yes, the advent of computers has changed our way of life profoundly over the last 50 years in ways that the sciences listed above cannot match; but really new ideas that profoundly alter the way one looks at the world? Unless you include the undecidability results of the 1970s – and those I prefer to classify under mathematics – I don’t think so. Nanotechnology? Application of lasers? Email? The Web? As technologies, their effect is pervasive; but I have never felt they offered a fundamental alteration in my view of the world. Maybe if artificial intelligence or linguistics had advanced further, those areas would provide new insights – certainly the idea that intuitive thought finds exceptions more powerful as learning aids than corroborations is interesting. But no; I don’t see any blockbusters here.
Looking over this list, I am struck by how wrong I was about the direction from which fundamental new scientific ideas would come over the last 50 years. It makes me very cautious about treating science as leading steadily into new technology, and technology as leading steadily into productivity and better living standards. I could be happily underestimating; I could be unhappily overestimating. One thing, though: the arrival of fundamental ideas has not flagged, and I can hope to be pleasantly surprised for the next 50 years – or whatever.

Tuesday, May 18, 2010

EMC World: The Dedupe Revolution

What, to my mind, was the biggest news out of EMC World? The much-touted Private Cloud? Don’t think so. The message that, as one presenter put it, “Tape Sucks”? Sorry. FAST Cache, VPLEX, performance boosts, cost cuts? Not this time. No, what really caught my attention was a throw-away slide showing that almost a majority of EMC customers have already adopted some form of deduplication technology, and that in the next couple of years, probably a majority of all business storage users will have done so.
Why do I think this is a big deal? Because technology related to deduplication holds the potential of delivering benefits greater than cloud; and user adoption of deduplication indicates that computer markets are ready to implement that technology. Let me explain.
First of all, let’s understand what “dedupe”, as EMC calls it, and a related technology, compression, mean to me. In its initial, technical sense, deduplication means removing duplicates in data. Technically, compression means removing “noise” – in the information-theory sense of removing bits that aren’t necessary to convey the information. Thus, for example, removing all but one occurrence of the word “the” in storing a document would be deduplication; using the soundex algorithm to represent “the” in two bytes would be compression.
However, today popular “compression” products often use technical-deduplication as well; for example, columnar databases compress the data via such techniques as bit-mapped indexing, and also de-duplicate column values in a table. Likewise, data deduplication products may apply compression techniques to shrink the storage size of data blocks that have already been deduplicated. So when we refer to “dedupe”, it often includes compression, and when we refer to compressed data, it often has been deduplicated as well. To try to avoid confusion, I refer to “dedupe” and “compress” to mean the products, and deduplication and compression to mean the technical terms.
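To make the distinction concrete, here is a minimal Python sketch of my own; it is purely illustrative (the block size, the toy data, and the choice of SHA-256 and zlib are my assumptions, not a description of any vendor’s product). Deduplication keeps only one copy of each repeated block plus a “recipe” for reassembly; compression then shrinks whatever unique data remains.

    import hashlib, zlib

    def dedupe_blocks(data: bytes, block_size: int = 4096):
        # Store each distinct block exactly once; the "recipe" records, in order,
        # which stored block each original position refers to.
        store, recipe = {}, []
        for i in range(0, len(data), block_size):
            block = data[i:i + block_size]
            digest = hashlib.sha256(block).hexdigest()
            store.setdefault(digest, block)   # a duplicate block is not stored again
            recipe.append(digest)
        return store, recipe

    # Toy data: fifty identical "idle" frames, as in the loading-dock camera example.
    frame = (b"nothing happening on the loading dock " * 120)[:4096]
    data = frame * 50

    store, recipe = dedupe_blocks(data)
    deduped = sum(len(b) for b in store.values())                     # deduplication only
    compressed = sum(len(zlib.compress(b)) for b in store.values())   # plus compression

    # Prints the raw size (204800), the deduplicated size (4096),
    # and a far smaller compressed size for the one remaining block.
    print(len(data), deduped, compressed)

In this toy case the fifty identical frames collapse to a single stored 4 KB block, which zlib then shrinks further; that is the one-two squeeze described above, applied at file, block, or column level depending on the product.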
When I state that there is an upcoming “dedupe revolution”, I really mean that deduplication and compression combined can promise a new way to improve not only backup/restore speed, but also transaction processing performance. Because, up to now, “dedupe” tools have been applied across SANs (storage area networks), while “compress” tools are per-database, “dedupe” products simply offer a quicker path than “compress” tools to achieving these benefits globally, across an enterprise.
These upcoming “dedupe” products are perhaps best compared to a sponge compressor that squeezes all excess “water” out of all parts of a data set. That means not only removing duplicate files or data records, but also removing identical elements within the data, such as all frames from a loading-dock video camera that show nothing going on. Moreover, it means compressing the data that remains, such as documents and emails whose verbiage can be encoded in a more compact form. When you consider that semi-structured or unstructured data such as video, audio, graphics, and documents makes up 80-90% of corporate data, and that the most “soggy” data types such as video use up the most space, you can see why some organizations are reporting up to 97% storage-space savings (70-80%, more conservatively) where “dedupe” is applied. And that doesn’t include some of the advances in structured-data storage, such as the columnar databases referred to above that “dedupe” columns within tables.
So, what good is all this space saving? Note the fact that the storage capacity that users demand has been growing by 50-60% a year, consistently, for at least the last decade. Today’s “dedupe” may not be appropriate for all storage; but where it is, it is equivalent to setting back the clock 4-6 years. Canceling four years of storage acquisition is certainly a cost-saver. Likewise, backing up and restoring “deduped” data involves a lot less to be sent over a network (and the acts of deduplicating and reduplicating during this process add back only a fraction of the time saved), so backup windows and overhead shrink, and recovery is faster. Still, those are not the real reasons that “dedupe” has major long-term potential.
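As a back-of-the-envelope check on that “setting back the clock” claim (my own arithmetic, using only the growth rates and savings figures quoted above), the years rolled back are just the logarithm of the space-savings ratio in base (1 + annual growth):

    import math

    def years_rolled_back(savings_ratio: float, annual_growth: float) -> float:
        # How many years of capacity growth a given space-savings ratio "undoes",
        # assuming demand grows at a steady annual rate.
        return math.log(savings_ratio) / math.log(1.0 + annual_growth)

    cases = [("70% savings", 1 / 0.30), ("80% savings", 1 / 0.20), ("97% savings", 1 / 0.03)]
    for label, ratio in cases:
        for growth in (0.5, 0.6):
            print(f"{label}, {growth:.0%} annual growth: "
                  f"{years_rolled_back(ratio, growth):.1f} years")

Under these assumptions the conservative 70-80% savings figures work out to roughly 2.5 to 4 years, and the 97% best case to 7 years or more; the 4-6 year figure above thus corresponds to savings somewhere between the conservative and the best-case reports.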
No, the real long-run reason that storage space saving matters is that it speeds retrieval from disk/tape/memory, storage to disk/tape/memory, and even processing of a given piece of data. Here, the recent history of “compress” tools is instructive. Until a few years ago, the effort of compressing and uncompressing tended to mean that compressed data actually took longer to retrieve, process, and re-store; but, as relational and columnar database users have found out, certain types of “compress” tools allow you to improve performance – sometimes by an order of magnitude. For example, vendors such as IBM have recently reported that relational databases such as DB2 benefit performance-wise from using “compressed” data. Columnar databases are showing that it is possible to operate on data-warehouse data in “compressed” form, except when it actually must be shown to the user, and thereby get major performance improvements.
So what is my vision of the future of “dedupe”? What sort of architecture are we talking about, 3-5 years from now? One in which the storage tiers below fast disk (and, someday, all the tiers, all the way to main memory) have “dedupe”-type technology added to them. In this context, it was significant that EMC chose at EMC World to trumpet “dedupe” as a replacement for Virtual Tape Libraries (VTL). Remember, VTL typically allows read/query access to older, rarely accessed data within a minute; so, clearly, deduped data on disk can be reduped and accessed at least as fast. Moreover, as databases and applications begin to develop the ability to operate on “deduped” data without the need for “redupe”, the average performance of a “deduped” tier will inevitably catch up with and surpass that of one which has no deduplication or compression technology.
Let’s be clear about the source of this performance speedup. Let us say that all data is deduplicated and compressed, taking up 1/5 as much space, and all operations can be carried out on “deduped” data instead of its “raw” equivalents. Then retrieval from any tier will be 5 times as fast, and 5 times as much data can be stored in the next higher tier for even more performance gains. Processing this smaller data will take 1/2 to 1/5 as much time. Adding all three together and ignoring the costs of “dedupe”/“redupe”, a 50% speedup of an update and an 80% speedup of a large query seem conservative. Because the system will need “dedupe”/“redupe” only rarely – “dedupe” when the data is first stored and “redupe” whenever the data is displayed to a user in a report or query response – and because the task could be offloaded to specialty “dedupe”/“redupe” processors, on average “dedupe”/“redupe” should add only minimal performance overhead to the system, and should subtract less than 10% from the speedups cited above. So, conservatively, I estimate the performance speedup from this future “dedupe” at 40-70%.
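For readers who want to sanity-check that arithmetic, here is a toy model of my own; the split between I/O time and processing time, and the gains assumed for each, are illustrative parameters chosen to roughly reproduce the figures above, not measurements:

    def time_saved(io_fraction: float, space_ratio: float, compute_gain: float) -> float:
        # Fraction of elapsed time saved when the data shrinks by 'space_ratio'
        # (so I/O time drops by that factor) and processing of the smaller,
        # still-deduped data speeds up by 'compute_gain'.
        new_time = io_fraction / space_ratio + (1.0 - io_fraction) / compute_gain
        return 1.0 - new_time

    # An update assumed to be 40% I/O-bound, with modest compute gains on smaller data:
    print(f"update: {time_saved(io_fraction=0.4, space_ratio=5.0, compute_gain=1.5):.0%} saved")
    # A large query assumed to be 90% I/O-bound, operating directly on deduped data:
    print(f"query:  {time_saved(io_fraction=0.9, space_ratio=5.0, compute_gain=5.0):.0%} saved")

With those assumed splits the model lands at roughly 52% for the update and 80% for the query; knock off the up-to-10-point “dedupe”/“redupe” overhead allowed above, and you are back in the 40-70% range.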
What effect is this going to have on IT, assuming that “the dedupe revolution” begins to arrive 1-2 years from now? First, it will mean that, 3-5 years out, the majority of storage, rather than a minority replacing some legacy backup, archive, or active-passive disaster recovery storage, will benefit from turning the clock back 4-6 years via “dedupe.” Practically, performance will improve dramatically and storage space per data item will shrink drastically, even as the amount of information stored continues its rapid climb – and not just in data access, but also in networking. Data handled in deduped form everywhere within a system also has interesting effects on security: the compression within “dedupe” is a sort of quick-and-dirty encryption that can make data pilferage by those who are not expert in “redupe” pretty difficult. Storage costs per bit of information stored will take a sharp drop; storage upgrades can be deferred and server and network upgrades slowed. When you add up all of those benefits, from my point of view, “the dedupe revolution” in many cases does potentially more for IT than the incremental benefits often cited for cloud.
Moreover, implementing “dedupe” is simply a matter of a software upgrade to any tier: memory, SSD, disk, or tape. So, getting to “the dedupe revolution” potentially requires much less IT effort than getting to cloud.
One more startling effect of dedupe: you can throw many of your comparative TCO studies right out the window. If I can use “dedupe” to store the same amount of data on 20% as much disk as my competitor, with 2-3 times the performance, the TCO winner will not be the one with the best processing efficiency or the greatest administrative ease of use, but the one with the best-squeezing “dedupe” technology.
What has been holding us back in the last couple of years from starting on the path to dedupe Nirvana, I believe, is customers’ wariness of a new technology. The EMC World slide definitively establishes that this concern is going away, and that there’s a huge market out there. Now, the ball is in the vendors’ court. This means that all vendors, not just EMC, will be challenged to merge storage “dedupe” and database “compress” technology to improve the average data “dry/wet” ratio, and “dedupify” applications and/or I/O to ensure more processing of data in its “deduped” state. (Whether EMC’s Data Domain acquisition complements its Avamar technology completely or not, the acquisition adds the ability to apply storage-style “dedupe” to a wider range of use cases; so EMC is clearly in the hunt). Likewise, IT will be challenged to identify new tiers/functions for “dedupe,” and implement the new “dedupe” technology as it arrives, as quickly as possible. Gentlemen, start your engines, and may the driest data win!

Monday, April 5, 2010

Memories of The Other Type of Software Failure

A few weeks ago, I met an old friend – Narain Gehani, once at Cornell with me, then at Bell Labs, and now at Stevens Institute of Technology – and he presented me with a book of reminiscences that he had written: “Bell Labs: Life in the Crown Jewel”. I enjoyed reading about Narain’s life in the 35 years since we last met; Narain tells a great story, and is a keen observer. However, one thing struck me about his experiences that he probably didn’t anticipate – the frequency with which he participated in projects aimed at delivering software products or software-infused products, products that ultimately didn’t achieve merited success in one marketplace or other (Surf n Chat and Maps r Us were the ones that eventually surfaced on the Web).


This is one aspect of software development that I rarely see commented on. We have all talked about the frequency of software-development project failure due to poor processes. But how about projects that by any reasonable standard should be a success? They produce high-quality software. The resulting solutions add value for the customer. They are well received by end users (assuming end users see them at all). They are clearly different from everything else in the market. And yet, looking back on my own experience as a software developer, I realize that I had a series of projects that were strikingly similar to Narain’s – they never succeeded in the market as I believe they should have.


My first project was at Hendrix Corporation, and was a word processing machine that had the wonderful idea of presenting a full page of text, plus room on the side for margins (using the simple trick of rotating the screen by 90 degrees to achieve “portrait” mode). For those who have never tried this, it has a magical effect on writing: you are seeing the full page as the reader is seeing it, discouraging long paragraphs and encouraging visual excitement. The word processor was fully functional, but after I left Hendrix it was sold to AB Dick, which attempted to fold it into their own less-functional system, and then it essentially vanished.


Next was a short stint at TMI, developing bank-transfer systems for Bankwire, Fedwire, etc. The product was leading-edge at the time, but the company was set up to compensate by bonuses that shrank as company size increased. As employee dissatisfaction peaked, it sold itself to Logica, key employees left, and the product apparently ceased to evolve.

At Computer Corporation of America, I worked on a product called CSIN. It was an appliance that would do well-before-its-time cross-database querying about chemical substances “in the field”, for pollution cleanup efforts. It turned out that the main market for the product was librarians, and they did not have the money to buy a whole specialized machine. That product sold one copy in its lifetime, if I recall correctly.


Next up was COMET/204, one of the first email systems, and the only one of its time to be based on a database – in this case, CCA’s wonderful MODEL 204. One person, using MODEL 204 for productivity, could produce a new version of COMET/204 in half the time it took six programmers, testers, and project managers to create a competing product in C or assembler. COMET/204 had far more functionality than today’s email systems, and solved problems like “how do you discourage people from using Reply All too much?” (answer: you make Reply the default, so that anyone bent on Reply All spam has to expend extra effort). While I was working on it, COMET/204 sold about 10 copies. One problem was the price: CCA couldn’t figure out how to price COMET/204 competitively when it contained a $60,000 database. Another was that in those days, many IT departments charged overhead according to usage of system resources – in particular, I/Os (and, of course, using a database meant more I/Os, even if it delivered more scalable performance). One customer kept begging me to hurry up with a new version so he could afford to add 10 new end users to his department’s system.


After that, for a short stint, there was DEVELOPER/204. This had the novel idea that there were three kinds of programming: interface-driven, program-driven, and data-driven. The design phase would allow the programmer to generate the user interface, the program outline, or the data schema; the programmer could then generate an initial set of code (plus database and interface) from the design. In fact, you could generate a fully functional, working software solution from the design. And it was reversible: if you made a change in the code/interface/data schema, it was automatically reflected back into the design. The very straightforward menu system for DEVELOPER/204 simply allowed the developer to do these things. After I left CCA, the company bundled DEVELOPER/204 with MODEL 204, and I was later told that it was well received. However, that was after IBM’s DB2 had croaked MODEL 204’s market; so I don’t suppose that DEVELOPER/204 sold copies to many new customers.


My final stop was Prime, and here, with extraordinary credit to Jim Gish, we designed an email system for the ages. Space is too short to list all of the ideas that went into it; but here are a couple: it not only allowed storage of data statically in user-specified or default folders, but also dynamically, by applying “filters” to incoming messages, filters that could be saved and used as permanent categorizing methods – like Google’s search terms, but with results that could be recalled at will. It scrapped the PC’s crippled “desktop metaphor” and allowed you to store the same message in multiple folders – automatic cross-referencing. As a follow-on, I did an extension of this approach into a full-blown desktop operating system. It was too early for GUIs – but I did add one concept that I still love: the “Do the Right Thing” key, which figured out what sequence of steps you wanted to take 90% of the time, and would do it for you automatically. Because Prime was shocked by our estimate that the email system would take 6 programmers 9 months to build, they farmed it to a couple of programmers in England, where eventually a group of 6 programmers took 9 months to build the first version of the product – releasing the product just when Prime imploded.


Looking back on my programming experiences, I realized that the software I had worked on had sold a grand total (while I was there) of about 14 copies. Thinking today, I realize that my experiences were pretty similar to Narain’s: as he notes, he went through “several projects” that were successful development processes (time, budget, functionality) but for one reason or another never took off as products. It leads me to suspect that this type of thing – project success, product lack of success – happens just as much as software development failure, and very possibly more often.


So why does it happen? It’s too easy to blame the marketers, who often in small companies have the least ability to match a product to its (very real) market. Likewise, the idea that the product is somehow “before its time” and not ready to be accepted by the market may be true – but in both Narain’s and my experience, many of these ideas never got to that point, never got tested in any market at all.


If I had to pick two factors that most of these experiences had in common, they were, first, barriers within the company between the product and its ultimate customers, and second, markets that operate on the basis of “good enough”. The barriers in the company might be friction with another product (how do you low-ball a high-priced database in email system pricing?) or suspicion of the source (is Bell Labs competing with my development teams?). The market’s idea of “good enough” may be “hey, gmail lets me send and receive messages; that’s good enough, I don’t need to look further”.

But the key point of this type of software failure is that, ultimately, it may have more profound negative effects on top and bottom lines, worker productivity, and consumer benefits in today’s software-infused products than development failure. This is because, as I have seen, in many cases the ideas are never adopted. I have seen some of the ideas cited above adopted eventually, with clear benefits for all; but some, like cross-referencing, have never been fully implemented or widely used, 25 or more years after they first surfaced in failed software. Maybe, instead of focusing on software development failure, we should be focusing on recovering our memory of vanished ideas, and reducing the rate of quality-product failure.


When I was at Sloan School of Management at MIT in 1980, John Rockart presented a case study of a development failure that was notable for one outcome: instead of firing the manager of the project, the company rewarded him with responsibility for more projects. They recognized, in the software development arena, the value of his knowledge of how the project failed. In the same way, perhaps companies with software-infused products should begin to collect ideas, not present competitive ideas, but past ideas that are still applicable but have been overlooked. If not, we are doomed not to repeat history, but rather, and even worse, to miss the chance to make it come out better this time.