It seems as if I’m doing a lot of memorializing these days – first Sun, now Joseph Alsop, CEO of Progress Software since its founding 28 years ago. It’s strange to think that Progress started up shortly before Sun, but took an entirely different direction: SMBs (small-to-medium-sized businesses) instead of large enterprises, software instead of hardware. So many of the database software companies from that era that targeted large enterprises have since been marginalized, destroyed, crowded out, or acquired by IBM, CA (acting, in Larry Ellison’s pithy phrase, as “the ecosystem’s needed scavenger”), or Oracle.
Let’s see, there’s IDMS, DATACOM-DB, Model 204, and ADABAS from the mainframe generation (although Cincom with TOTAL continues to prosper), and Ingres, Informix, and Sybase from the Unix-centered vendors. By contrast, Progress, FileMaker, iAnywhere (within Sybase), and InterSystems (if you view hospital consortiums as typically medium-scale) have lasted and have done reasonably well. Of all of those SMB-focused database and development-tool companies, judged in terms of revenues, Progress (at least until recently) has been the most successful. For that, Joe Alsop certainly deserves credit.
But you don’t last that long, even in the SMB “niche”, unless you keep establishing clear and valuable differentiation in customers’ minds. Looking back over my 16 years of covering Progress and Joe, I see three points at which Progress made a key change of strategy that turned out to be right and valuable to customers.
First, in the early ‘90s, they focused on high-level, database-centric programming tools on top of their own database. This was not an easy thing to do; some of the pioneers, like Forte (acquired by Sun) and PowerBuilder (acquired by Sybase), had superb technology that proved difficult to adapt to new architectures like the Web and to lower-level languages like Java. But SMBs and SMB ISVs continue to testify to me that applications developed on Progress deliver TCO and ROI for SMBs superior to those of the Big Guys.
Second, they found the SMB ISV market before most if not all other vendors. I still remember a remarkable series of ads shown at one of their industry analyst days, featuring a small shop whose owner, moving as slowly as molasses, managed to sell one product to one customer during the day – by instantly looking up price and inventory and placing the order using a Progress-ISV-supplied customized application. That was an extreme; but it captured Progress’ understanding that the way to SMBs’ hearts was no longer just direct sales or VARs, but also a growing cadre of highly regional and niche-focused SMB ISVs. By the time SaaS arrived and folks realized that SMB ISVs were particularly successful at it, Progress was in a perfect position to profit.
Third, they home-grew and took a leadership position in ESBs (Enterprise Service Buses). It has been a truism that SMBs lag in adoption of technology; but Progress’ ESB showed that SMBs and SMB vendors could take the lead when the product was low-maintenance and easily implemented – as opposed to the application servers large-enterprise vendors had been selling.
As a result of Joe Alsop and Progress, not to mention the mobile innovations of Terry Stepien and Sybase, the SMB market has become a very different place – one that delivers new technology to large enterprises as much as large-enterprise technology now “trickles down” to SMBs. The reason is that what was sauce for the SMB goose was also sauce for the workgroup and department in the large enterprise – if it could be a small enough investment to fly under the radar of corporate standards-enforcers. Slowly, many SMBs have grown into “small large” enterprises, and many workgroups/departments have persuaded divisions, lines of business, and even data centers in large enterprises to see the low-cost and rapid-implementation benefits of an SMB-focused product. Now, big vendors like IBM understand that they win with small and large customers by catering to the needs of regional ISVs instead of the enterprise-app suppliers like SAP and Oracle. Now, Progress does a lot of business with large enterprises, not just SMBs.
Running a company focused on SMB needs is always a high-wire act, with constant pressure on the installed base by large vendors selling “standards” and added features, lack of visibility leading customers to worry about your long-term viability (even after the SMB market did far better in the Internet bust than large-enterprise vendors like Sun!), and constant changes in the technology that bigger folk have greater resources to implement. To win in the long term, you have to be like Isaiah Berlin’s hedgehog – have one big unique idea, and keep coming up with a new one – to counter the large-vendor foxes, who win by amassing lots of smaller ideas. Many entrepreneurs have come up with one big idea in the SMB space; but Joe Alsop is among the few that have managed to identify and foster the next one, and the one after that. And he managed to do it while staying thin.
But perhaps the greatest testimony to Joe Alsop is that I do not have to see his exit from CEO-ship as part of the end of an era. With Sun, with CA as Charles Wang left, with Compuware, the bloom was clearly off the old business-model rose. Progress continues to matter, to innovate, and to be part of an increase in importance of the SMB market. In fact, this is a good opportunity to ask yourself, if you’re an IT shop, whether cloud computing means going to Google, Amazon, IBM, and the like, or the kind of SMB-ISV-focused architecture that Progress is cooking up. Joe Alsop is moving on; the SMB market lives long and prospers!
Tuesday, May 19, 2009
Monday, May 18, 2009
Classical Music and Economics
As someone who was serious about classical violin in my teenage years, and who has occasionally revisited classical music since (thanks to my father’s passionate love for the classical repertoire of all periods), I rarely think of my profession, or indeed other fields of study, as related to classical music at all. Recently, however, I read Jeffrey Sachs’ Common Wealth and Alan Beattie’s False Economy, both fresh looks at seemingly well-established economic truths. For some reason, I then wondered whether a fresh look at classical music from an economic point of view would yield new insights.
So I did a quick Google search on recent articles on the economics of classical music. What I found was a bit disturbing: over the past 10 years, the common theme of commentators was that classical music now depended on the continued patronage of rich donors and governments, and that the income of top classical artists was limited by their visibility to a narrow, high-income classical-music audience as mediated by record-company executives and concert-hall bookers. The reason this was unsettling was that I had heard almost exactly the same analysis 50 years ago. More recently, an oboist-turned-journalist wrote an autobiography-cum-analysis, Mozart in the Jungle: Sex, Drugs, and Classical Music, in which she argued that funneling money primarily to orchestras had created an untenable situation in which too many musicians were chasing too few patron and government dollars. I conclude that neither revival nor disaster has happened; instead, classical music has reached a “steady state” in which a lot of children are classical music performers, attention ceases from college on, and for grown-ups classical music becomes a “symbol of class” that otherwise takes up less and less of the world’s attention.
From my experience, the sense of distance in the audience is palpable. Parents who attend their kids’ concerts, or business people who dress up to go to an orchestra concert, typically have no idea of what is “good” or “bad”; so they try to react as they feel they are supposed to, by feeling moved. But take away the social imperative, and they have little urge to keep attending. I, on the other hand, like to go back, because I like to argue with how the piece is performed: do I like Heifetz better in this phrase, or Joshua Bell? Rubinstein or Yo-Yo Ma? De los Angeles or Britney Spears (or Jacques Brel)? Will I ever again hear the amazing subtleties of the Brahms Piano Trios done by Stern-Istomin-Rose? Is this the time when one of them will finally play the Bach Chaconne the way it could be played – by me, if I were a performer?
How is this different from other arts, or “entertainment” in general? Consider pop music, or jazz. The economics of jazz are dreadful; but the settings for performing are generally intimate, and many in the audience dream they could be as good as the performers. Rock is more often heard in large venues or on recordings, but connection with the audience is usually just as important as in jazz, and the doctor in House can fantasize about playing solo with the band – we would have to go back to the “retro” character Charles in M*A*S*H to find a comparable character who could imagine being a solo pianist.
Or, consider competitive sports. It is certainly true that very, very few ever make it to the big leagues; and yet the audience for sports is large, and growing. The common thread here is that children who play and watch sports can believably imagine themselves in the place of the superstars. People learn from games, and they fantasize about them. Competition is not necessarily the most important part of sports: learning from the excellent, even if that means collecting stats or admiring the way the best players look, is important too.
Looking across the entertainment industry and at how consumers spend their leisure time, this capacity for imagined participation seems to be a good way of distinguishing “growing” from “mature” segments. Knitting and pottery are potentially participatory; surgery and mathematical problem solving generally are not, although both may demand just as much skill from the best performers.
What this says to me is that classical music, economically speaking, does not have to be a backwater. What is required is that a large number of adults can be attracted to spectating, because, as spectators, they can imagine themselves as the performers – and they can bring their own ideas to the show.
If this is true, then many of the ideas about how to “revive” classical music are subtly but dangerously wrong-headed. The economic need, it is asserted, is to economize by focusing on large, cost-effective groupings of musicians, like the orchestra or the opera – make it bigger and snazzier. But these groupings distance a particular performer from a particular part of the audience: they create a situation in which few in the audience appreciate the subtlety of the way one performance of a phrase differs from every other, they physically and emotionally separate performer(s) and audience, and they limit the audience’s rules of interaction and fantasy to “applaud at the end of every half hour.” Going out into the schools or giving free concerts of the latest classical compositions is effectively beside the point: the former simply reinforces a classical-music presence in the schools that will be lost in college anyway, and the latter focuses on moving the audience to “new” classical music rather than engaging its attention in any kind of classical music performance.
There are limits, of course. Maintaining the patronage of orchestras and opera singers is necessary to keep matters at a “steady state”. Cheering during a movement misses the vital element of softness or silence within a piece, just as laughing during a Jack Benny pause may ruin the punch line. But there should be questions asked, challenges made: “Here’s what I’m trying to do here; listen for it, and decide if you like it better;” “Who’s your favorite rock star? how would he/she do the melody here, and would it be better/worse?” “As you listen here, which part do you find most beautiful/moving? How would you change it if you sang it to yourself?” “There are two ways of playing this: deeper/sadder, and louder/angrier. Which do you feel should be the overall message of the piece? How would you imagine yourself conveying that message if you were me?”
We will know whether such an effort is successful when (whatever the copyright implications) tracks from lots of performers are being passed around because they are different from everyone else, and the listener likes to sing along. In that case, just as in rock, the performer and composer will be of equal importance, and old standards performed differently by new generations will become not only valid but expected. And the audience of consumers, continuing on after college, will become a growing market as classical music finally captures the “long tail.”
Sunday, May 3, 2009
TCO/ROI Methodology
I frequently receive questions about the TCO/ROI studies that I conduct, and in particular about the ways in which they differ from the typical studies that I see. Here’s a brief summary:
• I try to focus on narrower use cases. Frequently, this will involve a “typical” small business and/or a typical medium-sized business – 10-50 users at a single site, or 1000 users in a distributed configuration (50 sites in 50 states, 20 users at each site). I believe that this approach helps to identify situations in which a typical survey averaging over all sizes of company obscures the strengths of a solution for a particular customer need.
• I try to break down the numbers into categories that are simple and reflect the user’s point of view. I vary these categories slightly according to the type of user (IT, ISV/VAR). Thus, for example, for IT users I typically break TCO down into license costs, development/installation costs, upgrade costs, administration costs, and support/maintenance contract costs. I think that these tend to be more meaningful to users than, say, “vendor” and “operational” costs.
• In my ROI computation, I include “opportunity cost savings”, and use a what-if number based on organization size for revenues, rather than attempting to determine revenues ex ante. Opportunity cost savings are estimated as the TCO cost savings of a solution (compared to “doing nothing”) reinvested in a project with a 30% ROI. Considering opportunity cost savings gives a more complete picture of (typically 3-year) ROI. Comparing ROIs when revenues are held equal allows the user to zero in on how faster implementation and better TCO translate into better profits. (A numeric sketch of this computation appears at the end of this post.)
• My numbers are more strongly based on qualitative data from in-depth, open-ended user interviews. Open-ended means that the interviewee is asked to “tell a story” rather than answer “choose among” and “on a scale of” questions, thus giving the interviewee every opportunity to point out flaws in initial research assumptions. I have typically found that a few such interviews yield numbers that are as accurate as, if not more accurate than, 100-respondent surveys.
Let me now, at the end of this summary, dwell for a moment on the advantages of open-ended user interviews. They allow me to focus on a narrower set of use cases without worrying as much about smaller survey size. By not constraining the user to a narrow set of answers, they make sure that I am getting accurate data – and all the data that I need. They allow me to fine-tune and correct the survey as I go along. They surface key facts not anticipated in the survey design. They motivate the interviewee, and encourage greater honesty – everyone likes to “tell their story.” They also yield advice for other users – advice of high value to readers, and a source of additional credibility for the study’s conclusions.
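To make the ROI bullet above concrete, here is a minimal sketch of that computation in Python. The function name, the dollar figures, and the exact form of the ratio are my illustrative assumptions for this post, not numbers or formulas from any actual study.

```python
# Minimal sketch of the 3-year ROI computation described above.
# All dollar figures and the exact ROI ratio are illustrative assumptions.

def three_year_roi(tco_do_nothing, tco_solution, revenue_gain,
                   reinvestment_roi=0.30):
    """3-year ROI including "opportunity cost savings".

    Opportunity cost savings = the TCO savings versus "doing nothing",
    treated as if reinvested in a project returning `reinvestment_roi`.
    """
    tco_savings = tco_do_nothing - tco_solution           # what the solution saves
    opportunity_savings = tco_savings * reinvestment_roi  # return on reinvesting it
    total_benefit = tco_savings + opportunity_savings + revenue_gain
    return total_benefit / tco_solution                   # benefit per dollar spent

# Hypothetical 3-year figures for a 50-user, single-site shop (in dollars):
print(three_year_roi(tco_do_nothing=400_000,
                     tco_solution=250_000,
                     revenue_gain=100_000))   # ~1.18, i.e. 118% over 3 years
```

Holding the what-if revenue figure constant across the solutions being compared is what lets the differences in TCO and implementation speed show through in the final ROI.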
Saturday, May 2, 2009
Moore's Law Is Dead, Long Live Tanenbaum's Law
Yesterday, I had a very interesting conversation with Mike Hoskins of Pervasive about his company’s innovative DataRush product. But this blog post isn’t about DataRush; it’s about the trends in the computer industry that I think DataRush helps reveal. Specifically, it’s about why, despite the fact that disks remain much slower than main memory, most processes, even those involving terabytes of data, are CPU-bound, not I/O-bound.
Mike suggested, if I recall correctly, that around 2006 Moore’s Law – under which the number of transistors on a chip doubled roughly every two years, and processor speed increased correspondingly – began to break down in practice. As a result, software written on the assumption that ever-increasing processor speed would cover all programming sins against performance – e.g., the data lockup caused by security programs when you start up your PC – is now beginning to break down too, as the inevitable scaling of demands on the program is no longer met by scaling of program performance.
However, thinking about the ways in which DataRush and Vertica achieve higher performance – in the first case by greater parallelism within a process, in the second by slicing relational data into columns of same-type data instead of rows of different-sized data – suggests to me that more is going on than just “software doesn’t scale any more.” At the very high end of the database market, which I follow, the software munching on massive amounts of data has been unable to keep up with disk I/O for the last 15 years, at least.
Thinking about CPU processing versus I/O, in turn, reminded me of Andrew Tanenbaum, the author of great textbooks on Structured Computer Organization and Computer Networks in the late 1970s and 1980s. Specifically, in one of his later works, he asserted that the speed of networks was growing faster than the speed of processors. Let me restate that as a Law: the speed of data in motion grows faster than the speed of computing on data at rest.
The implications of Tanenbaum’s Law and the death of Moore’s Law are, I believe, that most computing will be, for the foreseeable future, CPU-bound. Think of it in terms of huge query processing that reviews multiple terabytes of data. Stored data grows by something like 60% a year. If disk and network transfer speeds grew only as fast as processor speeds – that is, more slowly than the stored data itself – then the time to move a given fraction of that data from disk to main memory would get longer every year, and such queries would be I/O-bound. Instead, transfer rates have kept climbing faster than processor speeds – even a basic SATA interface today is rated at multiple gigabits per second – so disks are shoving data at processors faster than they can process it. And the death of Moore’s Law just makes things worse.
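To put rough numbers on that argument, here is a back-of-the-envelope sketch; the growth rates and the starting data size are assumptions chosen only to illustrate the two regimes, not measurements of any particular system.

```python
# Back-of-the-envelope sketch of the scan-time argument above.
# All growth rates and starting figures are illustrative assumptions.

DATA_GROWTH = 0.60       # stored data grows ~60% per year (assumed)
SLOW_IO_GROWTH = 0.40    # transfer speed if it merely tracked processor speed
FAST_IO_GROWTH = 0.80    # transfer speed growing faster than processor speed

data = 10.0                   # terabytes to scan in year 0
slow_rate = fast_rate = 1.0   # relative transfer rates in year 0

for year in range(6):
    # Relative time to move the year's data from disk to main memory.
    print(f"year {year}: {data / slow_rate:6.1f} (I/O lags data growth)  "
          f"{data / fast_rate:6.1f} (I/O outpaces data growth)")
    data *= 1 + DATA_GROWTH
    slow_rate *= 1 + SLOW_IO_GROWTH
    fast_rate *= 1 + FAST_IO_GROWTH
# In the first column the scan time climbs every year (I/O-bound);
# in the second it shrinks, and the processor becomes the bottleneck.
```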
The implication is that the fundamental barrier to scaling computing is not processor geometry, but the ability to parallelize the two key “at rest” tasks of the processor: getting the data into main memory, and operating on it once it is there. In order to catch up with storage growth and network speed growth, we have to throw as many processors as we can at a task in parallel. And that, in turn, suggests that the data-flow architecture needs to be looked at again.
The concept of today’s architecture is multiple processors running multiple processes in parallel, each process operating on a mass of (sometimes shared) data. The idea of the data-flow architecture is to split processes into unitary tasks, and then flow parallel streams of data past processors, each of which carries out one of those tasks. The distinction is that in one approach the focus is on parallelizing the multi-task processes that the computer carries out on a chunk of data at rest; in the other, the focus is on parallelizing the same task carried out on a stream of data.
Imagine, for instance, that we were trying to find the best salesperson in the company over the last month, with a huge sales database not already prepared for the query. In today’s approach, one process would load the sales records into main memory in chunks and, for each chunk, maintain a running count of sales for every salesperson in the company. Yes, the running count is to some extent parallelized. But the record processing often is not.
Now imagine that multiple processors are assigned the task of looking at each record as it arrives, with each processor keeping a running count for one salesperson. Not only are we speeding up access to the data loaded from disk by parallelizing it; we are also speeding up the computation of the running counts beyond today’s architecture, by having multiple processors counting multiple records at the same time. So the two key bottlenecks involving data at rest – accessing the data, and operating on the data – are both lessened.
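As a toy illustration of this data-flow style – my own sketch, not how DataRush or any particular engine actually implements it – picture each record being routed to the worker that “owns” that salesperson, with every worker keeping only its own running totals:

```python
# Toy sketch of data-flow-style aggregation: records stream past a pool of
# workers, and each worker keeps running totals only for the salespeople it
# owns. Worker count, record format, and the sample data are invented.

from collections import defaultdict
from multiprocessing import Process, Queue

N_WORKERS = 4
SENTINEL = None   # marks the end of the record stream

def count_sales(inbox, results):
    totals = defaultdict(float)          # running totals for this worker's people
    while True:
        record = inbox.get()
        if record is SENTINEL:
            break
        salesperson, amount = record
        totals[salesperson] += amount    # the single "unitary task" of this worker
    results.put(dict(totals))            # hand back the partial result

if __name__ == "__main__":
    inboxes = [Queue() for _ in range(N_WORKERS)]
    results = Queue()
    workers = [Process(target=count_sales, args=(q, results)) for q in inboxes]
    for w in workers:
        w.start()

    # Simulated stream of sales records flowing off disk.
    stream = [("alice", 100.0), ("bob", 250.0), ("alice", 75.0),
              ("carol", 300.0), ("bob", 50.0)]
    for record in stream:
        # Route each record to the worker that owns that salesperson.
        inboxes[hash(record[0]) % N_WORKERS].put(record)
    for q in inboxes:
        q.put(SENTINEL)

    # Merge the partial totals and find the best salesperson.
    merged = {}
    for _ in range(N_WORKERS):
        merged.update(results.get())
    for w in workers:
        w.join()
    print("best:", max(merged, key=merged.get), merged)
```

A real engine would of course handle routing, buffering, and scheduling far more cleverly, but the shape is the same: the data flows, and the operators stay put.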
Note also that the immediate response to the death of Moore’s Law is the proliferation of multi-core chips – effectively, 4-8 processors on a chip. So a simple way of imposing a data-flow architecture over today’s approach is to have the job scheduler in a symmetric multiprocessing architecture break down processes into unitary tasks, then fire up multiple cores for each task, operating on shared memory. If I understand Mike Hoskins, this is the gist of DataRush’s approach.
But I would argue that if I am correct, programmers also need to begin to think of their programs as optimizing processing of data flows. One could say that event-driven programming does something similar; but so far, that’s typically a special case, not an all-purpose methodology or tool.
Recently, to my frustration, a careless comment got me embroiled again in the question of whether Java or Ruby or whatever is a high-level language – when I strongly feel that these do poorly (if examples on Wikipedia are representative) at abstracting data-management operations and therefore are far from ideal. Not one of today’s popular dynamic, functional, or object-oriented programming languages, as far as I can tell, thinks about optimizing data flow. Is it time to merge them with LabVIEW or VEE?