Tuesday, November 19, 2013

Mini-Post: The Past Isn't Even Past -- Unless You Use "Proactive Analytics"

I'm referring here to a quote from William Faulkner:  "The past isn't dead.  It isn't even past."  More specifically, I am noting the degree to which past mistakes in dealing with the customer get embedded in the organization, and hence repeated, leading to greater and greater aggravation that eventually results in a divorce over what seems to the organization to be the most trivial of customer pretexts.

I was reminded of this quote when I visited one of the demo booths at IOD 2013 and found a fascinating example of the application of artificial intelligence to analytics -- so-called "proactive analytics."  Essentially, the app combs through past data to see if it's consistent, or whether some of it is possibly wrong, and then comes up with a suggested change to the data or metadata (as in, a change to one's model of the customer) to fit the amended data.

To me, this is one of the good things about artificial intelligence, as opposed to the "knowledge-base" approach that is somehow supposed to produce human-equivalent intelligence (and about which I have always been a bit skeptical).  One of the key original insights of AI was to divide reasoning into logical (where there is an adequate base of facts to establish a good rule about what to do) and intuitive (where facts are incomplete or unclear, so you have to muddle through to a rule) processes. In intuitive situations, AI found, the quickest way to a rule that gets closest to the underlying reality is to focus on the data that didn't fit, rather than focusing on data that seems to confirm an initial hypothesis.  And so, proactive analytics apparently targets this highly useful AI insight, focusing on what doesn't fit and thereby iteratively coming up with a much better model of the customer or the process.

All this is abstract; but then the demo person came up with an example that every company should drool over:  combing through customer records to detect deceased, moved or terminated customers.  And to me as a customer, this holds out the real possibility that I will no longer get sales literature about an estate-property-sale LLC long since terminated, letters addressed to "Wayne Keywochew, Aberdeen Group" (a stint that ended 9 years ago), or endless magazine subscription offers because of something I once ordered.  Multiply me by millions, and you have an incredible upgrade in your customer relations.
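To make the "focus on what doesn't fit" idea concrete, here is a minimal Python sketch of flagging customer records that look stale. It is not the demo's method, which I did not see in detail; the record fields (`last_response`, `returned_mail`, `employer_end`) and the three-year silence window are assumptions of mine, chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class CustomerRecord:
    # All field names are hypothetical, invented for this sketch.
    name: str
    last_response: Optional[date]  # last time the customer reacted to any contact
    returned_mail: int             # pieces of mail returned as undeliverable
    employer_end: Optional[date]   # end date of the affiliation on file, if known


def stale_reasons(rec: CustomerRecord, today: date,
                  max_silence_days: int = 3 * 365) -> List[str]:
    """Return the reasons a record does not fit the 'active customer' model."""
    reasons = []
    if rec.returned_mail > 0:
        reasons.append("mail returned as undeliverable")
    if rec.employer_end is not None and rec.employer_end < today:
        reasons.append("affiliation on file has ended")
    silent = (rec.last_response is None
              or (today - rec.last_response).days > max_silence_days)
    if silent:
        reasons.append("no response within the silence window")
    return reasons
```

A record that comes back with an empty list fits the current model; anything else is a candidate for a suggested change to the data, which a person (or a smarter program) would then confirm.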

Monday, November 11, 2013

James Hansen’s Climate Change Magnum Opus: Unsurpassed Horror and Sad Beauty

I warn you that this description of the latest draft paper by James Hansen and others should horrify you, if you are sane.  After Joe Romm first published its sound bite (30 degrees Fahrenheit increase if most fossil fuels are burned, 50 degrees at higher latitudes), I delayed reading it in detail. Now that I have, I find it builds on his (and others’) 40 years of work in the area and the latest research to provide an up-to-date climate change model whose implications are mostly more alarming than any I have seen elsewhere. 

What follows is my layperson’s attempt to summarize and draw further conclusions. I scant the discussion of new analyses of previous episodes of global warming (and cooling) that allow the development of the new model, focusing instead on the mechanisms and implications of the model.  Please note that, afaik, this is the first model that attempts to fully include the effects of methane and permafrost melting.

It’s About CO2

The first insight in the new model I summarize as follows:
As atmospheric CO2 increases or decreases, global average temperature increases or decreases proportionally, with a lag either way typically of a few decades.
This increase or decrease can be broken down into three parts:
1.       The immediate effect of the CO2 itself -- perhaps 60% of the total effect.

2.       The immediate and “over a few decades” effect of other “greenhouse gases”, or GHGs (here we are talking particularly about the methane in permafrost and methane hydrates on the continental shelves, released by warming, as well as GHGs such as nitrous oxide) – perhaps 20% of the total effect.

3.       The so-called “fast feedback” effects, in which the released CO2 and other factors (e.g., decreased albedo as ice melts) lead to additional warming “over a few decades”.
Two quick notes:  First, Hansen does not do my split; instead, he distinguishes between the effects of CO2 and the effects of other GHGs over the medium term (about 75-25) and then separately distinguishes between the immediate overall “climate sensitivity” and the medium-term or total “climate sensitivity” (again, about 75% immediately and 100% in the long term).  Second, the “over a few decades” is my interpretation of how quickly “X times CO2” seems to match global temperature data over more recent sets of data.  Hansen might very well say that this may or may not occur quite this rapidly, but it doesn’t matter to him because even with a thousand-year time frame for the full effect, CO2 will not be recycled out of the atmosphere for “a few thousand years”, so we still reach the full “climate sensitivity”.
Just to get the usual objections out of the way, Hansen is not saying that CO2 always leads the way – on the contrary, in Ice Age scenarios in which a certain point in a Milankovitch cycle causes extreme winters in our northern hemisphere, leading to increased glaciation and therefore increased albedo and decreased CO2 release to the atmosphere, CO2 follows other factors.  Today, however, primarily because of fossil-fuel emissions, CO2 is leading the way.
A sub-finding, still important, is that there is a linear relationship (again, sometimes with a lag of decades) between deep-ocean temperature change and atmospheric temperature change (expressed as “the change in temp at the surface is somewhere between 1.5 and 2.5 times the change in temp of the deep ocean” – or, about 67% of the global temperature increase goes into surface temps, 33% into deep ocean temps).  I include this because it seems that the recent “slowdown” in global surface temperature ascent is primarily caused by increased accumulation in the deep ocean.  However, again in a relatively short time frame, we should go back to more rapid average global surface temperature increases, because we’re still increasing atmospheric CO2 rapidly and 2/3 of that will start again going back into surface temps.

The Effect of “CO2 Plus” Is Bigger Than We Thought

In the past, Hansen among others has seen the effect of doubled CO2 as somewhere in the 2-3 degrees Celsius range.  Now, he sees a range of 3-4 degrees C – apparently, primarily because he now takes into account “other GHGs”.  To put it more pointedly, in my own interpretation:
Each doubling of CO2 leads to a global temperature change of 2.25-3 degrees Celsius (4-5.4 degrees F) “over a few decades”, and to a change of 3-4 degrees C (5.4-7.2 degrees F) “over 1 or 2 centuries.”
I mention this not only because the consequences of today’s global warming are more dire than we thought (i.e., the effects of that warming, immediately and over the next century or two), but also because many of us are still hung up over that “stop emissions and hold the increase to 2 degrees C” target that was the main topic at recent global governmental summits.  The atmospheric CO2 level at the beginning of the Industrial Revolution was about 250 parts per million (ppm), and is now at about 400 ppm.  If you do the math, that means we have baked in at least 2.2-3 degrees C of global temperature increase already. After 15 years of inaction, that target now has zero chance of success.
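For anyone who wants to redo the math, here is a small Python sketch. It assumes that "X degrees per doubling" means warming grows with the base-2 logarithm of the concentration ratio; that reading, and the function names, are my assumptions rather than anything taken from Hansen's paper.

```python
import math


def warming_c(ppm_start: float, ppm_now: float,
              degrees_c_per_doubling: float) -> float:
    """Warming implied by a fixed temperature change per doubling of CO2."""
    doublings = math.log2(ppm_now / ppm_start)
    return degrees_c_per_doubling * doublings


def c_to_f_delta(delta_c: float) -> float:
    """Convert a temperature *difference* (not a reading) from C to F."""
    return delta_c * 9.0 / 5.0


# Concentrations are the ones quoted in the text: 250 ppm then, 400 ppm now.
for sensitivity in (3.0, 4.0):
    delta = warming_c(250.0, 400.0, sensitivity)
    print(f"{sensitivity} C per doubling -> {delta:.2f} C "
          f"({c_to_f_delta(delta):.2f} F)")
```

Under that assumption, 250 to 400 ppm is about 0.68 of a doubling, which works out to roughly 2.0-2.7 degrees C for the long-term 3-4 degree sensitivity. That is a little below the 2.2-3 quoted above, which may rest on a somewhat different calculation; either way the 2-degree target is already reached or passed.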
At this point, I want to do a shout-out to those wonderful folks at the Arctic Sea Ice blog and forum.  Hansen specifically notes the data supporting melting of Arctic sea ice, plus collapse of the Greenland and West Antarctic ice sheets, at levels slightly below today’s CO2.  He also notes data supporting the idea that Greenland and West Antarctica can go pretty rapidly, “in a few centuries”, iirc – I interpret “in a few centuries” as within 250-450 years from now.

The Percent of Fossil Fuels We Need To Leave In The Ground Forever Is Greater Than We Thought

Before I get to the consequences if we don’t leave a percentage of fossil fuels in the ground, let’s see how the minimum amount of fossil fuels burned before we reach “worst consequences” has changed. Today’s estimate of total recoverable fossil-fuel reserves (coal, oil [primarily tar sands and oil shale], and natural gas) is about the equivalent of 15,000 Gt C (billions of tons of carbon emitted).  Of this, coal is about 7,300-11,000 Gt C, and the rest is split approximately equivalently between natural gas and tar sands/oil shale. Originally, we thought that burning 10,000 Gt C in the next century would get us to “worst consequences”.  Now, Hansen places the correct amount as somewhere between 5,000 Gt C and 10,000 Gt C.  Reading between the lines, I am placing the range as 6,000-7,000 Gt C, with 5,000 Gt C if we want to be ultra-safe, and I’m estimating coal as 60% of the emittable total, 20% tar sands/oil shale/oil, 20% natural gas.  Note, btw, that according to Hansen fossil-fuel emissions have increased consistently by about 3% per year since 1950, including last year. At that rate, we’d reach 6,000-7,000 Gt C in about 65-70 years.

Again, note that Hansen breaks the fossil fuels down as coal, traditional oil/gas, and oil shale/tar sands/fracked gas, so I’m guesstimating the equivalents.

So here’s the way it works out:
If we burn all the coal plus a very minor amount of everything else, we reach “worst consequences.”
If we burn all of everything but coal and 33% of the coal, we reach “worst consequences”.
If we burn 17% of the coal, 50% of the natural gas, and all the tar sands/oil shale/oil, we reach “worst consequences”.
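The three cases above can be checked with a few lines of Python. The inputs are my own guesstimates from this post (15,000 Gt C total, split 60/20/20), not Hansen's breakdown, and the 6,000 Gt C threshold is the low end of my "reading between the lines" range; the dictionary keys are just labels I made up.

```python
TOTAL_GT_C = 15_000
# "oil" here stands for tar sands/oil shale/oil taken together.
SHARES = {"coal": 0.60, "oil": 0.20, "gas": 0.20}
RESERVES = {fuel: TOTAL_GT_C * share for fuel, share in SHARES.items()}

THRESHOLD_GT_C = 6_000  # low end of the 6,000-7,000 Gt C range


def burned(fractions: dict) -> float:
    """Total carbon emitted when the given fraction of each fuel is burned."""
    return sum(RESERVES[fuel] * frac for fuel, frac in fractions.items())


SCENARIOS = {
    "all the coal": {"coal": 1.0},
    "everything but coal, plus 33% of coal": {"coal": 0.33, "oil": 1.0, "gas": 1.0},
    "17% of coal, 50% of gas, all oil": {"coal": 0.17, "oil": 1.0, "gas": 0.5},
}

for name, fractions in SCENARIOS.items():
    total = burned(fractions)
    verdict = "reaches" if total >= THRESHOLD_GT_C else "stays below"
    print(f"{name}: {total:,.0f} Gt C, {verdict} the threshold")
```

With these inputs the totals come to about 9,000, 8,970 and 6,030 Gt C, so the third case is the one that only just crosses the line.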
So this, imho, is why I agree with Hansen that allowing the Keystone XL pipeline is “game over” for the climate, as in “worst consequences almost inevitable”.  The Keystone XL pipeline is a “gateway drug” for tar sands and oil shale.  The source (Alberta, Canada) has a large part of the known tar sands oil, and presents similar difficulties in extracting and processing to oil shale.  It’s the furthest along in terms of entering the world market.  If that source succeeds, as the saying goes, once the nose of the camel is in the tent, you may expect the rest of the camel to enter.  In this case, if Alberta succeeds in getting the Keystone XL pipeline, it is probably the case that most of the tar sands and oil shale will be used; if not, probably not.
Right now, Alberta has no real buyers except the US, and the US is not set up to accept the oil, nor Canada to ship it to them in bulk.  The pipeline would effectively create an infrastructure to ship it, primarily to the rest of the world, which presumably would accept it – especially China – creating a market that allows Alberta profitability.  Alternatives are much more costly, are susceptible to pressure from the US, and would probably not be undertaken at all.  Note that increased shipment via truck is more costly, and would probably require major investments in truck structure, to handle the more toxic tar-sands crude, so that it is probably not a large-scale alternative that would make the project a success.  Likewise, trains and tracks to the Canadian ports to ship directly to world markets would probably prove too costly.
Now go back to the model.  It’s pretty darn likely we’ll burn 17% of the coal no matter what, and the majority of the natural gas.  Now add the tar sands and oil shale.  Worst consequences, here we come.

The Worst Is Likelier Than We Thought, Arrives Sooner, Is Almost As Bad As Our Worst Nightmare, And Is More Inescapable Once We Get There Than We Hoped

We’ve already dealt with “likelier than we thought”, and we can guess from the rapidity of response to atmospheric CO2 rise and the increase in the estimated climate sensitivity to atmospheric CO2 that it arrives sooner than we had projected. But what is this “worst consequences almost as bad as our worst nightmare”, and “worst consequences, once arrived, more inescapable than we hoped”?
For us, the worst consequences are not “snowball Earth”, locked in eternal ice, but “runaway GHG Earth” a la Venus, with the surface and air too hot and too acidic to support water or any life at all (any rain vaporizes from the heat long before it reaches the surface).  It’s an inescapable condition, since once the atmosphere locks in the heat, the Sun’s heat from outside trapped by the CO2 and other gases in the atmosphere balances the heat escaping from the top of the atmosphere.  Hansen’s model shows that we are still 100 million to 1 billion years from being able to reach that state, even by burning all fossil fuels in a gigantic funeral pyre.
The worst consequence is therefore the one cited at the very beginning, Joe Romm’s sound bite:  30 degrees F increase globally, 50 degrees in the high latitudes.  Here’s Hansen’s take on what that means:  it will take all areas of the Earth except the mountains above 35 degrees C “wet bulb temperature” during their summers.  That in turn, according to Hansen, would mean the following:
In the worst-consequence world, humans could survive below the mountains during the day outside only for short periods of time during the summer, and there would be few if any places to grow grains. 
Effectively, most areas of the globe would be Death Valley-like or worse, at least during the summer.
Here I think Hansen, because he properly doesn’t diverge into movement polewards of weather patterns and the effects of high water and possible toxic blooms, underestimates the threat to humanity’s survival.  Recent research suggests that with global warming, tropical climates stretch northwards.  Thus, projections for the US (not to mention Europe below Scandinavia, Australia, southern Africa, and southern Russia) are for extreme drought.  How can this be, when there will be lots of increased water vapor in the air?  Answer: rain will fall rarely, and be far more massive and violent when it does.  The heat will bake the ground hard, so that when it does rain, the rain will merely bounce off the ground and run off (with possible erosion), rather than irrigating anything.  Add depletion of aquifers and of ice-pack runoff, and it will be very hard to grow anything (I suppose, mountains partially excepted) below Siberia, northern Canada/Alaska, and Scandinavia.
However, these have their own problems:  rains too massive (and violent) to support large-scale agriculture – which is why you don’t see farming on Washington State’s Olympic Peninsula.  The only “moderate-rainfall” areas projected as of now, away from the sea and the equator, are a strip in northern Canada, one in northern Argentina, one in Siberia, and possibly one in Manchuria. Most of this land is permafrost right now.  To even start farming there would require waiting until the permafrost melts, and moving in the meantime to “intermediate” farming areas.  Two moves, minimal farmland, and greater challenges from violent weather.  Oh, and if you want to turn to hunting you’ll be lucky if you have an ecosystem that supports top-level meat animals, not to mention the 90% of plant and animal species that will likely be extinct by then. As for the ocean, forget about it as a food source, unless you like jellyfish (according to research done for the UN recently).
In my version of Hansen's worst-consequence world, we would try to survive on less than 10% of today's farmland, less than 10% of the animal and vegetable species with disrupted ecosystems, and practically zero edible ocean species, in territory that must be developed before it is usable, in dangerous weather, for thousands of years.
Hansen notes that one effective animal evolutionary response to past heat episodes has been hereditary dwarfism.  Or, as I like to think about it, we could all become hobbits.  However, because we are heading towards this excessive heat much faster than in those times, we can’t evolve fast enough; so that’s out.
What about inescapable?  Well, according to Hansen, CO2 levels would not get out of what he calls the “moderately moist greenhouse” area for thousands of years, and would not reach close to where we are now until 10,000-100,000 years hence.  By which time, not only will we be dead, but most of humanity, if not all.
Now, I had feared the Venus scenario, so the worst consequences are not as bad as I thought.  However, the increased estimate for temperatures in the moderately moist greenhouse and the wet bulb temperature consideration make the next-worst scenario more likely than before to end humanity altogether.

Snowball Earth:  Sad Beauty of a Sidelight

Having said all this, Hansen at least gives a beautiful analysis of why we don’t wind up a “snowball Earth” (the opposite scenario from a “runaway greenhouse”).  He notes that once the Earth is covered with ice, carbon can’t be recycled to the Earth via “weathering” (absorption from the atmosphere by rocks whose surfaces are abraded by wind and water).  So volcanic emissions and the like put more and more carbon dioxide in the atmosphere, until the temperature warms up enough and melting of the ice begins.  Apparently, evidence suggests that this may have happened once or twice in the past, when the Sun was delivering less light and hence heat.
The usual caveats apply.  Primarily, they fall in the category of “I was reading Hansen out of fear, and so I may be stretching the outer limits of what may happen, just as Hansen may be understating out of scientific conservatism.”  Make up your own mind.

I am reminded of a British Beyond the Fringe comedy skit about WW II, suitably amended:

“Go up in the air, carbon. Don’t come back.”
“Goodbye, sir.  Or perhaps it’s ‘au revoir’?”
“No, carbon.”
And what will it take for humanity to really start listening to Hansen, and to the science?


Sunday, November 10, 2013

Mini-Post -- IBM IOD -- Consider Informix For Time

One more IOD mini-post, and then I'm out of time (pun intended).

It was nice to see the Informix folks again, and for all their past database innovation I owe them some mention.  They introduced object-relational, among other things, plus "blades" to provide specific performance speed-up for particular querying tasks, and they even (although I never saw it implemented) proposed a relational feature that eliminated the sort step.

In today's IBM, their impact is necessarily muffled, although they still have the same virtues they always had:  An indirect channel untouchable by Oracle that services the prevalent medium-sized businesses of Germany and the new SMBs of China, among others; mature object-relational and relational technologies with scalability in certain query types, and with administrative simplicity relative to the Oracles of the world; and mastery of time-series data.  As I understand it, their near-term plans involve attempting to expand these strengths, e.g., by providing "edge" Web streaming-data processing for SMBs.

From my viewpoint, fairly or unfairly, the place the general user public ought to pay attention to Informix is in handling time-series data.  In recent years (again, this is my imperfect understanding) Informix has expanded its intertemporal "blade" technology to provide compression of (and hence very rapid querying of) time-series data via a process very similar to that of storage companies' dedupe "windows" -- taking a certain time frame, noting that only changes need to be stored given an original value, and hence compressing the time series data by (in storage's case, and probably in Informix's) 70-90%. I should also note that Informix can take advantage of IBM DB2 BLU Acceleration to further compress and speed querying of 1-to-2-column-per-query time-series data -- it's an earlier version of BLU, apparently, but one tuned more, and Informix provides a simple "fork" when it's time to go columnar.
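To illustrate the "only changes need to be stored" idea, here is a minimal Python sketch of change-only encoding for one window of readings. It is my own toy version, not Informix's implementation, and the function names are made up for the example.

```python
from bisect import bisect_right
from typing import List, Tuple

Sample = Tuple[int, float]  # (timestamp, value)


def delta_encode(samples: List[Sample]) -> List[Sample]:
    """Keep the first sample, then only the samples where the value changes."""
    encoded: List[Sample] = []
    for timestamp, value in samples:
        if not encoded or encoded[-1][1] != value:
            encoded.append((timestamp, value))
    return encoded


def value_at(encoded: List[Sample], timestamp: int) -> float:
    """Return the value in effect at `timestamp` (the latest stored change)."""
    times = [t for t, _ in encoded]
    index = bisect_right(times, timestamp) - 1
    if index < 0:
        raise KeyError(f"no sample at or before {timestamp}")
    return encoded[index][1]


readings = [(0, 20.0), (1, 20.0), (2, 20.0), (3, 21.0), (4, 21.0), (5, 20.0)]
compact = delta_encode(readings)
print(len(readings), "samples stored as", len(compact))
```

In this tiny example six readings shrink to three stored changes, and a query for any timestamp still gets the right value. How much a real time series shrinks depends entirely on how often the values repeat.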

How would I suggest any enterprise -- not just present Informix customers -- could use this time capability?  Well, let's face it, more and more data from the Web has part of its value as assessing behavior over time -- social network analysis, buying patterns, sensor data, the buying process. For this type of analytics, I would argue that neither the streams processor up front (can't store enough historical data) nor the data warehouse (not optimized for time-series data) is the best fit.  Rather, I would suggest a "mart" running Informix parallel to the main data warehouse, storing only time-series data, with a common front-end for querying -- somewhat akin to what I have seen called an "operational data store", or ODS (I tend to use the term ODS for something quite different, but that's a whole other conversation).

This would be of value not merely because it provides better performance on queries on time-series data.  Let's face it, time-series data has up to now not been considered worthy of separate consideration by today's enterprises.  And yet, our understanding of the customer should be far more enriched by understanding of customer processes and changing customers than it has been.  Creating such an Informix ODS would at least start the IT-business-exec conversation.

Just a thought ... and hopefully a timely one :)  Ttfn.

Mini-Post -- IBM IOD -- The Customer-Driven Big Data Enterprise

I have been challenged by several people as follows:  If you think that users are not spending as much as they should on Big Data because they don't see it as a process, but rather as a series of one-shot "big value-add insights", what then is the process they should create?  I don't pretend to have all the answers, but here, off the cuff, are my thoughts.

My first reflex in answering these questions is to recommend something "agile" (my definition, not the usual marketing hype).  However, in today's un-agile enterprise, and particularly dealing with un-agile senior executives, that won't work.  Btw, there's a wonderful phrase for this kind of problem, that I credit to Jim Ewel of Agile Manifesto fame:  HIPPO.  It stands for the HIghest-Paid Person in the rOom, and it refers to the tendency of decisions to be made according to the market beliefs of the HIPPO rather than customer data. Still, I believe something can be salvaged from agile marketing practices in answering the question -- the idea of being customer-data-driven.

Next, I assert that the process should have a Big Data information architecture aimed at supporting it. If we are to use Big Data in gaining customer insights, then our architecture should allow access to and support integration of the three types of Big Data sources:  Loosely, (1) sensor-driven (real-time streams of data from Web sensors such as GPS tracking and smartphone video), (2) social-media (the usual Facebook/Pinterest sources of customer interest/interaction unstructured data), and (3) the traditional inhaled in-house data that tends to show up in the data warehouse or data marts.

The process itself would be one of iterative deeper understanding of the customer, equal parts understanding the customer as he/she is now (buying procedures/patterns plus how to chunk the present and potential parts of the market) and where he/she is going (changes in buying behavior, new technologies/customer interests, evolution of present changes via predictive analytics carefully applied -- because agile marketing tells us there's a danger in uncritical over-application of predictive analytics).  The process would be one of rapid iteration of customer-focused, Big-Data-using insights, typically by data scientists, often feeding the CMO and marketing first, as befits the increased importance in today's large enterprise of the CMO.

What I suggest you have in this process is a Big-Data-focused, customer-insight-driven enterprise-driving analytical process.  Or, for short, the "customer-driven enterprise."  As in, Big Data for the customer-driven enterprise.

Mini-Post -- IBM IOD -- DB2 BLU Acceleration Revolution

Rounding out my quickies from the IBM Information on Demand conference ... I note from conversations with BLU experts that they are seeing "2-5 times acceleration" via tuning over the initial release; and I would expect that acceleration to show up in a new release some time in the next 3-6 months.  I did expect a performance boost, although more in the 50%-100% range.  If form holds true, we should see another performance boost through customization for particular workload types, maybe of 50-100%, conservatively.  Total at some point in the next 9 months, at a guess:  3-10 times acceleration.  Or, an order of magnitude on top of an order of magnitude, leading to 2 orders of magnitude compared to the obvious competition.
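The arithmetic behind that guess is just multiplying ranges, as this small Python sketch shows. The only inputs are the figures quoted above, with "50-100% faster" rewritten as a multiplier of 1.5-2; the function name is mine.

```python
from typing import Tuple

Range = Tuple[float, float]  # (low, high) speed-up multipliers


def compound(*ranges: Range) -> Range:
    """Multiply the low ends together and the high ends together."""
    low, high = 1.0, 1.0
    for range_low, range_high in ranges:
        low *= range_low
        high *= range_high
    return low, high


tuning = (2.0, 5.0)     # "2-5 times acceleration" from tuning
workload = (1.5, 2.0)   # a 50-100% boost from workload customization
print(compound(tuning, workload))  # (3.0, 10.0)
```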

When are people going to admit that this is not just another database technology?

Tuesday, November 5, 2013

Mini-Post -- IBM IOD -- Data Scientist

This is the first in a series of mini-posts based on what I’ve been hearing at IBM IOD.  They are mini-posts because there are too many thoughts here worth at least mentioning, and hence no time to develop the thoughts fully.

One key difference with past vendor data-related presentations is the prominence of the “data scientist.”  I wish folks hadn’t chosen that term; I find it confuses more than it enlightens, giving a flavor of scientific rigor, data governance, and above all emphasis on unmassaged, unenlightening data.  Rather, I see the “data scientist” more as an analyst using Big Data to generate company-valuable informational insights iteratively, building on the last insight – “information analyticist” for short.  Still, it appears we’re stuck with “data scientist”.

The reason I think users ought to pay attention to the data scientist is that in business terms, he or she is the equivalent of the agile developer for information leveraging.  The typical data scientist, as presented in past studies, goes out and whips up analysis after analysis to pursue cost-cutting or customer-focused insights.  This is particularly useful to the CMO, who is now much more aware of the need to understand the customer better and get the organization in sync with company strategy – because the rest of the organization is often entirely unmotivated to do so now as a result of cost-cutting focuses.

Effectively, a focus on the data scientist as the spearpoint of a Big Data strategy ensures that such a strategy is far more likely to be successful, because it will be based on the latest customer data rather than senior executive opinion.  If vendors truly want Big Data to be successful, the data scientist role in an organization is one that they and the firms themselves badly need to encourage.