Monday, September 19, 2016

HTAP: An Important And Useful New Acronym

Earlier this year, participants at the second In-Memory Summit frequently referred to a new marketing term for data processing in the new architectures:  HTAP, or Hybrid Transactional-Analytical Processing.  That is, “transactional” (typically update-heavy) and “analytical” (typically read-heavy) handling of user requests are thought of as loosely coupled, with each database engine somewhat optimized for cross-node, networked operations. 

Now, in the past I have been extremely skeptical of such marketing-driven “new acronym coinage,” as it has typically had underappreciated negative consequences.  There was, for example, the change from “database management system” to “database”, which has caused unending confusion about when one is referring to the system that manages and gives access to the data, and when one is referring to the store of data being accessed.  Likewise, the PC notion of “desktop” has meant that most end users assume that information stored on a PC is just a bunch of files scattered across the top of a desk – even “file cabinet” would be better at getting end users to organize their personal data.  So what do I think about this latest distortion of the previous meaning of “transactional” and “analytical”?

Actually, I’m for it.

Using an Acronym to Drive Database Technology

I like the term for two reasons:

1. It frees us from confusing and outdated terminology, and

2. It points us in the direction that database technology should be heading in the near future.

Let’s take the term “transactional”.  Originally, most database operations were heavy on the updates and corresponded to a business transaction that changed the “state” of the business:  a product sale, for example, reflected in the general ledger of business accounting. However, in the early 1990s, pioneers such as Red Brick Warehouse realized that there was a place for databases that specialized in “read” operations, and that functional area corresponded to “rolling up” and publishing financials, or “reporting”.  In the late 1990s, analyzing that reporting data and detecting problems were added to the functions of this separate “read-only” area, resulting in Business Intelligence, or BI (similar to military intelligence) suites with a read-only database at the bottom.  Finally, in the early 2000s, the whole function of digging into the data for insights – “analytics” – expanded in importance to form a separate area that soon came to dominate the “reporting” side of BI. 

So now let’s review the terminology before HTAP. “Transaction” still meant “an operation on a database,” whether its aim was to record a business transaction, report on business financials, or dig into the data for insights – even though the latter two had little to do with business transactions. “Analytical”, likewise, referred not to monthly reports but to data mining by data architects – even though those who read quarterly reports were effectively performing analysis as well. In other words, the old words had pretty much ceased to describe what data processing is really doing these days.

But where the old terminology really falls down is in talking about sensor-driven data processing, such as in the Internet of Things.  There, large quantities of data must be ingested via updates in “almost real time”, and this is a very separate function from the “quick analytics” that must then be performed to figure out what to do about the car in the next lane that is veering toward one, as well as the deeper, less hurried analytics that allows the IoT to do better next time or adapt to changes in traffic patterns.

In HTAP, transactional means “update-heavy”, in the sense of both a business transaction and a sensor feed.  Analytical means not only “read-heavy” but also gaining insight into the data quickly as well as over the long term.  Analytical and transactional, in their new meanings, correspond to both the way data processing is operating right now and the way it will need to operate as Fast Data continues to gain tasks in connection to the IoT.

But there is also the word “hybrid” – and here is a valuable way of thinking about moving IT data processing forward to meet the needs of Fast Data and the IoT. Present transactional systems, which feed a conceptually very separate data warehouse through a “periodic dump”, are simply too disconnected from analytical ones. To deliver rapid analytics for rapid response, users also need “edge analytics” done by a database engine that coordinates with the “edge” transactional system. At the same time, transactional and analytical systems cannot operate in lockstep as part of one engine, because we cannot have each technological advance on the transactional side wait for a new revision of the analytical side, or vice versa. HTAP tells us that we are aiming for a hybrid system, because only that has the flexibility and functionality to handle both Big Data and Fast Data.
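To make the “hybrid” idea concrete, here is a minimal sketch in Python of a loosely coupled pair: an update-heavy store that ingests sensor readings, and a separate read-heavy component that listens to those updates and keeps a short rolling window for quick answers. Every name in it (Reading, TransactionalStore, EdgeAnalytics, long_term_mean, the “lane-sensor” ID) is hypothetical and invented for illustration; it does not describe any particular HTAP product, and a real system would use actual database engines rather than in-memory lists.

```python
from collections import deque
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Reading:
    """One sensor update (hypothetical record layout)."""
    sensor_id: str
    timestamp: float
    value: float


class TransactionalStore:
    """Update-heavy side: records every reading, then notifies subscribers."""

    def __init__(self):
        self.log = []            # stand-in for a durable transactional store
        self._subscribers = []   # loosely coupled consumers of the update feed

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def ingest(self, reading):
        self.log.append(reading)
        for notify in self._subscribers:
            notify(reading)


class EdgeAnalytics:
    """Read-heavy side: keeps a short rolling window per sensor."""

    def __init__(self, window=3):
        self._window = window
        self._recent = {}

    def on_reading(self, reading):
        values = self._recent.setdefault(
            reading.sensor_id, deque(maxlen=self._window)
        )
        values.append(reading.value)

    def rolling_mean(self, sensor_id):
        # "Quick analytics": answer from the most recent readings only.
        return mean(self._recent[sensor_id])


def long_term_mean(store, sensor_id):
    """Deeper, less hurried analytics over the full history."""
    return mean(r.value for r in store.log if r.sensor_id == sensor_id)


store = TransactionalStore()
edge = EdgeAnalytics(window=3)
store.subscribe(edge.on_reading)  # the only coupling between the two sides

for t, v in enumerate([1.0, 2.0, 3.0, 4.0, 5.0]):
    store.ingest(Reading("lane-sensor", float(t), v))

print(edge.rolling_mean("lane-sensor"))       # mean of the last three readings
print(long_term_mean(store, "lane-sensor"))   # mean of all five readings
```

The point of the sketch is the single subscribe call: the two sides share a feed of updates but no internals, so either one could be replaced or upgraded without waiting on the other.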

The Bottom Line

I would suggest that IT shops looking to take next steps in IoT or Fast Data try adopting the HTAP mindset.  This would involve asking oneself:

· To what degree does my IT support both transactional and analytical processing by the new definition, and how clearly separable are they?

· Does my system for IoT involve separate analytics and operational functions, or loosely-coupled ones (rarely today does it involve “one database fits all”)?

· How well does my IT presently support “rapid analytics” to complement my sensor-driven transactional system?

If your answers to all three questions put you in sync with HTAP, congratulations: you are ahead of the curve. If, as I expect, in most cases the answers reveal areas for improvement, those improvements should be a part of IoT efforts, rather than attempts to patch the old system a little to meet today’s IoT needs. Think HTAP, and recognize the road ahead.
