Earlier this year, participants at the second In-Memory Summit frequently referred to a new marketing term for data processing in the new in-memory architectures: HTAP, or Hybrid Transactional-Analytical Processing. That is, “transactional” (typically update-heavy) and “analytical” (typically read-heavy) handling of user requests are thought of as loosely coupled, with each database engine somewhat optimized for cross-node, networked operations.
Now, in the past I have been extremely skeptical of such
marketing-driven “new acronym coinage,” as it has typically had
underappreciated negative consequences.
There was, for example, the change from “database management system” to “database”,
which has caused unending confusion about when one is referring to the system
that manages and gives access to the data, and when one is referring to the store
of data being accessed. Likewise, the PC
notion of “desktop” has meant that most end users assume that information
stored on a PC is just a bunch of files scattered across the top of a desk –
even “file cabinet” would be better at getting end users to organize their
personal data. So what do I think about
this latest distortion of the previous meaning of “transactional” and “analytical”?
Actually, I’m for it.
Using an Acronym to Drive Database Technology
I like the term for two reasons:
1. It frees us from confusing and outdated terminology, and
2. It points us in the direction that database technology should be heading in the near future.
Let’s take the term “transactional”. Originally, most database operations were
heavy on the updates and corresponded to a business transaction that changed
the “state” of the business: a product
sale, for example, reflected in the general ledger of business accounting.
However, in the early 1990s, pioneers such as Red Brick Systems (with its Red Brick Warehouse) realized that there was a place for databases that specialized in “read” operations, and that functional area corresponded to “rolling up” and publishing financials, or “reporting”. In the late 1990s, analyzing that reporting
data and detecting problems were added to the functions of this separate “read-only”
area, resulting in Business Intelligence (BI) suites – the name echoes military intelligence – with a read-only database at the bottom. Finally, in the early 2000s, the whole
function of digging into the data for insights – “analytics” – expanded in
importance to form a separate area that soon came to dominate the “reporting”
side of BI.
So now let’s review the terminology before HTAP. “Transaction” still meant “an operation on a
database,” whether its aim was to record a business transaction, report on
business financials, or dig into the data for insights – even though the latter
two had little to do with business transactions. “Analytical”, likewise, referred not to monthly reports but to data mining by data architects – even though those who read quarterly reports were effectively performing an analytical process. In other words, the old words had pretty much
ceased to describe what data processing is really doing these days.
But where the old terminology really falls down is in
talking about sensor-driven data processing, such as in the Internet of
Things. There, large quantities of data
must be ingested via updates in “almost real time”, and this is a very separate
function from the “quick analytics” that must then be performed to figure out
what to do about the car in the next lane that is veering toward one, as well
as the deeper, less hurried analytics that allows the IoT to do better next
time or adapt to changes in traffic patterns.
In HTAP, transactional means “update-heavy”, in the sense of
both a business transaction and a sensor feed.
Analytical means not only “read-heavy” but also gaining insight into the
data quickly as well as over the long term.
Analytical and transactional, in their new meanings, correspond to both
the way data processing is operating right now and the way it will need to
operate as Fast Data continues to gain tasks in connection with the IoT.
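To make the redefined terms concrete, here is a minimal sketch in Python of the two HTAP workloads sharing one in-memory store: an update-heavy ingest path for business transactions or sensor feeds, and a read-heavy path for quick insight. HtapStore, ingest, and quick_insight are hypothetical names for illustration, not any vendor’s API.

import statistics
import time
from collections import deque

# A minimal sketch, assuming an in-memory store; all names are illustrative.
class HtapStore:
    def __init__(self, window_size=1000):
        # Keep a bounded window of recent updates in memory.
        self.recent = deque(maxlen=window_size)

    def ingest(self, source, value):
        # "Transactional" path: update-heavy writes in almost real time,
        # whether from a business transaction or a sensor feed.
        self.recent.append((time.time(), source, value))

    def quick_insight(self, source):
        # "Analytical" path: read-heavy, fast insight over recent data.
        values = [v for _, s, v in self.recent if s == source]
        return statistics.mean(values) if values else None

store = HtapStore()
store.ingest("lane_sensor_3", 0.42)  # a sensor update as a "transaction"
store.ingest("lane_sensor_3", 0.91)
print(store.quick_insight("lane_sensor_3"))  # rapid analytics over the window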
But there is also the word “hybrid” – and here is a valuable
way of thinking about moving IT data processing forward to meet the needs of
Fast Data and the IoT. Present transactional systems, feeding a conceptually very separate data warehouse via a “periodic dump”, are simply too disconnected from analytical ones. To deliver rapid analytics for rapid
response, users also need “edge analytics” done by a database engine that
coordinates with the “edge” transactional system. Nor can transactional and analytical systems operate in lockstep as part of one engine, because we cannot afford to have each technological advance on the transactional side wait for a new revision of the analytical side, or vice versa. HTAP
tells us that we are aiming for a hybrid system, because only that has the flexibility
and functionality to handle both Big Data and Fast Data.
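As a thought experiment, such loose coupling might look like the following Python sketch, with hypothetical names throughout: the transactional engine publishes each change to a feed, and a separately revisable analytical engine consumes it at its own pace, with neither a periodic dump nor a single fused engine in sight.

import queue
import threading

# A sketch of loose coupling: the two engines share only a change feed,
# so either side can advance without waiting for a revision of the other.
change_feed = queue.Queue()

def transactional_engine(readings):
    # "Edge" ingest: record each update and publish it to the feed.
    for r in readings:
        change_feed.put(r)
    change_feed.put(None)  # sentinel: end of feed

def analytical_engine():
    # "Edge analytics": consume the feed, maintaining a running insight.
    total, count = 0.0, 0
    while (r := change_feed.get()) is not None:
        total, count = total + r, count + 1
        print(f"running mean after {count} updates: {total / count:.2f}")

t = threading.Thread(target=analytical_engine)
t.start()
transactional_engine([0.2, 0.4, 0.9])
t.join()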
The Bottom Line
I would suggest that IT shops looking to take next steps in
IoT or Fast Data try adopting the HTAP mindset.
This would involve asking oneself:
· To what degree does my IT support both transactional and analytical processing by the new definition, and how clearly separable are they?
· Does my system for IoT involve separate analytics and operational functions, or loosely coupled ones (rarely today does it involve “one database fits all”)?
· How well does my IT presently support “rapid analytics” to complement my sensor-driven transactional system?
If your answer to all three questions puts you in sync with
HTAP, congratulations: you are ahead of
the curve. If, as I expect, in most cases the answers reveal areas for improvement, those improvements should be part of IoT efforts, rather than attempts to patch the old system a little to meet today’s IoT needs. Think HTAP, and
recognize the road ahead.