Monday, May 14, 2012

OLTP vs. Analytics


This blog is all into database scaling, but before we dive into scaling challenges, here's a quick alignment around those important terms, if you want, an introduction to OLTP and Analytics...

Traditionally the RDBMS world was divided into 2 main categories:
OLTP, Online Transaction Processing (http://en.wikipedia.org/wiki/OLTP), data is stored and processed by operational applications throughout the organization. In an example of a supermarket, the cashiers application is clearly an operational application doing OLTP. Traditionally OLTP databases were characterized by:

  1. Large number of concurrent users
  2. High throughput of concurrent transactional activity of reads and writes mixture
  3. Smaller amounts of data


On the other side we could find the Data Warehouse (http://en.wikipedia.org/wiki/Data_warehouse) or DSS (Decision Support Systems, http://en.wikipedia.org/wiki/Decision_Support_Systems, Oracle likes DSS very much...) or with their newer name: Analytics (http://en.wikipedia.org/wiki/Analytics).
In our supermarket, the Data Warehouse was loaded from data from all operational system, the cahiers system and also from the logistics and suppliers system , ERP, CRM – and Business Intelligence tools are used to allow convenient visual ways to query and analyze the enormous amounts of data making aggregations, summaries, comparisons and so on. Traditionally Analytics databases were characterized by:

  1. Small number of concurrent users
  2. Non transactional, read-only activity
  3. Huge amounts of data


While RDBMS can be used for both OLTP and Analytics, each has different design methodologies, tuning parameters, hardware requirements. Every DBA will tell you that Oracle’s BITMAP INDEX speeds up Analytics since the schema is in a star-schema design, however it can be hell for OLTP when schema design is totally different, and concurrent write transactions will suffer from the BITMAP’s segment-level-locking. Columnar database
On the other hand, in OLTP the DBA will set a goal and eventually reach a cache hit-ratio of 95%, and “sql-area” hit-ratio of 99.99%. In Analytics the above numbers are impossible to reach just because of the different activity imposed on the database, resulting in different behaviors, and it’s OK. There is a difference.

Scale challenges and practices are also different. In future posts I'll address specific database scale issues relevant to OLTP from one hand and Analytics on the other.

Stay tuned.

1 comment:

  1. With modern levels of data storage, analytics are only possible with data center infrastructure management software and solutions. Then you can handle both large amounts of data and many concurrent users. Thanks for the post!

    ReplyDelete