So you want Slowly Changing Dimension? | DoltHub Blog
Blog
PRODUCTS<br>Dolt<br>Doltgres<br>DoltLite<br>Dumbo<br>Hosted Dolt<br>DoltHub<br>DoltLab<br>Dolt Workbench
KEYWORDS<br>AI<br>Use Case<br>Reference<br>Feature Release<br>Golang<br>Integration<br>Performance<br>Technical<br>Web<br>Dataset
Slowly changing dimension is a framework used to discuss database versioning. It can be used to simulate database version control.
In the past, database version control needed to be simulated using slowly changing dimension because the technology didn’t exist to actually version control a database. We here at DoltHub solved that problem with Dolt, the world’s first and only version controlled SQL database. Dolt is a database with slowly changing dimension type 3 built in.
In this article, we will explain what slowly changing dimension is including examples. Then, we’ll talk about what it’s used for. Finally, we’ll talk a little bit about Dolt, the SQL database with slowly changing dimension built in.
What is Slowly Changing Dimension?#
Slowly changing dimension was originally designed as a way to add data versioning to an Extract, Transform, Load (ETL) process. If you were extracting data from your Online Transaction Protocol (OLTP) database and loading it into your Offline Analytics Processing (OLAP) database (ie. data warehouse), how should the history of data changes be handled for particular columns? Slowly changing dimension was developed as a framework to answer this question.
Imagine a business database that contains customer information: their names and addresses. Most of the time this data won’t change, but it sometimes will. Maybe a customer moves, or gets married. These columns can be versioned using slowly changing dimension.
You could try a few approaches for dealing with this, each with their own tradeoffs. For our customer table, suppose you had an address change. You could:
overwrite the old address, and lose it forever
create a new address column in the table, but make the table wider
add a new row for the new address, but make the database larger with more complicated read queries
add a history table, which tracks all address changes, but also makes the database larger
These approaches are called Type 1 through Type 4 slowly changing dimension, respectively.
Ralph Kimball and Margy Ross are credited with developing the Slowly Changing dimension framework in 2005. They published a book called The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling in 2013.
Over time, slowly changing dimension techniques were used for simulating database version control in Online Transaction Protocol (OLTP) databases. My teams have implemented slowly changing dimension techniques to simulate version control in an OLTP database. This database version control is exposed by the application to allow users to see history, make multiple changes at once (ie. simulated branches), or soft delete.
So, I prefer a more liberal definition of slowly changing dimension: slowly changing dimension defines a framework for history and version handling in databases.
Why use Slowly Changing Dimension?#
Now that you understand what slowly changing dimension is, the question is why would you ever need slowly changing dimension? I’ll focus on four key reasons to use slowly changing dimension:
Historical Reporting
Expose Versioning to Your Application
Disaster Recovery
Reproducibility
Historical Reporting#
Most slowly changing dimension has an ETL focus. You have a fast changing OLTP database and you need history in your OLAP data warehouse to build reports. You use the Slowly Changing Dimension system to define the relationship between your unversioned production database and your versioned data warehouse. You codify this relationship in ETL jobs.
Expose Versioning to your Application#
Maybe your application customers are asking for history reporting in the application. So, you add slowly changing dimension to your production database and run OLAP-style historical reporting queries on your production OLTP database. These queries are generally more expensive than OLTP queries so a common pattern is to run them on a replica.
Another slowly changing dimension use is to add other versioning features beyond logs to your application like diffs, branches, and even merges. Branches can be simulated by having multiple active versions with branch names. We’ll cover this a little more in the Poor Man’s Data Version Control section.
Disaster Recovery#
Like soft deletes or change data capture, slowly changing dimension can be used to recover from a certain class of bad query. Without any history, you are forced to go to backups and logs to save yourself from an erroneous delete. Slowly changing dimension provides some history in the database to use instead.
Reproducibility#
Slowly changing dimension can be used for model reproducibility. Increasingly, data in databases in being used in machine learning applications. You may want to be able to build a model with...