Bi-Temporal Data Modeling: An Overview
We often end up in situations where we want to store data with certain validity and retrieve the same to see how it looked at a certain point in the past. The aspect of storing valid data with time comes with some overheads and has to be addressed as a part Schema Modeling.
Bitemporal Modeling is a specific case of Temporal database information modeling technique designed to handle historical data along two different timelines. This makes it possible to rewind the information to “as it actually was” in combination with “as it was recorded” at some point in time. In order to be able to do so, information cannot be discarded even if it is erroneous.
What is bitemporal data?
Whenever a database stores information about time, we can speak of it as a “temporal database”. A temporal database stores data relating to time instances.
More specifically the temporal aspects usually include valid time or transaction time
- Valid time is the time period during which a data is true in the real world.
- Transaction time is the time at which a data was recorded in the database.
A database which makes use of one of these aspect is called a uni-temporal database.It enables us to query this data, so we can either find out what data was stored at a certain point in the past (transaction time) or which data was valid (or will be valid) for which timespan (valid time).
A database which considers both aspects of time simultaneously, is called a bi-temporal database. It enables us to query data in regard to both transaction time and valid time.
Modeling Temporal Data
Consider a simple scenario as explained by Martin Fowler for Bitemporal Modeling
“Imagine we have a payroll system that knows that an employee has a rate of $100/day starting on Jan. 1. On Feb. 25, we run the payroll with this rate. On March 15, we learn that, effective on Feb. 15, the employee’s rate changed to $211/day. What should we answer when we are asked what the rate was for Feb. 25?”
Most database queries are only concerned with data as it is now, such as “What is the employee billing rate per day?”. In this case the record is overwritten and we have no track of previous record. We are not able to effectively retrospect the data of when this data was changed and when it becomes effective.
It is tricky to cater for queries such as “What was the billing rate of an employee September 2nd of last year?” or “How long was employee billing for $100.00?”. By adding temporality it is possible to represent these queries.
Instead of overwriting the old state, we add a new record showing the new one. With that, we at least have record of the employee’s original pay rate. Unfortunately, this doesn’t explain why payroll sent him the wrong amount of money on Feb. 25
To solve the above disparity we add another dimension of when the entry was made to the record. This dimension is also referred as transaction time.
Adding four date values to every data point is a way of modeling bitemporal data. It is like version control for your data and implementing bitemporality is no small feat.
Operations on Bitemporal Data
Create a rates collection to keep track of employees rate per hour
Insert
The rates collection was created on Jan 1st 2022 and the rate of $100 was entered for employee 1
db.rates.insertOne({
empId: 1,
validStart: ISODate('2022-01-01T00:00:00.000Z'),
validEnd: ISODate('9999-12-31T00:00:00.000Z'),
transactionStart: ISODate('2022-01-01T00:00:00.000Z'),
tnansactionEnd: ISODate('9999-12-31T00:00:00.000Z'),
rate: 100.00
})
db.find({empId: 1})
{
_id: ObjectId("63b2725e40708b857a73d17a"),
empId: 1,
validStart: 2022-01-01T00:00:00.000Z,
validEnd: 9999-12-31T00:00:00.000Z,
transactionStart: 2022-01-01T00:00:00.000Z,
tnansactionEnd: 9999-12-31T00:00:00.000Z,
rate: 100
}
Update
The employee rate was changed to $211 on Mar 15th, effective Feb 15th. Bitemporal update is tough, because we have to write 3 queries.
First, we update the transactionEnd
for the current fact. Since going forward, we no longer believe that the fact will be valid forever.
db.rates.updateOne(
{empId: 1, tnansactionEnd:ISODate('9999-12-31T00:00:00.000Z')},
{$set: {tnansactionEnd: ISODate('2022-03-15T00:00:00.000Z')}}
)
db.find({empId: 1})
{
_id: ObjectId("63b2725e40708b857a73d17a"),
empId: 1,
validStart: 2022-01-01T00:00:00.000Z,
validEnd: 9999-12-31T00:00:00.000Z,
transactionStart: 2022-01-01T00:00:00.000Z,
tnansactionEnd: 2022-03-15T00:00:00.000Z,
rate: 100
}
Second, we need to add a fact saying that $100 was the valid rate from 1st Jan to 15th Feb, and that we believe it from 15th March to forever.
db.rates.insertOne({
empId: 1,
validStart: ISODate('2022-01-01T00:00:00.000Z'),
validEnd: ISODate('2022-02-15T00:00:00.000Z'),
transactionStart: ISODate('2022-03-15T00:00:00.000Z'),
tnansactionEnd: ISODate('9999-12-31T00:00:00.000Z'),
rate: 100.00
})
db.find({empId: 1})
{ _id: ObjectId("63b2725e40708b857a73d17a"),
empId: 1,
validStart: 2022-01-01T00:00:00.000Z,
validEnd: 9999-12-31T00:00:00.000Z,
transactionStart: 2022-01-01T00:00:00.000Z,
tnansactionEnd: 2022-03-15T00:00:00.000Z,
rate: 100 }
{ _id: ObjectId("63b379dc40708b857a73d17b"),
empId: 1,
validStart: 2022-01-01T00:00:00.000Z,
validEnd: 2022-02-15T00:00:00.000Z,
transactionStart: 2022-03-15T00:00:00.000Z,
tnansactionEnd: 9999-12-31T00:00:00.000Z,
rate: 100 }
Finally, we need to add the new fact, that rate is changed to $211
db.rates.insertOne({
empId: 1,
validStart: ISODate('2022-02-15T00:00:00.000Z'),
validEnd: ISODate('9999-12-31T00:00:00.000Z'),
transactionStart: ISODate('2022-03-15T00:00:00.000Z'),
tnansactionEnd: ISODate('9999-12-31T00:00:00.000Z'),
rate: 211.00
})
db.find({empId: 1})
{ _id: ObjectId("63b2725e40708b857a73d17a"),
empId: 1,
validStart: 2022-01-01T00:00:00.000Z,
validEnd: 9999-12-31T00:00:00.000Z,
transactionStart: 2022-01-01T00:00:00.000Z,
tnansactionEnd: 2022-03-15T00:00:00.000Z,
rate: 100 }
{ _id: ObjectId("63b379dc40708b857a73d17b"),
empId: 1,
validStart: 2022-01-01T00:00:00.000Z,
validEnd: 2022-02-15T00:00:00.000Z,
transactionStart: 2022-03-15T00:00:00.000Z,
tnansactionEnd: 9999-12-31T00:00:00.000Z,
rate: 100 }
{ _id: ObjectId("63b37cff40708b857a73d17c"),
empId: 1,
validStart: 2022-02-15T00:00:00.000Z,
validEnd: 9999-12-31T00:00:00.000Z,
transactionStart: 2022-03-15T00:00:00.000Z,
tnansactionEnd: 9999-12-31T00:00:00.000Z,
rate: 211 }
Query
Get the current pay rate for the employee
db.find({
empId: 1,
validStart:{
$lte:ISODate('2022-03-16 ')
},
validEnd:{
$gte:ISODate('2022-03-16')
},
transactionStart:{
$lte:ISODate('2022-03-16')
},
tnansactionEnd:{
$gte:ISODate('2022-03-16')
}
})
Delete
Consider a scenario where an employee leaves the organisation on June 20. We need to delete the employee from the rate collection.
Bitemporal deletion involves two steps. First, we close the transactionEnd
interval, so it won’t show up when we query for all employees that we believe are still with the organisation as of now or at a future date.
db.rates.updateOne(
{
empId: 1,
validStart:{
$lte:ISODate('2022-06-20')
},
validEnd:{
$gte:ISODate('2022-06-20')
},
transactionStart:{
$lte:ISODate('2022-06-20')
},
tnansactionEnd:{
$gte:ISODate('2022-06-20')
}
},
{
$set: {
tnansactionEnd: ISODate('2022-06-20T00:00:00.000Z')
}
})
Second we insert the new record
db.rates.insertOne({
empId: 1,
validStart: ISODate('2022-02-15T00:00:00.000Z'),
validEnd: ISODate('2022-06-20T00:00:00.000Z'),
transactionStart: ISODate('2022-06-20T00:00:00.000Z'),
tnansactionEnd: ISODate('2022-06-20T00:00:00.000Z'),
rate: 211.00
})
What’s Next
There are databases that were designed to support bitemporal data and handle all the complexity that goes with it for us. SQL Server, for example, introduced support in 2016 and MarkLogic, a NoSQL database, introduced bitemporal data support in version 8.0.
I’m a hard-core MongoDB developer and was looking for native bitemporal data support; at this point, we don’t have that support yet. I hope MongoDB will introduce a special collection called the bitemporal collection, like timeseries and capped collections, in the near future.
Implementing bitemporal data support is no small feat. In the next article, we will take a shot at designing bitemporality in MongoDB at the application layer, taking into consideration some of the points mentioned in this article.