Crux is now XTDB

September 2021 Update: Crux has recently been renamed XTDB. This post still refers to XTDB as “Crux” but you will find newer versions of the artifacts discussed under the new com.xtdb Maven GroupId ([com.xtdb/xtdb-core "1.19.0"] for example) instead of the pro.juxt GroupId.

The official home for XTDB is now https://xtdb.com.

Most of us have coded up the humble database column updated_date at some point or another, which leaves us with the feeling that data and time are nearly always intrinsically linked in our database designs. Along with updated_date, we may also find ourselves needing created_date, to preserve an immutable record of the row’s origin.

Sometimes we will also need to keep historical versions of entities around, either for audit reasons or to serve business requirements where historical data can inform future decisions.

In the table above, we maintain versions of entities by using valid_from and valid_to columns. This allows us to navigate to the particular version of an entity that was live at a given time.

This approach is pragmatic and has benefits, but it adds complexity to the database design. It also relies on updating database rows to set the date columns, which hampers future efforts to work with the database as an immutable store.
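To make the trade-off concrete, here is a minimal Python sketch of the valid_from / valid_to approach. The Row class, the update_price and version_at helpers, and the "widget" data are all hypothetical names invented for illustration. Note how recording a new version has to mutate the previous row, which is exactly what gets in the way of treating the store as immutable.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Row:
    entity_id: str
    price: int
    valid_from: datetime
    valid_to: Optional[datetime]  # None marks the live version


def update_price(rows, entity_id, price, at):
    """Close the live version in place, then append the new one."""
    for row in rows:
        if row.entity_id == entity_id and row.valid_to is None:
            row.valid_to = at  # an existing row is mutated here
    rows.append(Row(entity_id, price, at, None))


def version_at(rows, entity_id, at):
    """Return the version that was live at the given time, if any."""
    for row in rows:
        if (row.entity_id == entity_id
                and row.valid_from <= at
                and (row.valid_to is None or at < row.valid_to)):
            return row
    return None


rows = [Row("widget", 100, datetime(2019, 1, 1), None)]
update_price(rows, "widget", 120, datetime(2019, 3, 1))
february = version_at(rows, "widget", datetime(2019, 2, 1))
```

Here february is the original version of the row, but that row no longer looks the way it did when it was first written: its valid_to has been overwritten.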

Temporal Databases

Temporal databases aim to make our programming lives easier around time, by baking time itself into the engine. One major feature of temporal databases is the ability to query the database as of a particular point in time.

The SQL:2011 specification gives syntax for this:

SELECT ... FROM ... FOR SYSTEM_TIME AS OF ...

To facilitate this, the database itself maintains a transaction time (also known as system time) for each transaction, and it is always increasing so that the past is immutable. This allows you to query the database at any point on the transaction time axis, to see what the database looked like at a given point in history.

The transaction time of a database fact is the time when the fact is current in the database and may be retrieved

— Glossary of Temporal Database Concepts

Transaction time is the time at which the database sees incoming data. This gives an inherent audit log, maintaining an immutable history of database transactions.
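As a rough illustration of the idea, the sketch below models an append-only store that only accepts an ever-increasing transaction time, and answers as of queries by replaying the log up to the requested point. The TxLog class and its methods are hypothetical and are not the API of any real database. The clock reading is passed in as an argument purely to keep the example deterministic; a real engine would assign it itself.

```python
from datetime import datetime


class TxLog:
    """Append-only store where transaction time must always increase."""

    def __init__(self):
        self._log = []  # (tx_time, key, value), ordered by tx_time

    def transact(self, key, value, now):
        # Refuse any write that would land in the transaction time past
        if self._log and now <= self._log[-1][0]:
            raise ValueError("transaction time must always increase")
        self._log.append((now, key, value))

    def as_of(self, key, tx_time):
        """Value of key as the database saw it at tx_time."""
        result = None
        for t, k, v in self._log:
            if t > tx_time:
                break
            if k == key:
                result = v
        return result


db = TxLog()
db.transact("rate", 1.10, datetime(2019, 1, 1))
db.transact("rate", 1.25, datetime(2019, 2, 1))
mid_january = db.as_of("rate", datetime(2019, 1, 15))
```

Nothing is ever updated in place, so the answer to an as of query for a past transaction time never changes, however many transactions follow.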

Temporal databases may also offer the ability to get the history of entities, or to query using temporal predicates (e.g. find me all cases when a field was changed in a certain time span). They may also offer support for time-series investigations to help analyze trends and to make predictions.
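A temporal predicate of that kind can be sketched over a plain list of versions. The versions data, and the history and changes functions, are hypothetical names used only for this illustration; a temporal database would provide equivalent functionality natively.

```python
from datetime import datetime

# Hypothetical version log: (transaction_time, entity_id, document)
versions = [
    (datetime(2019, 1, 1), "trade-1", {"price": 100, "status": "open"}),
    (datetime(2019, 1, 5), "trade-1", {"price": 105, "status": "open"}),
    (datetime(2019, 2, 1), "trade-1", {"price": 105, "status": "closed"}),
]


def history(entity_id):
    """All versions of an entity, oldest first."""
    return [(t, doc) for t, e, doc in versions if e == entity_id]


def changes(entity_id, field, start, end):
    """Changes to field that happened within the span [start, end)."""
    found = []
    previous = None
    for t, doc in history(entity_id):
        changed = previous is not None and doc[field] != previous[field]
        if changed and start <= t < end:
            found.append((t, previous[field], doc[field]))
        previous = doc
    return found


january_price_changes = changes(
    "trade-1", "price", datetime(2019, 1, 1), datetime(2019, 2, 1)
)
```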

One use-case where as of is useful is in financial services, where regulatory regimes demand that financial trades can be reconstructed to show how they were priced, based on the market data available at the time.

Developing this kind of trade-reconstruction functionality without temporal features such as as of is surprisingly difficult. To get around this, data is often copied and put aside, which increases the operations management overhead.

If our databases have temporal functionality then we can simply query the database as of the trade creation time, to discover why the trade was priced the way it was.

Whose time is it anyway?

If your database is the only database in the mix and is therefore the singular source of truth, then transaction time could be all that matters.

What happens, however, if your database is fed from some upstream source, as is common in large enterprises?

In systems with substantial data, the database being queried against is often a materialised view formed off the back of an upstream event-log, and it’s the time when the data entered the event-log that we care most about. It’s this time axis that we may want to query against using techniques such as as of, rather than transaction time.

This additional timestamp field, originating upstream, could represent the real truth of when a fact was made a reality, not just when it was transacted.

Bitemporality is about the introduction of a second time axis, called valid time.

The valid time of a fact is the time when the fact is true in the modelled reality

— Glossary of Temporal Database Concepts

Valid Time

With bitemporality, the powerful things you can do in a temporal database against the single axis of transaction time can be done against two axes of time: transaction time and valid time.

In the case of a database being fed from some upstream source, the immediate database would still own and govern transaction time, but valid time would be passed to it in each transaction.

This would allow programmers to perform temporal queries such as as of against valid time, which is a more typical business use-case.

Immutability of the database will still be preserved, because transaction time is always increasing and you can’t update or erase the transaction time past.

When performing temporal queries, programmers will be able to query as of a bitemporal co-ordinate, i.e. to be shown the state of the world at a point in valid time, but also at a point in transaction time past. This allows for query consistency, factoring in both time axes.
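A bitemporal co-ordinate query can be sketched in a few lines of Python. This is an assumption-laden toy model rather than how any particular database implements it: the facts list and the put and as_of functions are hypothetical, a fact is assumed to hold from its valid time until the next fact in valid time, and transaction time is passed in by hand to keep the example deterministic.

```python
from datetime import datetime

# Each fact: (transaction_time, valid_time, key, value); never updated in place
facts = []


def put(key, value, valid_time, tx_time):
    facts.append((tx_time, valid_time, key, value))


def as_of(key, valid_time, tx_time):
    """State of key at valid_time, using only what was known at tx_time."""
    visible = [f for f in facts
               if f[2] == key and f[0] <= tx_time and f[1] <= valid_time]
    if not visible:
        return None
    # Latest valid time wins; ties go to the most recently transacted fact
    return max(visible, key=lambda f: (f[1], f[0]))[3]


put("price", 100, valid_time=datetime(2019, 1, 1), tx_time=datetime(2019, 1, 1))
put("price", 110, valid_time=datetime(2019, 1, 10), tx_time=datetime(2019, 1, 10))
# A correction for 5 January that only reached the database on 20 January
put("price", 95, valid_time=datetime(2019, 1, 5), tx_time=datetime(2019, 1, 20))

# The price on 7 January, as we understood it on 15 January and on 25 January
before_correction = as_of("price", datetime(2019, 1, 7), datetime(2019, 1, 15))
after_correction = as_of("price", datetime(2019, 1, 7), datetime(2019, 1, 25))
```

The same valid time gives two different answers depending on the transaction time it is paired with, and both answers remain reproducible because no fact is ever overwritten.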

Updating the Past

Valid time differs fundamentally from transaction time, in that you can insert into the past.

In the messy world of enterprise IT, data is frequently moved about wholesale, be it data-centre migrations to the cloud, compacting, backups, ETL jobs to provide materialised views… data is nearly always on the move.

There may be times when we want to do bulk-loads into the past, and if we’re reliant purely on a single monotonically increasing time-axis, then this is going to be a challenge, especially if we want to preserve transaction time as something meaningful to query against. Add in the potential need for parallel data-writing to speed up ingestion, and the problem is exacerbated.

Valid time is more flexible. You can insert into the past of valid time, and therefore it doesn’t matter in which order transactions are committed.

This also helps for global data topologies; if your database is being fed transactions from upstream sources, it might be inescapable that transactions will occasionally arrive out of order.
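The order-independence claim can be checked with a small experiment: commit the same upstream events in every possible arrival order and compare the resulting valid-time timelines. The events data and the load and timeline functions are hypothetical, and transaction time is simplified to an arrival counter.

```python
from datetime import datetime
from itertools import permutations

# Upstream events, each carrying its own valid time: (valid_time, key, value)
events = [
    (datetime(2019, 1, 1), "position", 10),
    (datetime(2019, 1, 2), "position", 25),
    (datetime(2019, 1, 3), "position", 40),
]


def load(ordered_events):
    """Commit events in arrival order; tx time is just the arrival index."""
    store = []
    for tx, (valid_time, key, value) in enumerate(ordered_events):
        store.append((valid_time, tx, key, value))
    return store


def timeline(store, key):
    """Values of key ordered by valid time, ignoring commit order."""
    return [(vt, v) for vt, tx, k, v in sorted(store) if k == key]


# Every possible arrival order produces the same view along valid time
views = [timeline(load(order), "position") for order in permutations(events)]
consistent = all(view == views[0] for view in views)
```

Each load assigns different transaction times to the same facts, yet the view along valid time is identical in every case.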

Valid time also opens up the possibility of sharding for horizontal scaling: different data-stores might record differing transaction times for the same facts, yet you could still achieve query consistency using valid time.

N-Temporality?

Why would you stop at two time axes? Why not go for three, or four, or N many?

Firstly, bitemporality is a significant increment as it allows for correcting against the past. Now you can have your immutable data cake and eat it.

Secondly, it’s helpful to see both transaction time and valid time as the essential building blocks for creating temporal data-models.

With transaction time you achieve immutability and you have the rock solid audit log. Should you then need an additional domain time, you have options for modeling your data using independent facts with their own valid time, rather than overloading multiple time fields onto a single fact or entity.

It isn’t trivial, but with two dimensions you can model your data to reflect views of time for different business use-cases.

Conclusion

Bitemporality gives programmers more control over time. It’s useful for querying, but it’s also important for the messy world that we live in, where data is moved around and we can’t guarantee strict transactional ordering.

Having two time dimensions allows us to take more advantage of the features a temporal database offers.

JUXT is releasing a new open-source bitemporal database called Crux, on April 19, 2019 at Clojure/north.
