Crux is now XTDB September 2021 Update: Crux has recently been renamed XTDB. The official home for XTDB is now https://xtdb.com.
We have just released Crux - an open-source bitemporal database.
What is Crux?
Crux is a document store that indexes documents for graph query. The
indexes are bitemporal, meaning that you can query against valid time
and transaction time
, the usefulness of which is covered in our
previous post “the value of
bitemporality”.
Unbundled
Crux is an unbundled database - to use Martin Kleppman’s phrase - shipping as a connected set of pluggable parts. This means that users can swap out parts and contribute their own, and that Crux itself follows the Unix philosophy of each part doing one thing particularly well.
This pluggability allows Crux to scale with you as your scaling needs increase. You can start out using Crux with the transaction log being a local-disk based implementation, and then in future you could switch it out to Kafka, which offers much higher data throughput and retention guarantees.
With the open, unbundled architecture, it’s intended that Crux be extended and experimented with. The various parts in Crux are described by Clojure protocols, meaning that users can get in and provide their own implementations that would either fully replace or decorate the existing ones.
How Crux Works
Crux is schemaless, with transactions being submitted through the Crux API. The data is then sent to two event-log topics for storage: the transaction topic and the document topic.
We use two topics because whilst the transaction topic is immutable, messages in the document topic can be permanently erased, forming the basis of Crux’s ground-up strategy to provide ease of content eviction for data privacy reasons, to align with compliance regimes such as GDPR.
Using a separate topic for the content documents also allows for compaction to remove duplicates, as the message ID is a content hash of the document. From a Kafka perspective, the transaction topic uses a single Kafka partition, but it is in our roadmap to shard the document topic to potentially use multiple partitions.
The event-log that Crux uses is the golden store of data, with Crux leveraging Kafka’s infinite retention capability.
Crux Nodes will then ingest the data from the event-log and index the transactions and documents locally into a local Key/Value store such as RocksDB or LMDB, which acts as the foundation for both a local document store and a set of bitemporal indexes that Crux maintains for graph query. RocksDB and LMDB use fundamentally different data structures and therefore present a choice of performance characteristics and trade-offs.
Crux currently supports both a Java and Clojure API. See the JavaDocs.
Transacting and Querying
Crux supports an Edn Datalog format, similar to - though not the same as - Datomic’s. To get a feel of transacting to and querying against Crux, check out the query documentation and/or read Ivan Fedorov’s “a bitemporal tale”.
Crux supports four transaction operations:
-
PUT
-
DELETE
-
CAS
-
EVICT
PUT
will store a document whereas DELETE
will delete it from a given
valid time
, but the data will still be stored in Crux history. Use
EVICT
to get rid of data permanently, either for all of history, or
for a given valid time
window. Use CAS
to compare-and-swap, to
ensure that the data in a document/entity is what you think it is before
adding a new version, or else abort the transaction.
Inside of Crux we use a Worse Case Optimal Join algorithm, which enables the query engine to lazily stream out results for an arbitrary complex query with multiple join conditions and clauses. This, in combination with an external merge sort used for additional sorting, means that we avoid manifesting intermediary results in memory.
Deployment
Crux can be deployed as a JAR file within your application, or Crux has a HTTP server that you can use. You can use Crux in a standalone mode without Kafka (substituting in a local disk-based event-log), or you can deploy a cluster of Crux nodes that use Kafka.
Open
Crux is open source so that you can see the code, commit history, warts and all. You can see the GitHub issues where design decisions are made, and you can contribute in this process. You can fork Crux and send PRs our way. We encourage developers to try out Crux and to expose and publish patterns of using it, to feedback their ideas and critique.
Crux is a product that JUXT will offer various support models for, including enterprise support and managed hosting. If you have any questions about Crux or would like to talk to us about using it, please email us or visit our Zulip.
Have a play with XTDB - add the JAR to your project and scale up from there. Crux is Alpha. Please raise any issues on our GitHub.