Open Time Store™
Copyright © JUXT LTD 2018-2019

Introduction

Crux is a document database that provides you with a comprehensive means of traversing and querying across all of your documents and data without any need to define a schema ahead of time. This is possible because Crux is "schemaless" and automatically indexes the top-level fields in all of your documents to support efficient ad-hoc joins and retrievals. With these capabilities you can quickly build queries that match directly against the relations in your data without worrying too much about the shape of your documents or how that shape might change in future.

Crux is also a graph database. The central characteristic of a graph database is that it can support arbitrary-depth graph queries (recursive traversals) very efficiently by default, without any need for schema-level optimisations. Crux gives you the ability to construct graph queries via a Datalog query language and uses graph-friendly indexes to provide a powerful set of querying capabilties. Additionally, when Crux’s indexes are deployed directly alongside your application you are able to easily blend Datalog and code together to construct highly complex graph algorithms.

This page walks through many of the more interesting queries that run as part of Crux’s default test suite. See test/crux/query_test.clj for the full suite of query tests and how each test listed below runs in the wider context.

Extensible Data Notation (edn) is used as the data format for the public Crux APIs. To gain an understanding of edn see resources.html.

Note that all Crux Datalog queries run using a point-in-time view of the database which means the query capabilities and patterns presented in this section are not aware of valid times or transaction times.

Basic Query

In the most basic case, a Datalog query works by searching for "subgraphs" in the database that match the pattern defined by the clauses. The values within these subgraphs are then returned according to the list of return variables requested in the :find vector within the query.

Our first query runs on a database that contains the following 3 documents which get broken apart and indexed as "entities":

[{:crux.db/id :ivan
  :name "Ivan"
  :last-name "Ivanov"}

 {:crux.db/id :petr
  :name "Petr"
  :last-name "Petrov"}

 {:crux.db/id :smith
  :name "Smith"
  :last-name "Smith"}]

Note that :ivan, :petr and :smith are edn keywords, which may be used as document IDs in addition to UUIDs.

The following query has 3 clauses, represented as edn vectors within the :where vector. These clauses constrain the result set to match only the entity (or subgraph of interconnected entities) that satisfy all 3 clauses at once:

{:find [p1]
 :where [[p1 :name name]
         [p1 :last-name name]
         [p1 :name "Smith"]]}

Let’s try to work out what these 3 clauses do…​

[p1 :name name] is looking for all entities that have a value under the attribute of :name and then binds the corresponding entity ID to p1 and the corresponding value to name. Since all 3 entities in our database have a :name attribute, this clause alone will simply return all 3 entities.

[p1 :last-name name] reuses the variable name from the previous clause which is significant because it constrains the query to only look for entities where the value of :name (from the first clause) is equal to the value of :last-name (from the second clause). Looking at documents that were processed by our database there is only one possible entity that can be returned, because it has the same values :name and :last-name.

[p1 :name "Smith"] only serves to reinforce the conclusion from the previous two clauses which is that the variable name can only be matched against the string "Smith" within our database.

…​so what is the actual result of the query? Well that is defined by the :find vector which states that only the values corresponding to p1 should be returned, which in this case is simply :smith (the keyword database ID for the document relating to our protagonist "Smith Smith"). Results are returned as an edn set, which means duplicate results will not appear.

The edn result set only contains the value :smith

#{[:smith]}

Arguments

For the next set of queries we will again use the same set of documents for our database as used in the previous section:

[{:crux.db/id :ivan
  :name "Ivan"
  :last-name "Ivanov"}

 {:crux.db/id :petr
  :name "Petr"
  :last-name "Petrov"}

 {:crux.db/id :smith
  :name "Smith"
  :last-name "Smith"}]

Query: "Match on entity ID and value"

{:find [name]
 :where [[e :name name]]
 :args [{:e :ivan
         :name "Ivan"}]}

Our first query supplies two arguments to the query via a map within the :args vector. The effect of this is to make sure that regardless of whether other :name values in the database also equal "Ivan", that only the entity with an ID matching our specific ivan ID is considered within the query. Use of arguments means we can avoid hard-coding values directly into the query clauses.

Result Set:

#{["Ivan"]}

Query: "Match entities with given values"

{:find [e]
  :where [[e :name name]]
  :args [{:name "Ivan"}
         {:name "Petr"}]}

This next query shows how multiple argument values can be mapped to a single field. This allows us to usefully parametise the input to a query such that we do not have to rerun a single query multiple times (which would be significantly less efficient!).

Result Set:

#{[:ivan] [:petr]}

Query: "Match entities with given value tuples"

{:find [e]
 :where [[e :name name]
         [e :last-name last-name]]
 :args [{:name "Ivan" :last-name "Ivanov"}
        {:name "Petr" :last-name "Petrov"}]}

Here we see how we can extend the parametisation to match using multiple fields at once.

Result Set:

#{[:ivan] [:petr]}

Query: "Use predicates with arguments"

{:find [name]
 :where [[(re-find #"I" name)]
         [(= last-name "Ivanov")]]
 :args [{:name "Ivan" :last-name "Ivanov"}
        {:name "Petr" :last-name "Petrov"}]}

Something else we can do with arguments is apply predicates to them directly within the clauses. Predicates return either true or false but all predicates used in clauses must return true in order for the given combination of field values to be part of the valid result set. In this case only :name "Ivan" satisfies [(re-find #"I" name)] (which returns true for any values that begin with "I").

#{["Ivan"]}

Query: "Use range constraints with arguments"

{:find [age]
 :where [[(>= age 21)]]
 :args [{:age 22}]}

Finally we can see how we can return an argument that passes all of the predicates by including it in the :find vector. This essentially bypasses any interaction with the data in our database.

Result Set:

#{[22]}

Valid time travel

Congratulations! You already know enough about queries to build a simple CRUD application with Crux. However, your manager has just told you that the new CRUD application you have been designing needs to backfill the historical document versions from the legacy CRUD application. Luckily Crux makes it easy for your application to both insert and retrieve these old versions.

Here we will see how you are able to run queries at a given point in the valid time axis against, implicitly, the most recent transaction time.

First, we transact a very old document into the database with the ID :malcolm and the :name "Malcolm", and specify the valid time instant at which this document became valid in the legacy system: #inst "1986-10-22".

[{:crux.db/id :malcolm :name "Malcolm" :last-name "Sparks"}]
#inst "1986-10-22"

Next we transact a slightly more recent (though still very old!) revision of that same document where the :name has been corrected to "Malcolma", again using a historical timestamp extracted from the legacy system.

[{:crux.db/id :malcolm :name "Malcolma" :last-name "Sparks"}]
#inst "1986-10-24"

We are then able to query at different points in the valid time axis to check for the validity of the correction. We define a query q:

{:find [e]
 :where [[e :name "Malcolma"]
         [e :last-name "Sparks"]]}

Firstly we can verify that "Malcolma" was unknown at #inst "1986-10-23".

; Using Clojure: `(api/q (api/db my-crux-system #inst "1986-10-23") q)`

Result Set:

#{}

We can then verify that "Malcolma" is the currently known :name for the entity with ID :malcolm by simply not specifying a valid time alongside the query. This will be the case so long as there are no newer versions (in the valid time axis) of the document that affect the current valid time version.

; Using Clojure: `(api/q (api/db my-crux-system) q)`

Result Set:

#{[:malcolm]}

History API

Full Document History

Crux allows you to retrieve all versions of a document:

(api/submit-tx
  system
  [[:crux.tx/put :ids.persons/Jeff
    {:crux.db/id :ids.persons/Jeff
     :person/name "Jeff"
     :person/wealth 100}
    #inst "2018-05-18T09:20:27.966"]
   [:crux.tx/put :ids.persons/Jeff
    {:crux.db/id :ids.persons/Jeff
     :person/name "Jeff"
     :person/wealth 1000}
    #inst "2015-05-18T09:20:27.966"]])

;yields
{:crux.tx/tx-id 1555314836178,
 :crux.tx/tx-time #inst "2019-04-15T07:53:56.178-00:00"}


(api/history system :ids.persons/Jeff)

; yields
[{:crux.db/id ; sha1 hash of document id
  "c7e66f757f198e08a07a8ea6dfc84bc3ab1c6613",
  :crux.db/content-hash ; sha1 hash of document contents
  "6ca48d3bf05a16cd8d30e6b466f76d5cc281b561",
  :crux.db/valid-time #inst "2018-05-18T09:20:27.966-00:00",
  :crux.tx/tx-time #inst "2019-04-15T07:53:55.817-00:00",
  :crux.tx/tx-id 1555314835817}
 {:crux.db/id "c7e66f757f198e08a07a8ea6dfc84bc3ab1c6613",
  :crux.db/content-hash "a95f149636e0a10a78452298e2135791c0203529",
  :crux.db/valid-time #inst "2015-05-18T09:20:27.966-00:00",
  :crux.tx/tx-time #inst "2019-04-15T07:53:56.178-00:00",
  :crux.tx/tx-id 1555314836178}]

Document History Range

Retrievable document versions can be bounded by four time coordinates:

  • valid-time-start

  • tx-time-start

  • valid-time-end

  • tx-time-end

All coordinates are inclusive. All coordinates can be null.

(crux.api/history-range system :ids.persons/Jeff
  #inst "2015-05-18T09:20:27.966"  ; valid-time start or nil
  #inst "2015-05-18T09:20:27.966"  ; transaction-time start or nil
  #inst "2020-05-18T09:20:27.966"  ; valid-time end or nil, inclusive
  #inst "2020-05-18T09:20:27.966") ; transaction-time end or nil, inclusive.

; yields
({:crux.db/id ; sha1 hash of document id
  "c7e66f757f198e08a07a8ea6dfc84bc3ab1c6613",
  :crux.db/content-hash  ; sha1 hash of document contents
  "a95f149636e0a10a78452298e2135791c0203529",
  :crux.db/valid-time #inst "2015-05-18T09:20:27.966-00:00",
  :crux.tx/tx-time #inst "2019-04-15T07:53:56.178-00:00",
  :crux.tx/tx-id 1555314836178}
  {:crux.db/id "c7e66f757f198e08a07a8ea6dfc84bc3ab1c6613",
   :crux.db/content-hash "6ca48d3bf05a16cd8d30e6b466f76d5cc281b561",
   :crux.db/valid-time #inst "2018-05-18T09:20:27.966-00:00",
   :crux.tx/tx-time #inst "2019-04-15T07:53:55.817-00:00",
   :crux.tx/tx-id 1555314835817})


(crux.api/entity (crux.api/db system) "c7e66f757f198e08a07a8ea6dfc84bc3ab1c6613")

; yields
{:crux.db/id :ids.persons/Jeff,
 :d.person/name "Jeff",
 :d.person/wealth 100}

Joins

Query: "Join across entities on a single attribute"

Given the following documents in the database

[{:crux.db/id :ivan :name "Ivan"}
 {:crux.db/id :petr :name "Petr"}
 {:crux.db/id :sergei :name "Sergei"}
 {:crux.db/id :denis-a :name "Denis"}
 {:crux.db/id :denis-b :name "Denis"}]

We can run a query to return a set of tuples that satisfy the join on the attribute :name

{:find [p1 p2]
 :where [[p1 :name name]
         [p2 :name name]]}

Result Set:

#{[:ivan :ivan]
  [:petr :petr]
  [:sergei :sergei]
  [:denis-a :denis-a]
  [:denis-b :denis-b]
  [:denis-a :denis-b]
  [:denis-b :denis-a]}

Note that every person joins once, plus 2 more matches.

Query: "Join with two attributes, including a multi-valued attribute"

Given the following documents in the database

[{:crux.db/id :ivan :name "Ivan" :last-name "Ivanov"}
 {:crux.db/id :petr :name "Petr" :follows #{"Ivanov"}}]

We can run a query to return a set of entities that :follows the set of entities with the :name value of "Ivan"

{:find [e2]
 :where [[e :last-name last-name]
         [e2 :follows last-name]
         [e :name "Ivan"]]}

Result Set:

#{[:petr]}

Note that because Crux is schemaless there is no need to have elsewhere declared that the :follows attribute may take a value of edn type set.

Rules

This example of a rule demonstrates a recursive traversal of entities that are connected to a given entity via the :follow attribute.

{:find [?e2]
 :where [(follow ?e1 ?e2)]
 :args [{:?e1 :1}]
 :rules [[(follow ?e1 ?e2)
          [?e1 :follow ?e2]]
         [(follow ?e1 ?e2)
          [?e1 :follow ?t]
          (follow ?t ?e2)]]})

Lazy Queries

The function crux.api/q takes 2 or 3 arguments, db and q but also optionally a snapshot which is already opened and managed by the caller (using with-open for example). This version of the call returns a lazy sequence of the results, while the other version provides a set. A snapshot can be retreived from a kv instance via crux.api/new-snapshot.

DataScript Differences

This list is not necessarily exhaustive and is based on the partial re-usage of DataScript’s query test suite within Crux’s query tests.

Crux does not support:

  • vars in the attribute position, such as [e ?a "Ivan"] or [e _ "Ivan"]

Crux does not yet support:

  • ground, get-else, get-some, missing?, missing? back-ref

  • destructuring

  • source vars, e.g. function references passed into the query via :args

Note that many of these not yet supported query features can be achieved via simple function calls since you can currently fully qualify any function that is loaded. In future, limitations on available functions may be introduced to enforce security restrictions for remote query execution.

Test queries from DataScript such as "Rule with branches" and "Mutually recursive rules" work correctly with Crux and demonstrate advanced query patterns. See the Crux tests for details.