Open Time Store™
Copyright © JUXT LTD 2018-2019

Introduction

Crux offers a small REST API that allows you to send transactions and run queries over HTTP. For instance, you could deploy your Crux nodes along with Kafka into a Kubernetes pod running on AWS and interact with Crux from your application purely via HTTP. Using Crux in this manner is a valid use case, but it cannot support all of the features and benefits that running the Crux node inside your application provides.

Your application only needs to communicate with one Crux node when using the REST API. This Crux node may be placed behind a load balancer which spreads the load over multiple nodes transparently to the application. However, different Crux nodes might still be catching up with the head of the transaction log, and since different queries might go to different nodes, you have to be conscious of read consistency issues when designing your application to use Crux in this way. Fortunately, you can mitigate read consistency issues by querying consistent point-in-time snapshots of the database, which you do by specifying temporal coordinates along with your queries.
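One way to apply temporal coordinates is to take the transaction time returned by a write and reuse it for later reads. The sketch below is only an illustration: it assumes the response shape of POST /tx-log and the :transact-time key of POST /entity as documented further down this page, and it uses a regular expression as a stand-in for a real EDN parser. The function name is hypothetical. How a lagging node answers such a pinned read is not covered here, so consider pairing this with GET /sync.

```python
import re

# Matches the tx-time in a POST /tx-log response; a regex stand-in for a
# real EDN parser, which is an assumption of this sketch.
TX_TIME = re.compile(r':crux\.tx/tx-time\s+#inst\s+"([^"]+)"')


def pinned_entity_body(eid: str, tx_response: str) -> str:
    """Build a POST /entity body pinned to the transaction time of a write."""
    match = TX_TIME.search(tx_response)
    if match is None:
        raise ValueError("no :crux.tx/tx-time found in response")
    return '{:eid ' + eid + ' :transact-time #inst "' + match.group(1) + '"}'


# Response text copied from the POST /tx-log example on this page.
response = '{:crux.tx/tx-id 7, :crux.tx/tx-time #inst "2019-01-07T16:14:19.675-00:00"}'
print(pinned_entity_body(":ivan", response))
```

Sending the same coordinates with every read in a request cycle means the reads agree with each other regardless of which node serves them, provided each node has indexed that transaction.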

The REST API also provides an experimental endpoint for SPARQL 1.1 Protocol queries under /sparql/, rewriting the query into the Crux Datalog dialect. Only a small subset of SPARQL is supported and no other RDF features are available.

Using the HTTP API

The HTTP interface is provided as a Ring middleware in a Clojure namespace, located at crux/src/crux/http_server.clj. There is an example of using this middleware in a full example HTTP server configuration: https://github.com/juxt/crux/tree/master/example/standalone_webservice

Whilst CORS may be easily configured for use while prototyping a Single Page Application that uses Crux directly from a web browser, it is currently NOT recommended to expose Crux directly to any untrusted endpoints (including web browsers) in production since the default query API does not sandbox or otherwise restrict the execution of queries.

Index

Table 1. API

uri                       method       description
/                         GET          Returns various details about the state of the database
/document/[content-hash]  GET or POST  Returns the document for a given hash
/entity                   POST         Returns an entity for a given ID and optional valid-time/transaction-time co-ordinates
/entity-tx                POST         Returns the :put or :cas transaction that most recently set a key
/history/[:key]           GET or POST  Returns the transaction history of a key
/query                    POST         Takes a Datalog query and returns its results
/query-stream             POST         Same as /query but the results are streamed
/sync                     GET          Waits until the Kafka consumer’s lag is back to 0
/tx-log                   GET          Returns a list of all transactions
/tx-log                   POST         The "write" endpoint, to post transactions

GET /

Returns various details about the state of the database. Can be used as a health check.

curl -X GET $nodeURL/
{:crux.kv/kv-backend "crux.kv.rocksdb.RocksKv",
 :crux.kv/estimate-num-keys 92,
 :crux.kv/size 72448,
 :crux.zk/zk-active? true,
 :crux.tx-log/consumer-state
   {:crux.kafka.topic-partition/crux-docs-0
      {:offset 25,
       :time #inst "2019-01-08T11:06:41.867-00:00",
       :lag 0},
    :crux.kafka.topic-partition/crux-transaction-log-0
      {:offset 19,
       :time #inst "2019-01-08T11:06:41.869-00:00",
       :lag 0}}}
Note
estimate-num-keys is an (over)estimate of the number of transactions in the log (each of which is a key in RocksDB). RocksDB does not provide an exact key count.
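If you use this endpoint as a health check, you may want to look at the :lag values rather than only at the HTTP status. The following is a minimal sketch, assuming the response text has the shape shown above. It pulls the lag figures out with a regular expression instead of a full EDN parser, and the function names are hypothetical.

```python
import re
from typing import List


def consumer_lags(status_edn: str) -> List[int]:
    """Return every :lag value found in the status response text."""
    return [int(n) for n in re.findall(r":lag\s+(\d+)", status_edn)]


def is_caught_up(status_edn: str) -> bool:
    """True when at least one :lag is reported and all of them are 0."""
    lags = consumer_lags(status_edn)
    return bool(lags) and max(lags) == 0


# Abbreviated copy of the sample response above.
sample = """{:crux.tx-log/consumer-state
  {:crux.kafka.topic-partition/crux-docs-0 {:offset 25, :lag 0},
   :crux.kafka.topic-partition/crux-transaction-log-0 {:offset 19, :lag 0}}}"""
print(is_caught_up(sample))
```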

GET/POST /document/[content-hash]

Returns the document stored under that hash, if it exists.

curl -X GET $nodeURL/document/7af0444315845ab3efdfbdfa516e68952c1486f2
{:crux.db/id :foobar, :name "FooBar"}
Note
Hashes for older versions of a document can be obtained with /history, under the :crux.db/content-hash keys.

POST /entity

Takes an entity key (:eid) and, optionally, a :valid-time and/or :transact-time (both defaulting to now). Returns the value stored under that key at those times.

See Bitemporality for more information.

curl -X POST \
     -H "Content-Type: application/edn" \
     -d '{:eid :tommy}' \
     $nodeURL/entity
{:crux.db/id :tommy, :name "Tommy", :last-name "Petrov"}
curl -X POST \
     -H "Content-Type: application/edn" \
     -d '{:eid :tommy :valid-time #inst "1999-01-08T14:03:27.254-00:00"}' \
     $nodeURL/entity
nil
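The same requests can be built from application code. The sketch below uses only the Python standard library and constructs the request without sending it. The node URL http://localhost:3000 and the helper names are assumptions for illustration. Timestamps are rendered in the #inst form used in the examples above.

```python
from datetime import datetime, timezone
from typing import Optional
from urllib.request import Request


def edn_inst(moment: datetime) -> str:
    """Render a timezone-aware datetime as an EDN #inst literal in UTC."""
    utc = moment.astimezone(timezone.utc)
    millis = utc.microsecond // 1000
    return '#inst "' + utc.strftime("%Y-%m-%dT%H:%M:%S") + f'.{millis:03d}-00:00"'


def entity_request(
    node_url: str,
    eid: str,
    valid_time: Optional[datetime] = None,
    transact_time: Optional[datetime] = None,
) -> Request:
    """Build (but do not send) a POST /entity request with an EDN body."""
    parts = [":eid " + eid]
    if valid_time is not None:
        parts.append(":valid-time " + edn_inst(valid_time))
    if transact_time is not None:
        parts.append(":transact-time " + edn_inst(transact_time))
    body = "{" + " ".join(parts) + "}"
    return Request(
        node_url + "/entity",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/edn"},
        method="POST",
    )


# Hypothetical node URL; the timestamp is the one from the curl example.
when = datetime(1999, 1, 8, 14, 3, 27, 254000, tzinfo=timezone.utc)
req = entity_request("http://localhost:3000", ":tommy", valid_time=when)
print(req.data.decode("utf-8"))
```

Passing the request to urllib.request.urlopen would send it. The response body is EDN text, so a real client would also need an EDN reader.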

POST /entity-tx

Takes an entity key (:eid) and, optionally, a :valid-time and/or :transact-time (both defaulting to now). Returns the :put or :cas transaction that most recently set that key at those times.

See Bitemporality for more information.

curl -X POST \
     -H "Content-Type: application/edn" \
     -d '{:eid :foobar}' \
     $nodeURL/entity-tx
{:crux.db/id "8843d7f92416211de9ebb963ff4ce28125932878",
 :crux.db/content-hash "7af0444315845ab3efdfbdfa516e68952c1486f2",
 :crux.db/valid-time #inst "2019-01-08T16:34:47.738-00:00",
 :crux.tx/tx-id 0,
 :crux.tx/tx-time #inst "2019-01-08T16:34:47.738-00:00"}

GET/POST /history/[:key]

Returns the transaction history of a key, from newest to oldest transaction time.

curl -X GET $nodeURL/history/:ivan
[{:crux.db/id "a15f8b81a160b4eebe5c84e9e3b65c87b9b2f18e",
  :crux.db/content-hash "c28f6d258397651106b7cb24bb0d3be234dc8bd1",
  :crux.db/valid-time #inst "2019-01-07T14:57:08.462-00:00",
  :crux.tx/tx-id 14,
  :crux.tx/tx-time #inst "2019-01-07T16:51:55.185-00:00"}

 {...}]
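Since each history entry carries a :crux.db/content-hash, older versions of a document can be fetched by feeding those hashes to /document. Here is a rough sketch of that step, assuming the history response looks like the sample above. It extracts hashes with a regular expression in place of an EDN parser. The helper names and the node URL are hypothetical.

```python
import re
from typing import List

# Captures the hex digest that follows each :crux.db/content-hash key.
CONTENT_HASH = re.compile(r':crux\.db/content-hash\s+"([0-9a-f]+)"')


def content_hashes(history_edn: str) -> List[str]:
    """Return content hashes in the order they appear (newest first)."""
    return CONTENT_HASH.findall(history_edn)


def document_url(node_url: str, content_hash: str) -> str:
    """URL of the /document endpoint for one content hash."""
    return node_url + "/document/" + content_hash


# Single entry copied from the sample response above.
history = """[{:crux.db/id "a15f8b81a160b4eebe5c84e9e3b65c87b9b2f18e",
  :crux.db/content-hash "c28f6d258397651106b7cb24bb0d3be234dc8bd1",
  :crux.tx/tx-id 14}]"""
for digest in content_hashes(history):
    print(document_url("http://localhost:3000", digest))
```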

POST /query

Takes a Datalog query and returns its results.

curl -X POST \
     -H "Content-Type: application/edn" \
     -d '{:query {:find [e] :where [[e :last-name "Petrov"]]}}' \
     $nodeURL/query
#{[:boris][:ivan]}

You can add :full-results? true to the query map to retrieve the source documents for the entities in the result set. For instance, to retrieve all documents in a single query:

curl -X POST \
     -H "Content-Type: application/edn" \
     -d '{:query {:find [e] :where [[e :crux.db/id _]] :full-results? true}}' \
     $nodeURL/query
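When queries are assembled in application code, a small helper keeps the nesting of the :query map in one place. The sketch below simply concatenates EDN fragments supplied by the caller. It does no escaping or validation, and the function name is hypothetical.

```python
def query_body(find: str, where: str, full_results: bool = False) -> str:
    """Wrap :find and :where EDN fragments in the body expected by /query."""
    extra = " :full-results? true" if full_results else ""
    return "{:query {:find " + find + " :where " + where + extra + "}}"


# Reproduces the two request bodies shown in the curl examples above.
print(query_body("[e]", '[[e :last-name "Petrov"]]'))
print(query_body("[e]", "[[e :crux.db/id _]]", full_results=True))
```

Because the fragments are inserted verbatim, this is only suitable for queries written by the application itself, never for text received from untrusted callers.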

POST /query-stream

Same as /query but the results are streamed.

GET /sync

Waits until the Kafka consumer’s lag is back to 0 (i.e. when it no longer has pending transactions to write). The timeout is 10 seconds by default, but can be specified in milliseconds via the timeout parameter. Returns the transaction time of the most recent transaction.

curl -X GET "$nodeURL/sync?timeout=500"
#inst "2019-01-08T11:06:41.869-00:00"
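The timeout is passed as a query parameter, so the URL is easiest to build with a URL-encoding helper rather than by hand. This is a minimal sketch. The function name and the node URL are assumptions.

```python
from typing import Optional
from urllib.parse import urlencode


def sync_url(node_url: str, timeout_ms: Optional[int] = None) -> str:
    """Build the GET /sync URL, adding ?timeout=<ms> only when given."""
    url = node_url + "/sync"
    if timeout_ms is not None:
        url += "?" + urlencode({"timeout": timeout_ms})
    return url


print(sync_url("http://localhost:3000", 500))
```

If you send the request with an HTTP client that has its own socket timeout, it seems sensible to set that slightly above the server-side value so the client does not give up first.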

GET /tx-log

Returns a list of all transactions, from oldest to newest transaction time.

curl -X GET $nodeURL/tx-log
({:crux.tx/tx-time #inst "2019-01-07T15:11:13.411-00:00",
  :crux.api/tx-ops [[
    :crux.tx/put "c28f6d258397651106b7cb24bb0d3be234dc8bd1"
    #inst "2019-01-07T14:57:08.462-00:00"]],
  :crux.tx/tx-id 0}

 {:crux.tx/tx-time #inst "2019-01-07T15:11:32.284-00:00",
  ...})

POST /tx-log

Takes a vector of transaction operations (any combination of :put, :delete, :cas and :evict) and executes them in order. This is the only "write" endpoint.

curl -X POST \
     -H "Content-Type: application/edn" \
     -d '[[:crux.tx/put {:crux.db/id :ivan, :name "Ivan" :last-name "Petrov"}],
          [:crux.tx/put {:crux.db/id :boris, :name "Boris" :last-name "Petrov"}],
          [:crux.tx/delete :maria  #inst "2012-05-07T14:57:08.462-00:00"]]' \
     $nodeURL/tx-log
{:crux.tx/tx-id 7, :crux.tx/tx-time #inst "2019-01-07T16:14:19.675-00:00"}
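The request body is a vector of operation vectors, which is easy to get wrong when written by hand. The sketch below assembles that structure from EDN fragments for the :put and :delete operations shown above. The helper names are hypothetical, and documents are passed in as ready-made EDN strings rather than serialized from Python data.

```python
from typing import List, Optional


def put_op(document_edn: str) -> str:
    """A :crux.tx/put operation for a document given as EDN text."""
    return "[:crux.tx/put " + document_edn + "]"


def delete_op(eid: str, valid_time_inst: Optional[str] = None) -> str:
    """A :crux.tx/delete operation, optionally at a given valid time."""
    suffix = " " + valid_time_inst if valid_time_inst else ""
    return "[:crux.tx/delete " + eid + suffix + "]"


def tx_body(ops: List[str]) -> str:
    """Wrap operations in the outer vector; order is preserved."""
    return "[" + " ".join(ops) + "]"


body = tx_body([
    put_op('{:crux.db/id :ivan, :name "Ivan" :last-name "Petrov"}'),
    delete_op(":maria", '#inst "2012-05-07T14:57:08.462-00:00"'),
])
print(body)
```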