Open Time Store™
Copyright © JUXT LTD 2018-2019

Javadocs

Please consult the Javadocs for the official Crux API.

REST

Introduction

Crux offers a small REST API that allows you to send transactions and run queries over HTTP. For instance, you could deploy your Crux nodes along with Kafka into a Kubernetes pod running on AWS and interact with Crux from your application purely via HTTP. Using Crux in this manner is a valid use case, but it cannot support all of the features and benefits that running the Crux node inside your application provides.

Your application only needs to communicate with one Crux node when using the REST API. This Crux node may be placed behind a load balancer which spreads the load over multiple nodes transparently to the application. However, different Crux nodes might still be catching up with the head of the transaction log, and since different queries might go to different nodes, you have to be conscious of read consistency issues when designing your application to use Crux in this way. Fortunately, you can mitigate read consistency issues by querying consistent point-in-time snapshots of the database, which you do by specifying temporal coordinates along with your queries.
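One way to apply this is to ask a node for its latest transaction time once and then pin later queries to that time. The sketch below only builds the request body. It assumes that /query accepts a :transact-time key alongside :query, in the same way /entity does, which this document does not confirm. The function name crux_pinned_query_body is made up for illustration.

```shell
# Build a /query request body pinned to a transaction time.
# $1: Datalog query map in EDN
# $2: an #inst literal, e.g. the text printed by GET /sync
# ASSUMPTION: /query accepts :transact-time next to :query.
# crux_pinned_query_body is a hypothetical helper name, not part of Crux.
crux_pinned_query_body() {
    printf '{:query %s :transact-time %s}' "$1" "$2"
}

# Usage against a running node ($nodeURL as in the other examples):
#   tx_time=$(curl -s "$nodeURL/sync?timeout=500")
#   curl -s -X POST -H "Content-Type: application/edn" \
#        -d "$(crux_pinned_query_body '{:find [e] :where [[e :last-name "Petrov"]]}' "$tx_time")" \
#        "$nodeURL/query"
```

Every query built with the same second argument then refers to the same transaction time, regardless of which node behind the load balancer answers it.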

The REST API also provides an experimental endpoint for SPARQL 1.1 Protocol queries under /sparql/, rewriting the query into the Crux Datalog dialect. Only a small subset of SPARQL is supported and no other RDF features are available.

Using the HTTP API

The HTTP interface is provided as a Ring middleware in a Clojure namespace, located at crux/crux-http-server/src/crux/http_server.clj. There is an example of using this middleware in a full example HTTP server configuration: https://github.com/juxt/crux/tree/master/docs/example/standalone_webservice

Whilst CORS may be easily configured for use while prototyping a Single Page Application that uses Crux directly from a web browser, it is currently NOT recommended to expose Crux directly to any untrusted endpoints (including web browsers) in production since the default query API does not sandbox or otherwise restrict the execution of queries.

Index

Table 1. API

uri                       method       description
/                         GET          Returns various details about the state of the database
/document/[content-hash]  GET or POST  Returns the document for a given hash
/documents                POST         Returns a map of document ids and respective documents for a given set of content hashes submitted in the request body
/entity                   POST         Returns an entity for a given ID and optional valid-time/transaction-time co-ordinates
/entity-tx                POST         Returns the :put or :cas transaction that most recently set a key
/history/[:key]           GET or POST  Returns the transaction history of a key
/query                    POST         Takes a Datalog query and returns its results
/query-stream             POST         Same as /query but the results are streamed
/sync                     GET          Waits until the Kafka consumer’s lag is back to 0
/tx-log                   GET          Returns a list of all transactions
/tx-log                   POST         The "write" endpoint, to post transactions

GET /

Returns various details about the state of the database. Can be used as a health check.

curl -X GET $nodeURL/
{:crux.kv/kv-backend "crux.kv.rocksdb.RocksKv",
 :crux.kv/estimate-num-keys 92,
 :crux.kv/size 72448,
 :crux.zk/zk-active? true,
 :crux.tx-log/consumer-state
   {:crux.kafka.topic-partition/crux-docs-0
      {:offset 25,
       :time #inst "2019-01-08T11:06:41.867-00:00",
       :lag 0},
    :crux.kafka.topic-partition/crux-transaction-log-0
      {:offset 19,
       :time #inst "2019-01-08T11:06:41.869-00:00",
       :lag 0}}}
Note
estimate-num-keys is an (over)estimate of the number of transactions in the log (each of which is a key in RocksDB). RocksDB does not provide an exact key count.
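Since the status map reports a :lag per Kafka topic partition, a health check can go one step further than "the node answered" and look at those values. The helper below is a rough sketch that scans the printed map as text rather than parsing EDN properly; it relies only on the :lag keys visible in the sample output above, and crux_max_lag is a made-up name.

```shell
# Print the largest :lag value found in the status map read from stdin.
# This is a text scan, not an EDN parser: it only looks for ":lag <number>".
# crux_max_lag is a hypothetical helper name, not part of Crux.
crux_max_lag() {
    awk '{
        line = $0
        while (match(line, /:lag [0-9]+/)) {
            lag = substr(line, RSTART + 5, RLENGTH - 5) + 0
            if (lag > max) max = lag
            line = substr(line, RSTART + RLENGTH)
        }
    } END { print max + 0 }'
}

# Usage against a running node ($nodeURL as in the other examples):
#   [ "$(curl -s "$nodeURL/" | crux_max_lag)" -eq 0 ] && echo "node has caught up"
```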

GET/POST /document/[content-hash]

Returns the document stored under that hash, if it exists.

curl -X GET $nodeURL/document/7af0444315845ab3efdfbdfa516e68952c1486f2
{:crux.db/id :foobar, :name "FooBar"}
Note
Hashes for older versions of a document can be obtained with /history-range or /history, under the :crux.db/content-hash keys.

GET/POST /documents

Returns a map from document ids to documents for the set of content hashes given in the request body. The map keys are returned as #crux/id literals if the preserve-crux-ids parameter is set to "true".

curl -X POST $nodeURL/documents \
     -H "Content-Type: application/edn" \
     -d '#{"7af0444315845ab3efdfbdfa516e68952c1486f2"}'
{"7af0444315845ab3efdfbdfa516e68952c1486f2" {:crux.db/id :foobar, :name "FooBar"}}

POST /entity

Takes a key and, optionally, a :valid-time and/or :transact-time (defaulting to now). Returns the value stored under that key at those times.

See Bitemporality for more information.

curl -X POST \
     -H "Content-Type: application/edn" \
     -d '{:eid :tommy}' \
     $nodeURL/entity
{:crux.db/id :tommy, :name "Tommy", :last-name "Petrov"}
curl -X POST \
     -H "Content-Type: application/edn" \
     -d '{:eid :tommy :valid-time #inst "1999-01-08T14:03:27.254-00:00"}' \
     $nodeURL/entity
nil
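The two examples above pass at most a :valid-time. As a sketch of how both optional coordinates described above could be combined, the helper below assembles the EDN body from an entity id and up to two timestamps. The name crux_entity_body is invented for this example, and the timestamps are assumed to be strings that are valid inside an #inst literal.

```shell
# Build the EDN request body for POST /entity.
# $1: entity id as EDN (e.g. :tommy)
# $2: optional valid time, $3: optional transaction time
#     (both as timestamp strings suitable for an #inst literal)
# crux_entity_body is a hypothetical helper name, not part of Crux.
crux_entity_body() {
    body="{:eid $1"
    if [ -n "${2:-}" ]; then
        body="$body :valid-time #inst \"$2\""
    fi
    if [ -n "${3:-}" ]; then
        body="$body :transact-time #inst \"$3\""
    fi
    printf '%s}' "$body"
}

# Usage against a running node ($nodeURL as in the other examples):
#   curl -X POST -H "Content-Type: application/edn" \
#        -d "$(crux_entity_body :tommy 1999-01-08T14:03:27.254-00:00)" \
#        "$nodeURL/entity"
```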

POST /entity-tx

Takes a key and, optionally, :valid-time and/or :transact-time (defaulting to now). Returns the :put or :cas transaction that most recently set that key at those times.

See Bitemporality for more information.

curl -X POST \
     -H "Content-Type: application/edn" \
     -d '{:eid :foobar}' \
     $nodeURL/entity-tx
{:crux.db/id "8843d7f92416211de9ebb963ff4ce28125932878",
 :crux.db/content-hash "7af0444315845ab3efdfbdfa516e68952c1486f2",
 :crux.db/valid-time #inst "2019-01-08T16:34:47.738-00:00",
 :crux.tx/tx-id 0,
 :crux.tx/tx-time #inst "2019-01-08T16:34:47.738-00:00"}

GET/POST /history/[:key]

Returns the transaction history of a key, from newest to oldest transaction time.

curl -X GET $nodeURL/history/:ivan
[{:crux.db/id "a15f8b81a160b4eebe5c84e9e3b65c87b9b2f18e",
  :crux.db/content-hash "c28f6d258397651106b7cb24bb0d3be234dc8bd1",
  :crux.db/valid-time #inst "2019-01-07T14:57:08.462-00:00",
  :crux.tx/tx-id 14,
  :crux.tx/tx-time #inst "2019-01-07T16:51:55.185-00:00"}

 {...}]

POST /query

Takes a Datalog query and returns its results.

curl -X POST \
     -H "Content-Type: application/edn" \
     -d '{:query {:find [e] :where [[e :last-name "Petrov"]]}}' \
     $nodeURL/query
#{[:boris][:ivan]}

You can add :full-results? true to the query map to retrieve the source documents for the entities in the result set. For instance, to retrieve all documents in a single query:

curl -X POST \
     -H "Content-Type: application/edn" \
     -d '{:query {:find [e] :where [[e :crux.db/id _]] :full-results? true}}' \
     $nodeURL/query

POST /query-stream

Same as /query but the results are streamed.
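No request is shown for this endpoint. Assuming it accepts the same EDN body as /query, which is how "same as /query" reads but is not spelled out here, a small wrapper around the curl invocation used throughout this page could serve both endpoints. The name crux_post_edn is made up for illustration.

```shell
# POST an EDN body to a Crux endpoint and print the raw response.
# $1: endpoint name without the leading slash (e.g. query-stream)
# $2: EDN request body
# Reads the node address from $nodeURL, as the other examples here do.
# crux_post_edn is a hypothetical helper name, not part of Crux.
crux_post_edn() {
    curl -s -X POST \
         -H "Content-Type: application/edn" \
         -d "$2" \
         "$nodeURL/$1"
}

# Usage against a running node:
#   crux_post_edn query-stream '{:query {:find [e] :where [[e :last-name "Petrov"]]}}'
```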

GET /sync

Waits until the Kafka consumer’s lag is back to 0 (i.e. when it no longer has pending transactions to write). The timeout is 10 seconds by default, but can be specified as a parameter in milliseconds. Returns the transaction time of the most recent transaction.

curl -X GET "$nodeURL/sync?timeout=500"
#inst "2019-01-08T11:06:41.869-00:00"

GET /tx-log

Returns a list of all transactions, from oldest to newest transaction time.

curl -X GET $nodeURL/tx-log
({:crux.tx/tx-time #inst "2019-01-07T15:11:13.411-00:00",
  :crux.api/tx-ops [[
    :crux.tx/put "c28f6d258397651106b7cb24bb0d3be234dc8bd1"
    #inst "2019-01-07T14:57:08.462-00:00"]],
  :crux.tx/tx-id 0}

 {:crux.tx/tx-time #inst "2019-01-07T15:11:32.284-00:00",
  ...})

POST /tx-log

Takes a vector of transactions (any combination of :put, :delete, :cas and :evict) and executes them in order. This is the only "write" endpoint.

curl -X POST \
     -H "Content-Type: application/edn" \
     -d '[[:crux.tx/put {:crux.db/id :ivan, :name "Ivan" :last-name "Petrov"}],
          [:crux.tx/put {:crux.db/id :boris, :name "Boris" :last-name "Petrov"}],
          [:crux.tx/delete :maria  #inst "2012-05-07T14:57:08.462-00:00"]]' \
     $nodeURL/tx-log
{:crux.tx/tx-id 7, :crux.tx/tx-time #inst "2019-01-07T16:14:19.675-00:00"}
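The response carries the transaction time of the write, which is the kind of value the :transact-time parameter of /entity expects. The sketch below pulls that #inst literal out of the response text with sed instead of a real EDN parser; crux_tx_time is a made-up helper name. Whether a node waits for that transaction to be indexed before answering a later read is not stated in this document, so treat this as a way to capture the coordinate, not as a consistency guarantee.

```shell
# Extract the :crux.tx/tx-time #inst literal from a POST /tx-log response
# read from stdin. Prints nothing if the key is not found.
# crux_tx_time is a hypothetical helper name, not part of Crux.
crux_tx_time() {
    sed -n 's/.*:crux\.tx\/tx-time \(#inst "[^"]*"\).*/\1/p'
}

# Usage against a running node ($nodeURL as in the other examples):
#   tx_time=$(curl -s -X POST -H "Content-Type: application/edn" \
#                  -d '[[:crux.tx/put {:crux.db/id :ivan, :name "Ivan"}]]' \
#                  "$nodeURL/tx-log" | crux_tx_time)
#   curl -s -X POST -H "Content-Type: application/edn" \
#        -d "{:eid :ivan :transact-time $tx_time}" "$nodeURL/entity"
```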

Clojure

(ns crux.api)

crux.api exposes a union of the methods from ICruxNode and ICruxDatasource, with a few lifecycle members added.

ICruxNode

db

  (db
    [node]
    [node ^Date valid-time]
    [node ^Date valid-time ^Date transaction-time]
    "Will return the latest value of the db currently known. Non-blocking.

     When a valid time is specified then returned db value contains only those
     documents whose valid time is not after the specified. Non-blocking.

     When both valid and transaction time are specified returns a db value
     as of the valid and transaction time. Will block until the transaction
     time is present in the index.")

document

  (document [node content-hash]
    "Reads a document from the document store based on its
    content hash.")

history

  (history [node eid]
    "Returns the transaction history of an entity, in reverse
    chronological order. Includes corrections, but does not include
    the actual documents.")

history-range

  (history-range [node eid
                  ^Date valid-time-start
                  ^Date transaction-time-start
                  ^Date valid-time-end
                  ^Date transaction-time-end]
    "Returns the transaction history of an entity, ordered by valid
    time / transaction time in chronological order, earliest
    first. Includes corrections, but does not include the actual
    documents.

    Giving null as any of the date arguments makes the range open
    ended for that value.")

status

  (status [node]
    "Returns the status of this node as a map.")

submit-tx

  (submit-tx [node tx-ops]
    "Writes transactions to the log for processing
     tx-ops datalog style transactions.
     Returns a map with details about the submitted transaction,
     including tx-time and tx-id.")

submitted-tx-updated-entity?

  (submitted-tx-updated-entity? [node submitted-tx eid]
    "Checks if a submitted tx did update an entity.
    submitted-tx must be a map returned from `submit-tx`
    eid is an object that can be coerced into an entity id.
    Returns true if the entity was updated in this transaction.")

submitted-tx-corrected-entity?

  (submitted-tx-corrected-entity? [node submitted-tx ^Date valid-time eid]
    "Checks if a submitted tx did correct an entity as of valid time.
    submitted-tx must be a map returned from `submit-tx`
    valid-time valid time of the correction to check.
    eid is an object that can be coerced into an entity id.
    Returns true if the entity was updated in this transaction.")

sync

  (sync
    [node ^Duration timeout]
    [node ^Date transaction-time ^Duration timeout]
    "If the transaction-time is supplied, blocks until indexing has
    processed a tx with a greater-than transaction-time, otherwise
    blocks until the node has caught up indexing the tx-log
    backlog. Will throw an exception on timeout. The returned date is
    the latest index time when this node has caught up as of this
    call. This can be used as the second parameter in (db valid-time,
    transaction-time) for consistent reads.
    timeout – max time to wait, can be null for the default.
    Returns the latest known transaction time.")

new-tx-log-context

  (new-tx-log-context ^java.io.Closeable [node]
    "Returns a new transaction log context allowing for lazy reading
    of the transaction log in a try-with-resources block using
    (tx-log ^Closeable tx-Log-context, from-tx-id, boolean with-documents?).

    Returns an implementation specific context.")

tx-log

  (tx-log [node tx-log-context from-tx-id with-documents?]
    "Reads the transaction log lazily. Optionally includes
    documents, which allow the contents under the :crux.api/tx-ops
    key to be piped into (submit-tx tx-ops) of another
    Crux instance.
    tx-log-context  a context from (new-tx-log-context node)
    from-tx-id      optional transaction id to start from.
    with-documents? should the documents be included?

    Returns a lazy sequence of the transaction log.")

attribute-stats

  (attribute-stats [node]
    "Returns frequencies of indexed attributes")

ICruxDatasource

Represents the database as of a specific valid and transaction time.

entity

  (entity [db eid]
    "queries a document map for an entity.
    eid is an object which can be coerced into an entity id.
    returns the entity document map.")

entity-tx

  (entity-tx [db eid]
    "returns the transaction details for an entity. Details
    include tx-id and tx-time.
    eid is an object that can be coerced into an entity id.")

new-snapshot

  (new-snapshot ^java.io.Closeable [db]
     "Returns a new implementation specific snapshot allowing for lazy query
     results in a try-with-resources block using (q db snapshot query).
     Can also be used for
     (history-ascending db snapshot eid) and
     (history-descending db snapshot eid).
     Returns an implementation specific snapshot.")

q

  (q
    [db query]
    [db snapshot query]
    "q[uery] a Crux db.
    query param is a datalog query in map, vector or string form.
    First signature will evaluate eagerly and will return a set or vector
    of result tuples.
    Second signature accepts a db snapshot, see `new-snapshot`.
    Evaluates *lazily* consequently returns lazy sequence of result tuples.")

history-ascending

  (history-ascending
    [db snapshot eid]
    "Retrieves entity history lazily in chronological order
    from and including the valid time of the db while respecting
    transaction time. Includes the documents.")

history-descending

  (history-descending
    [db snapshot eid]
    "Retrieves entity history lazily in reverse chronological order
    from and including the valid time of the db while respecting
    transaction time. Includes the documents.")

valid-time

  (valid-time [db]
    "returns the valid time of the db.
    If valid time wasn't specified at the moment of the db value retrieval
    then valid time will be time of the latest transaction.")

transaction-time

  (transaction-time [db]
    "returns the time of the latest transaction applied to this db value.
    If a tx time was specified when db value was acquired then returns
    the specified time.")

Lifecycle members

start-cluster-node

(defn start-cluster-node ^ICruxAPI [options])

Starts a query node in local library mode.

For valid options, see crux.bootstrap/cli-options. Options are specified as keywords using their long format name, like :bootstrap-servers etc.

Note
Requires any KV store dependencies and kafka-clients on the classpath. The crux.kv.memdb.MemKv KV backend works without additional dependencies.

The HTTP API can be started by passing the node to crux.http-server/start-http-server. This will require further dependencies on the classpath, see crux.http-server for details.

Options:

{:kv-backend         "crux.kv.rocksdb.RocksKv" ; requires RocksDB as a dependency
                     ; or "crux.kv.memdb.MemKv", which works without additional dependencies
 :bootstrap-servers  "kafka-cluster-kafka-brokers.crux.svc.cluster.local:9092"
 :event-log-dir      "data/eventlog-1"
 :db-dir             "data/db-dir-1"
 :backup-dir         "checkpoint"
 :group-id           "group-id"
 :tx-topic           "crux-transaction-log"
 :doc-topic          "crux-docs"
 :create-topics      true
 :doc-partitions     1
 :replication-factor 1
 :server-port        3000
 :await-tx-timeout   10000
 :doc-cache-size     131072
 :object-store       "crux.index.KvObjectStore"}

Returns the started local node, which implements ICruxAPI and java.io.Closeable. The latter allows the node to be stopped by calling (.close node).

Throws IndexVersionOutOfSyncException if the index needs rebuilding.

start-standalone-node

(defn start-standalone-node ^ICruxAPI [options])

Creates a minimal standalone node that writes the transaction log into its local KV store without relying on Kafka. Alternatively, when the event-log-dir option is provided, the node uses two KV stores, which makes it possible to rebuild the index from the event log. This is closer to the semantics of Kafka, but for a single process only.

Note
Requires any KV store dependencies on the classpath. The crux.kv.memdb.MemKv KV backend works without additional dependencies.

Options:

{:kv-backend    "crux.kv.rocksdb.RocksKv" ; or crux.kv.memdb.MemKv
 :event-log-dir "data/eventlog-1"
 :db-dir        "data/db-dir-1"
 :backup-dir    "checkpoint"}

See the start-cluster-node documentation for more options.

Returns a standalone node which implements ICruxAPI and java.io.Closeable. The latter allows the node to be stopped by calling (.close node).

Throws IndexVersionOutOfSyncException if the index needs rebuilding.

Throws NonMonotonicTimeException if the clock has moved backwards since last run. Only applicable when using the event log.

new-api-client

(defn new-api-client ^ICruxAPI [url])

Creates a new remote API client ICruxAPI. The remote client requires valid and transaction time to be specified for all calls to db.

Note
Requires either clj-http or http-kit on the classpath, see crux.bootstrap.remote-api-client/internal-http-request-fn for more information.

Param url: the URL of a Crux HTTP end-point.

Returns a remote API client.