Create an unbundled database with Crux and Confluent Cloud Kafka in 5 minutes

In this post I will show you how to get going with Crux and Confluent Cloud to create a highly-scalable unbundled database. Crux provides an immutable document model on top of Apache Kafka with bitemporal queries (using Datalog) and efficient data eviction. Assuming you have a working knowledge of Clojure, all you need to follow along is 5 minutes and a valid payment method for monthly billing.


Note that Crux supports equivalent HTTP and Java APIs, but this post is focussed on Clojure. Also, Crux nodes may either be embedded in your application instances or made available as a more traditional load-balanced cluster. See the docs for more information about the various deployment options.

Confluent Cloud

Confluent is the primary steward of Apache Kafka and they provide, amongst other enterprise offerings, a fully-managed Kafka service via Confluent Cloud. The latest update to Confluent Cloud is particularly compelling for small Crux deployments because there are no minimum fees and the pricing structure is very simple:

Monthly Cost = Data In + Data Out + Data Retained

This is a significant milestone in the world of Kafka-as-a-Service (KaaS). There is no longer a need to think about "brokers" or other infrastructure costs. You only pay for what you use and there are no upfront costs or termination fees. The service can be deployed into your choice of GCP/AWS/Azure regions and it scales elastically for up to 100Mbps cluster throughput.

For example, a modest 20GB Crux database with 5GB ingress and 100GB egress would currently cost as little as $13.55 / month in the cheapest GCP region (based on 20*0.10 + 5*0.11 + 100*0.11).

Steps

0. Sign in to Confluent Cloud and create a cluster

Follow the short sequence of sign-up steps to create an account if you don’t already have one: https://www.confluent.io/confluent-cloud/

Once you have accessed your default environment you can create a cluster. You will need to choose a name (e.g. crux-1), a cloud provider (e.g. GCP) and a region (e.g. London). You will then need to provide a valid credit/debit card in order to create the cluster.

1. Create an API key

Under "CLI and client configuration" click on the "Java client" tab then "Create Kafka Cluster API key & secret". Tick the "I have saved my API key and secret and am ready to continue" checkbox and press continue. Your key and secret have now been embedded in a configuration snippet on the page. Copy the snippet and save it as a .properties file in a safe location that will be accessible from your Crux REPL (e.g. ~/cc-config.properties).
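The saved file should look something like the following sketch. The values shown here are placeholders — your downloaded snippet will contain your actual bootstrap server address, API key and secret:

```
# ~/cc-config.properties — Confluent Cloud Java client configuration (placeholder values)
bootstrap.servers=pkc-xxxxx.europe-west2.gcp.confluent.cloud:9092
security.protocol=SASL_SSL
sasl.mechanism=PLAIN
sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required \
  username="<API_KEY>" password="<API_SECRET>";
```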

2. Start a Clojure REPL

Feel free to look through the Crux tutorial or refer to the documentation if you are trying to understand the Crux APIs.

If you have lein installed, you can clone the repo, navigate to crux-dev and run lein repl. Alternatively, you can launch a clj REPL and pull in the latest crux-kafka from Clojars using:

clj -Sdeps '{:deps {juxt/crux-kafka {:mvn/version "RELEASE"}}}'

Update the various configuration values in the snippet below according to the inline comments (for :bootstrap-servers and :kafka-properties-file in particular) and run the code in your REPL. This will create a Crux node that is connected to your Confluent Cloud cluster.

(require '[crux.api :as crux])
(import (crux.api ICruxAPI))

(def ^crux.api.ICruxAPI node
  (crux/start-cluster-node
   {:kv-backend "crux.kv.memdb.MemKv" ; use RocksDB or LMDB for production deployments
    :tx-topic "tx-1" ; choose your tx-topic name
    :doc-topic "doc-1" ; choose your doc-topic name
    :replication-factor 3 ; Confluent Cloud requires this to be `3`
    :doc-partitions 6 ; Confluent Cloud default
    :bootstrap-servers "" ; update with the `bootstrap.servers` value found in your properties file
    :kafka-properties-file "path/to/my-kafka.properties"})) ; update with the path of your properties file

Submit a transaction:

(def my-doc {:crux.db/id :some-id
             :color "red"})

(crux/submit-tx node [[:crux.tx/put my-doc]]) ; returns a transaction result map

Retrieve the document:

(crux/entity (crux/db node) :some-id) ; returns my-doc
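As mentioned in the introduction, Crux queries use Datalog. A minimal query over the same data might look like the following sketch (it assumes the transaction above has already been indexed by the node):

```clojure
;; find the IDs of all entities whose :color attribute is "red"
(crux/q (crux/db node)
        '{:find [e]
          :where [[e :color "red"]]})
;; => #{[:some-id]}
```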

You could even try this from a second REPL with a second node connecting to the same cluster.
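You can also exercise Crux's bitemporality by querying against a past valid time. For instance, asking for the entity as of a valid time before it was ever transacted (a sketch — the #inst value here is chosen arbitrarily):

```clojure
;; query a database snapshot as of an earlier valid time
(crux/entity (crux/db node #inst "2018-01-01") :some-id)
;; returns nil, because the document did not exist at that valid time
```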

Note that Crux will automatically generate topics with the required retention/compaction configurations and will set the number of partitions for the transaction topic to 1.

You can also create and manage topics independently of Crux using a CLI tool or the Confluent Cloud web interface, but they will need to be configured appropriately (see kafka.clj).
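For example, using the standard Kafka CLI tooling the doc-topic could be created along the following lines. This is a sketch with placeholder names — the authoritative retention/compaction settings live in kafka.clj, and in particular the doc-topic must use log compaction so that documents can be superseded and evicted:

```shell
# create the doc-topic manually with compaction enabled (placeholder names)
kafka-topics --command-config ~/cc-config.properties \
  --bootstrap-server <your-bootstrap-server> \
  --create --topic doc-1 \
  --partitions 6 --replication-factor 3 \
  --config cleanup.policy=compact
```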

3. Finished

Congratulations! You now have a Crux infrastructure fit for production applications. However, if you would still prefer to use a regular JDBC database (such as SQLite, Postgres or Oracle) instead of Kafka then you may want to take a look at the crux-jdbc module.

In addition to Confluent Cloud’s unique pricing and multi-cloud model you also get access to many interesting non-Apache features. The standard service is probably good enough for most small-scale users of Crux as it stands, however Confluent Cloud Enterprise offers a number of additional features for large-scale and mission-critical deployments including >100Mbps throughput, multi-zone high availability and ACLs.

Looking Ahead


A common perception of Apache Kafka is that it is not viable for serious consideration unless you have a large enough problem to warrant the non-trivial effort of introducing it into your organisation. Kafka is known primarily for its suitability within large enterprises and web-scale startups. However, with the rise of Confluent Cloud and other commodity KaaS offerings it seems inevitable that perceptions of the broader market will shift, demand will increase, and KaaS prices will be driven down towards the price floors of more common forms of cloud storage. I look forward to seeing a truly cross-cloud service emerge that optimises the use of low-cost tiered storage for infinite retention (a key requirement for Crux).

In summary, Apache Kafka has never been easier to get started with, whether running it yourself or otherwise, and I strongly suspect that Confluent will continue on its meteoric trajectory. This is all great news for Crux.


Our official support channel is Zulip, but most people appear in the #crux channel on the Clojurians slack. You can also reach us via crux@juxt.pro.

Nov 25, 2019
by Jeremy Taylor