With Datomic, for most purposes, using datalog is recommended.
However this is not always satisfactory, especially if you have a memory constrained system or billions of datoms to choose from, with a need for lazy processing.
In these situations you are sometimes forced to use the lower level
A datom is the basic unit of storage in datomic. It is a tuple of 4 parts
just a number
describes the datatype of the value and its cardinality. Has a keyword that hopefully describes the purpose of the association
can be any of the scalar datatypes described in https://docs.datomic.com/on-prem/schema.html#required-schema-attributes
reference to the transaction that asserted this datom.
Datomic keeps 4 internal indexes with datoms sorted by increasing value
holds all datoms, sorted by entityId then Attribute EntityId then Value then Transaction Entity ID
holds all datoms, sorted by Attribute EntityId then EntityId then Value then Transaction Entity ID
holds datoms whose Value is configured as unique and those configured with Index=true, sorted by Atttribute then Value then Entity then Transaction
holds only datoms whose scalar type is Reference. Sorted by Reference then Attribute then Entity then Transaction
The d/datoms function call requires you to specify one of these indexes and optionally some matching params.
It will return an iterable that lazily supplies (matching) datoms sorted by increasing value
Since we know that the datoms are sorted we can write a transducer to
keep only those entityIds that appear in both datom iterables
reject only those entityIds that appear in both iterables.
Traversing lazily-realised iterables drastically reduces memory overhead thereby increasing performance.
Since these are transducers we can daisy-chain them together into arbitrary complexity.
For example, we had a problem running this Datalog query in reasonable time;
Give me the number of entities with an
:asset/type that are not referred to by a
(d/q '[:find (count ?asset) :where [?asset :asset/type] [(missing? $ ?asset :asset/disabled-at)] (not [_ :sync/asset ?a])] datomic-db)
The insight is that we can re-express this as:
Get me all entities with an
(d/datoms db :aevt :asset/type)
Throw away the ones that have an :asset/disabled-at
(d/datoms db :aevt :asset/disabled-at)
Throw away the ones referenced by :sync/asset
(d/datoms db :avet :sync/asset)
Count the number that remain.
The code for this is as follows:
(transduce (comp (remove-from-orderedcollection-xform #(- (.v ^Datom %1) (.e ^Datom %2)) (d/datoms db :avet :sync/asset)) (remove-from-orderedcollection-xform #(- (.e ^Datom %1) (.e ^Datom %2)) (d/datoms db :aevt :asset/disabled-at))) (completing (fn [acc _] (inc acc))) 0 (d/datoms db :aevt :asset/type))
In tests, taking this approach brought the delinquent datalog query down from ~300 seconds to ~900ms.
This part gets somewhat involved. Luckily you only have to write it once and then it is reusable.
You pass in a comparator and a (probably lazy)
It returns a transducer that will filter out data items that can be found in
As long as the items are sorted in a way that is compatible with the
(defn remove-from-orderedcollection-xform "takes two collections values ASCENDING ACCORDING TO cmp FUNCTION eg Iterables of datomic entity Ids and will permit the transducer values that are not present in collR cmp will always be called with collR items as first arg, collVal as second arg only collVal items are returned. The two colls can have different kinds of data as long as cmp can work on them" [cmp ^Iterable collR] (fn [rf] (let [iterR (.iterator collR) nextR (volatile! nil)] (fn ( (rf)) ([acc] (rf acc)) ([acc trans-val] (transduce (fn [rf] (fn ( (rf)) ([acc] ;; if nextR is set, then we are at the end of a sequence of matches ;; and test-val was also a match. Therefore we don't want it. ;; if nextR is NOT set then we have run out of collR values ;; and we want EVERYTHING from NOW ON (if (some? @nextR) acc (rf acc trans-val))) ([acc remove-val] (let [c (cmp remove-val trans-val)] (cond ;; test-val is ahead of remove-val; we have not yet caught up (neg? c) acc ;; test-val equals remove-val; we have caught up ;; we do not want this test-val (zero? c) (do (vreset! nextR remove-val) (reduced acc)) ;; remove-val has raced ahead of test-val. ;; we want this test-val ;; We want to keep this remove-val whilst test-val catches up (pos? c) (do (vreset! nextR remove-val) (reduced (rf acc trans-val)))))))) rf acc (let [r @nextR] (vreset! nextR nil) (iterator-reducible r iterR))))))))
You can try this out by running something like:
(deftest test-remove-from-orderedcollection-easy (testing "easy non-trivial ordered removal" (is (= [0 2 4 6 8] (into  (remove-from-orderedcollection-xform compare (filter odd? (range 12))) (range 10))))))
Oh that’s a bit easier. This does the trick:
(defn keep-orderedcoll-xform [cmp ^Iterable collK] (fn [rf] (let [iterK (.iterator collK) nextK (volatile! nil)] (fn ( (rf)) ([acc] (rf acc)) ([acc trans-val] (transduce (fn [rf] (fn ( (rf)) ([acc] acc) ([acc keep-val] (let [c (cmp keep-val trans-val)] (cond ;; trans-val is ahead of keep-val; ;; keep-val needs to catch up so stay in this inner loop (neg? c) acc ;;test-val equals keep-val; we want this one (zero? c) (do (vreset! nextK nil) (reduced (rf acc trans-val))) ;; keep-val has overtaken trans-val ;; stash this keep-val and break out of inner loop (pos? c) (do (vreset! nextK keep-val) (reduced acc)) ))))) rf acc (let [k @nextK] (vreset! nextK nil) (iterator-reducible k iterK))))))))
Yes, I forget to specify one of the utility functions; shim an
Iterable onto a
I have avoided iterator-seq because I am worried about the following:
Sequence chunking could drain the Iterator prematurely and cause data loss.
Head retention could cause unacceptable memory usage; by definition, datomic iterables can be extremely long.
Sequences construct temporary Cons Objects, creating more memory churn than a simple loop in a reducible.
Our goal here is to be as performant as possible.
(defn iterator-reducible "expresses an iterator through the medium of IReduceInit if first-val is nil it will be ignored" [first-val ^java.util.Iterator it] (reify IReduceInit (reduce [_ f start] (loop [acc (if first-val (f start first-val) start)] (if (or (reduced? acc) (not (.hasNext it))) (unreduced acc) (recur (f acc (.next it))))))))
only gives you the contiguous datoms that match the arguments you pass.
If you pass it a value that does not exist, you will get an empty
This is awkward when you want to discover the datoms whose value is a date that is after a pivot date.
Datomic offers a seek-datoms function to cover this; it returns an Iterable that starts at the passed value, or the next value if it did not exist and will then keep going and going.