cl-hamt

2020-03-25

Dictionary & set data structure using hash array-mapped tries

Upstream URL

github.com/danshapero/cl-hamt

Author

Daniel Shapero <shapero.daniel@gmail.com>

License

BSD 3-clause
README

cl-hamt

Build Status

This library provides purely functional dictionaries and sets in Common Lisp based on the hash array-mapped trie data structure. The operations provided are:

size
lookup
insert
remove
reduce
filter
map
eq

The versions for sets and dictionaries are obtained by prepending set- or dict- to the above symbols, so for example to lookup a key in a set and dictionary you would call set-lookup and dict-lookup respectively. An empty collection is created with the functions empty-set and empty-dict.

See the examples/ directory for some usage examples of the library, or the unit tests. Some benchmark code can be found in tests/benchmarks.lisp.

Implementation

The data types in cl-hamt are implemented using the hash array-mapped trie (HAMT) data structure, as found in the Clojure language. HAMTs provide near-constant time search, insertion and removal and can be used as persistent data structures, i.e. all updates are non-destructive.

As the name suggests, hash array-mapped tries use hashing of the underlying data to store and retrieve it efficiently. Consequently, any natural ordering on the data, e.g. lexicographic ordering of strings, natural ordering of integers, is not preserved in a HAMT. When using reduce on a collection, the operation in question must not depend on the order in which the elements are accessed. If the data are ordered and this ordering is important, a self-balancing binary tree may be a more appropriate data structure. Additionally, one must provide an appropriate 32-bit hash function. We default to using murmurhash, as implemented in the Common Lisp package cl-murmurhash. Note that the built-in Common Lisp hash function sxhash is not a 32-bit hash; for example, on my 64-bit system with SBCL, it returns a 62-bit hash.

While most operations on HAMTs have a complexity of log base-32 in the size of the data structure, there is quite a bit of overhead. HAMTs are probably less efficient for repeated operations on small-size sets and dictionaries than, say, a list or an association list.

Dependencies (4)

  • cl-murmurhash
  • cl-ppcre
  • drakma
  • fiveam

Dependents (1)

  • GitHub
  • Quicklisp