cl-bloom

2016-09-29

A simple Common Lisp implementation of Bloom filters with efficient hashing.

To make an empty filter, use MAKE-FILTER, which takes parameters for capacity and false drop (false positive) rate. If no false drop rate is specified, the value of FALSE-DROP-RATE is used. And also the third parameter specify whether the space is allocated from heap or by using static-vectors:make-static-vector.

;;; the space is allocated from heap by default
(defparameter *my-filter*
  (bloom:make-filter :capacity 1000 :false-drop-rate 1/1000))
;;; by setting the third parameter `:static` to T, 
;;; the space will be allocated statically
(defparameter *my-filter*
  (bloom:make-filter :capacity 1000 :false-drop-rate 1/1000 :static t))

Good values for the "order" (size) and "degree" (number of hashes) are calculated internally to obtain the theoretically ideal dimensions for a Bloom filter having the given parameters.

To add an element to a filter, use ADD:

(bloom:add *my-filter* "Add me")

To test for membership, use MEMBERP:

(bloom:memberp *my-filter* "Add me")
=> T

Since when the space is allocated by using static-vectors:make-static-vector, users must explicitly free the space by using static-vectors:free-static-vector. We thus provide two APIs, destroy-filter and with-filter, to help with that:

destroy-filter

(bloom:destroy-filter *filter*) 
;; => a destroyed filter instance, where all slots are being either set to NIL or freed

with-filter

;;; A 'with-' wrapper around filter, pretty useful when the space is allocated statically;
;;; it will free the space 'automatically'.
CL-USER> (bloom:with-filter (filter :capacity 10 :static t)
           (bloom:add filter "add")
           (bloom:add filter "minus")
           (print (bloom:memberp filter "add"))
           (print (bloom:memberp filter "minus")))

T 
T 
; No value

When filters are used as sets, FILTER-UNION, FILTER-NUNION, FILTER-NINTERSECTION, and FILTER-INTERSECTION behave like their namesakes. FILTER-IOR and FILTER-AND are shorthands for lists of filters.

The other utilities for composing filters are MAKE-COMPATIBLE-FILTER, which takes a filter and returns an empty, compatible filter, and COPY-FILTER, which takes a filter and returns an independent copy.

The utility MAKE-SET-FILTER covers one use case for Bloom filters: given a list, it returns a filter suitable for testing membership in that list, considered as a set.

Author
Paul M. Rodriguez <pmr@ruricolist.com>
License
MIT