btrie
2014-07-13
Branch trie - a generic trie implementation with branch widths. * Implementation is generic: keys can be of sequences of any type. * Branch width of a trie node tells how many branches go through that node and can be used to calculate probabilites for different suffixes.
Upstream URL
Author
Peter Hillerström <peter.hillerstrom@gmail.com>
License
Simplified BSD license.
-------------------------------------------------------------------------
Branch trie - a generic trie implementation with branch widths
-------------------------------------------------------------------------
Version: 0.2.1
Initial Common Lisp version: 2010-08-01
Features:
This trie implementation has an original idea of “branch width”
invented by Peter Hillerström on 14 of November 2008. Branch width
of a trie node tells how many branches go through that node.
Widths can be used to calculate probabilites for different suffixes.
Notes about this implementation
* The trie is implemented recursively, so 'trie' can mean the whole
tree or a single node on a trie.
* Generic: Keys can be sequences of any type.
* IMPORTANT: All functions are destructive, for efficiently handling
large data sets. There will be non-destructive versions of functions.
About tries generally:
Trie, or prefix tree, is an ordered tree data structure that is used
to store an associative array where the keys are usually strings.
Unlike a binary search tree, no node in the tree stores the whole key
associated with that node instead, its position in the tree shows
what key it is associated with.
All the descendants of a node have a common prefix of the string
associated with that node, and the root is associated with the empty
string. Looking up a key of length m takes worst case O(m) time.
More information about tries:
http://en.wikipedia.org/wiki/Trie
Todo:
- Removal of sequences
- Merging (union) of several tries and other set operations like intersection
- Sliding window to keep at most n sequences in the trie
- Bit tries using bit-vectors