cl-editdistance

2018-01-31

edit-distance

Build Status Coverage Status

Using

Computes the edit distance (aka Levenshtein distance) between two sequences. This is a common distance measure.

For example, to compute the distance between two sequences:

(edit-distance:distance '(1 2 3 4 5) '(2 3 4 5 6))

To compute the sequence of edit operations needed to convert sequence 1 into sequence two, use the diff function

(edit-distance:diff '(1 2 3 4 5) '(2 3 4 5 6))

That will return two values a structure, as follows, and the distance.

((:DELETION 1 NIL) (:MATCH 2 2) (:MATCH 3 3) (:MATCH 4 4) (:MATCH 5 5) (:INSERTION NIL 6))
2

That structure can be printed more readibly with the FORMAT-DIFF function

(edit-distance:format-diff *path*)

Or, you can compute the diff and print it readably together by calling PRINT-DIFF:

(edit-distance:print-diff '(1 2 3 4 5) '(2 3 4 5 6))

which will print a result like this:

seq1: 1   2 3 4 5 *** []
seq2: *** 2 3 4 5 6   []

Several options are available to the FORMAT-DIFF and PRINT-DIFF to print a prefix and suffix for each line. Note that displaying substitutions relys on captialization and so substitutions are not visible for non-alphabetic sequence elements.

Additionally, other equality functions can be used, so this evaluates to a distance of zero:

 (edit-distance:distance '("ONE" "TWO" "THREE") '("one" "two"
 "three") :test 'string-equal)

Any type of sequence may be used, but for speed the input sequences are copied into new arrays. If speed is a major concern make sure to provide simple vectors as your input sequences.

Testing

To test with sbcl, run:

sbcl --eval "(asdf:load-system 'edit-distance-test)" --eval "(unless (lisp-unit:run-tests :all :edit-distance-tests) (uiop:quit 1))"

Creative Commons License
cl-edit-distance by Ben Lambert is licensed under a Creative Commons Attribution 4.0 International License.

Author
Ben Lambert <belambert@mac.com>
License
CC-BY-4.0