mk-string-metrics
2018-01-31
efficient implementations of various string metric algorithms
This library implements efficient algorithms that calculate various string metrics in Common Lisp:
- Damerau-Levenshtein distance
- Hamming distance
- Jaccard similarity coefficient
- Jaro distance
- Jaro-Winkler distance
- Levenshtein distance
- Normalized Damerau-Levenshtein distance
- Normalized Levenshtein distance
- Overlap coefficient
Installation
Copy the files of this library to any location where ASDF can find them. You can then use it in system definitions, and ASDF will take care of the rest.
Via Quicklisp (recommended):
(ql:quickload "mk-string-metrics")
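As a rough illustration of using the library in a system definition, the fragment below declares it as a dependency. The system name "my-app" and the file name "main" are hypothetical placeholders, and the package name in the commented call is an assumption (that the functions are exported from a package named after the system).

```lisp
;; Hypothetical system definition; "my-app" and "main" are placeholder names.
(asdf:defsystem "my-app"
  :depends-on ("mk-string-metrics")
  :components ((:file "main")))

;; In main.lisp, assuming the functions are exported from a package named
;; MK-STRING-METRICS (the same name as the system):
;; (mk-string-metrics:levenshtein "kitten" "sitting")
```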
Documentation
damerau-levenshtein x y
Calculate Damerau-Levenshtein distance between two given strings x and y.
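To make the metric concrete, here is a minimal, unoptimized sketch of the restricted variant (optimal string alignment), which extends Levenshtein distance with transpositions of adjacent characters. The name osa-distance-sketch is hypothetical, and this is not the library's implementation; which Damerau-Levenshtein variant the library computes is not stated here, so treat the choice as an assumption.

```lisp
(defun osa-distance-sketch (x y)
  "Restricted Damerau-Levenshtein (optimal string alignment) distance
between strings X and Y, computed with a full dynamic-programming table."
  (let* ((m (length x))
         (n (length y))
         (d (make-array (list (1+ m) (1+ n)) :initial-element 0)))
    ;; Turning a prefix into the empty string costs one edit per character.
    (dotimes (i (1+ m)) (setf (aref d i 0) i))
    (dotimes (j (1+ n)) (setf (aref d 0 j) j))
    (loop for i from 1 to m
          do (loop for j from 1 to n
                   do (let ((cost (if (char= (char x (1- i)) (char y (1- j)))
                                      0
                                      1)))
                        (setf (aref d i j)
                              (min (1+ (aref d (1- i) j))            ; deletion
                                   (1+ (aref d i (1- j)))            ; insertion
                                   (+ (aref d (1- i) (1- j)) cost))) ; substitution
                        ;; Transposition of two adjacent characters.
                        (when (and (> i 1) (> j 1)
                                   (char= (char x (1- i)) (char y (- j 2)))
                                   (char= (char x (- i 2)) (char y (1- j))))
                          (setf (aref d i j)
                                (min (aref d i j)
                                     (1+ (aref d (- i 2) (- j 2)))))))))
    (aref d m n)))
```

With this sketch, swapping two neighbouring characters, as in "ab" and "ba", counts as a single edit rather than two.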
hamming x y
Calculate Hamming distance between two given strings x and y, which have to be of the same length.
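The definition can be sketched in a few lines: count the positions at which the two strings differ. The function name hamming-sketch is hypothetical and the code is an illustration, not the library's implementation; signalling an error on unequal lengths is an assumption about how the length requirement might be enforced.

```lisp
(defun hamming-sketch (x y)
  "Number of positions at which strings X and Y hold different characters."
  (unless (= (length x) (length y))
    (error "Strings must be of the same length."))
  (loop for a across x
        for b across y
        count (char/= a b)))
```

For example, "karolin" and "kathrin" differ at three positions, so the sketch returns 3 for that pair.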
jaccard x y
Calculate Jaccard similarity coefficient for two strings x and y. Returned value is in range from 0 (no similarity) to 1 (exact match).
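As an illustration, the coefficient is the size of the intersection divided by the size of the union. The sketch below assumes the sets are the distinct characters of each string and that two empty strings count as an exact match; both are assumptions, and the library may define these details differently. The name jaccard-sketch is hypothetical.

```lisp
(defun jaccard-sketch (x y)
  "Jaccard coefficient over the sets of distinct characters of X and Y."
  (let* ((a (remove-duplicates (coerce x 'list)))
         (b (remove-duplicates (coerce y 'list)))
         (all (union a b)))
    (if (null all)
        1 ; assumption: two empty strings are treated as an exact match
        (/ (length (intersection a b))
           (length all)))))
```

The result is an exact rational: "night" and "nacht" share 3 of 7 distinct characters, giving 3/7.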
jaro x y
Calculate Jaro distance between two strings x and y. Returned value is in range from 0 (no similarity) to 1 (exact match).
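For readers unfamiliar with the metric, the following is a minimal sketch of the textbook Jaro formula: characters match if they are equal and no farther apart than a window derived from the longer string, and matched characters that appear in a different order count as half a transposition each. The name jaro-sketch is hypothetical, the handling of empty strings is an assumption, and the code is not the library's implementation.

```lisp
(defun jaro-sketch (x y)
  "Textbook Jaro similarity of strings X and Y, returned as a rational."
  (let ((lx (length x))
        (ly (length y)))
    (cond
      ((and (zerop lx) (zerop ly)) 1) ; assumption: empty strings match
      ((or (zerop lx) (zerop ly)) 0)
      (t
       (let ((window (max 0 (1- (floor (max lx ly) 2))))
             (x-matched (make-array lx :initial-element nil))
             (y-matched (make-array ly :initial-element nil))
             (matches 0)
             (transpositions 0))
         ;; Pass 1: pair each character of X with the first unused equal
         ;; character of Y that lies inside the window.
         (dotimes (i lx)
           (loop for j from (max 0 (- i window))
                   to (min (1- ly) (+ i window))
                 when (and (not (aref y-matched j))
                           (char= (char x i) (char y j)))
                   do (setf (aref x-matched i) t
                            (aref y-matched j) t)
                      (incf matches)
                      (return)))
         (if (zerop matches)
             0
             (let ((j 0))
               ;; Pass 2: walk the matched characters of both strings in
               ;; order and count the positions where they disagree.
               (dotimes (i lx)
                 (when (aref x-matched i)
                   (loop until (aref y-matched j) do (incf j))
                   (when (char/= (char x i) (char y j))
                     (incf transpositions))
                   (incf j)))
               (/ (+ (/ matches lx)
                     (/ matches ly)
                     (/ (- matches (/ transpositions 2)) matches))
                  3))))))))
```

For "martha" and "marhta" all six characters match and one pair is transposed, so the sketch yields 17/18, roughly 0.944.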
jaro-winkler x y
Calculate Jaro-Winkler distance between two strings x and y. Returned value is in range from 0 (no similarity) to 1 (exact match).
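The Winkler adjustment rewards strings that share a common prefix. A minimal sketch, assuming the conventional scaling factor of 1/10 and a prefix cap of 4 characters (the values the library actually uses are not stated here). The name winkler-boost-sketch and its keyword parameters are hypothetical; the function takes an already computed Jaro score as its first argument.

```lisp
(defun winkler-boost-sketch (jaro-score x y &key (scaling 1/10) (max-prefix 4))
  "Apply the Winkler prefix bonus to JARO-SCORE for strings X and Y."
  ;; MISMATCH returns the index of the first differing position, or NIL
  ;; when the two strings are equal.
  (let ((prefix (min max-prefix
                     (or (mismatch x y) (length x)))))
    (+ jaro-score
       (* prefix scaling (- 1 jaro-score)))))
```

For "martha" and "marhta", with a Jaro score of 17/18 and a shared prefix of three characters, this gives 173/180, roughly 0.961.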
levenshtein x y
Calculate Levenshtein distance between two given strings x and y.
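As a reference point, Levenshtein distance is the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into the other. Below is a minimal sketch of the classic two-row dynamic-programming formulation; levenshtein-sketch is a hypothetical name and the code is not the library's implementation.

```lisp
(defun levenshtein-sketch (x y)
  "Levenshtein distance between strings X and Y using two rolling rows."
  (let* ((n (length y))
         (prev (make-array (1+ n)))
         (curr (make-array (1+ n))))
    ;; Distance from the empty prefix of X to each prefix of Y.
    (dotimes (j (1+ n))
      (setf (aref prev j) j))
    (loop for i from 1 to (length x)
          do (setf (aref curr 0) i)
             (loop for j from 1 to n
                   do (let ((cost (if (char= (char x (1- i)) (char y (1- j)))
                                      0
                                      1)))
                        (setf (aref curr j)
                              (min (1+ (aref prev j))             ; deletion
                                   (1+ (aref curr (1- j)))        ; insertion
                                   (+ (aref prev (1- j)) cost))))) ; substitution
             ;; The row just computed becomes the previous row.
             (rotatef prev curr))
    (aref prev n)))
```

Turning "kitten" into "sitting" takes two substitutions and one insertion, so the sketch returns 3.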
norm-damerau-levenshtein x y
Return normalized Damerau-Levenshtein distance between x and y. Result is a real number from 0 to 1, where 0 signifies no similarity between the strings, while 1 means exact match.
norm-levenshtein x y
Return normalized Levenshtein distance between x and y. Result is a real number from 0 to 1, where 0 signifies no similarity between the strings, while 1 means exact match.
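One common way to map an edit distance onto this 0 to 1 scale is to divide it by the length of the longer string and subtract the result from 1. The sketch below illustrates that approach for an already computed distance; it is an assumption about the normalization, not a statement of the formula the library uses, and normalize-distance-sketch is a hypothetical name.

```lisp
(defun normalize-distance-sketch (distance x y)
  "Map edit DISTANCE between strings X and Y to a score from 0 to 1."
  (let ((longest (max (length x) (length y))))
    (if (zerop longest)
        1 ; assumption: two empty strings are treated as an exact match
        (- 1 (/ distance longest)))))
```

With a Levenshtein distance of 3 between "kitten" and "sitting" (longest length 7), the score is 4/7. The same mapping applies to a Damerau-Levenshtein distance.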
overlap x y
This function calculates overlap coefficient between two given strings x and y. Returned value is in range from 0 (no similarity) to 1 (exact match).
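The overlap coefficient divides the size of the intersection by the size of the smaller set. A minimal sketch, assuming the sets are the distinct characters of each string and choosing 1 for two empty strings and 0 when only one is empty; these choices are assumptions, and the library may treat them differently. The name overlap-sketch is hypothetical.

```lisp
(defun overlap-sketch (x y)
  "Overlap coefficient over the sets of distinct characters of X and Y."
  (let* ((a (remove-duplicates (coerce x 'list)))
         (b (remove-duplicates (coerce y 'list)))
         (smaller (min (length a) (length b))))
    (cond
      ((and (null a) (null b)) 1) ; assumption: empty strings match
      ((zerop smaller) 0)         ; assumption: nothing to overlap with
      (t (/ (length (intersection a b)) smaller)))))
```

Unlike the Jaccard coefficient, this reaches 1 whenever one character set is contained in the other, as with "abc" and "abcdef".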
License
Copyright © 2014–2018 Mark Karpov
Distributed under MIT License.