uax-15

2024-10-12

Common lisp implementation of Unicode normalization functions :nfc, :nfd, :nfkc and :nfkd (Uax-15)

Upstream URL

github.com/sabracrolleton/uax-15

Author

Takeru Ohta, Sabra Crolleton <sabra.crolleton@gmail.com>, Herbert Snorrason

License

MIT
README

uax-15

Updated for Unicode 15

This package provides a common lisp unicode normalization function using nfc, nfd, nfkc and nfkd as per Unicode Standard Annex #15 found at http://www.unicode.org/reports/tr15/tr15-22.html.

This is a fork of a subset of work done by Takeru Ohta in 2010. Future work is intended to provide support for https://tools.ietf.org/html/rfc8264 and https://tools.ietf.org/html/rfc7564.

Implementation Notes

This has been successfully tested on sbcl, ccl, ecl, clisp, abcl, allegro and cmucl against the unicode test file found at http://www.unicode.org/Public/UNIDATA/NormalizationTest.txt

Usage

It has one major exported function:

  • (normalize (str unicode-normalization-method))

The currently supported normalization methods are :nfc :nfkc :nfd :nfkd

Normalization example with reference to relevant xkcd https://www.xkcd.com/936/

    (normalize "正しい馬バッテリーステープル" :nfkc)
    "正しい馬バッテリーステープル"

    (normalize "الحصان الصحيح البطارية التيلة" :nfkc)
    "الحصان الصحيح البطارية التيلة"

    (normalize "اstáplacha ceart ceallraí capall" :nfkc)
    "اstáplacha ceart ceallraí capall"

To Do list

More relevant xkcd https://xkcd.com/1726/, https://xkcd.com/1953/, https://www.xkcd.com/1209/, https://xkcd.com/1137/

Data Files

Other References

Dependencies (3)

  • cl-ppcre
  • parachute
  • split-sequence

Dependents (1)

  • GitHub
  • Quicklisp