# Trivial UTF-8 Manual

###### \[in package TRIVIAL-UTF-8\]

## TRIVIAL-UTF-8 ASDF System

- Description: A small library for doing UTF-8-based input and output.
- Licence: ZLIB
- Author: Marijn Haverbeke <firstname.lastname@example.org>
- Maintainer: Gábor Melis <email@example.com>
- Homepage: [https://common-lisp.net/project/trivial-utf-8/](https://common-lisp.net/project/trivial-utf-8/)
- Bug tracker: [https://gitlab.common-lisp.net/trivial-utf-8/trivial-utf-8/-/issues](https://gitlab.common-lisp.net/trivial-utf-8/trivial-utf-8/-/issues)
- Source control: [GIT](https://gitlab.common-lisp.net/trivial-utf-8/trivial-utf-8.git)

## Introduction

Trivial UTF-8 is a small library for doing UTF-8-based input and output on a Lisp implementation that already supports Unicode, meaning that CHAR-CODE and CODE-CHAR deal with Unicode character codes.

The rationale for this library is that while Unicode-enabled implementations usually provide some kind of interface for dealing with character encodings, these interfaces are typically neither very flexible nor uniform across implementations.

The [Babel][babel] library solves a similar problem while understanding more encodings. Trivial UTF-8 was written before Babel existed, but for new projects you might be better off going with Babel. The one advantage of Trivial UTF-8 is that it does not depend on any other libraries.

[babel]: https://common-lisp.net/project/babel/

## Links

Here is the [official repository][trivial-utf-8-repo] and the [HTML documentation][trivial-utf-8-doc] for the latest version.

[trivial-utf-8-repo]: https://gitlab.common-lisp.net/trivial-utf-8/trivial-utf-8
[trivial-utf-8-doc]: http://melisgl.github.io/mgl-pax-world/trivial-utf-8-manual.html

## Reference

- [function] UTF-8-BYTE-LENGTH STRING

    Calculate the number of bytes needed to encode STRING.

- [function] STRING-TO-UTF-8-BYTES STRING &KEY NULL-TERMINATE

    Convert STRING into an array of unsigned bytes containing its UTF-8 representation. If NULL-TERMINATE, add an extra 0 byte at the end.

- [function] UTF-8-GROUP-SIZE BYTE

    Determine the number of bytes that are part of the character whose encoding starts with BYTE. May signal UTF-8-DECODING-ERROR.

- [function] UTF-8-BYTES-TO-STRING BYTES &KEY (START 0) (END (LENGTH BYTES))

    Convert the subsequence of the array BYTES between START and END, containing UTF-8 encoded characters, to a string. The element type of BYTES may be anything as long as it can be `COERCE`d into an `(UNSIGNED-BYTE 8)` array. May signal UTF-8-DECODING-ERROR.

- [function] READ-UTF-8-STRING INPUT &KEY NULL-TERMINATED STOP-AT-EOF (CHAR-LENGTH -1) (BYTE-LENGTH -1)

    Read UTF-8 encoded data from INPUT, a byte stream, and construct a string with the characters found. When NULL-TERMINATED is given, stop reading at a null character. If STOP-AT-EOF, then stop at END-OF-FILE without signaling an error. The CHAR-LENGTH and BYTE-LENGTH parameters can be used to specify the maximum number of characters or bytes to read, where -1 means no limit. May signal UTF-8-DECODING-ERROR.

- [condition] UTF-8-DECODING-ERROR SIMPLE-ERROR

* * *

###### \[generated by [MGL-PAX](https://github.com/melisgl/mgl-pax)\]
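
As a rough illustration of how the functions in the reference fit together, here is a minimal sketch. It assumes that the TRIVIAL-UTF-8 system is already loaded and that the implementation is Unicode-enabled. The names `*sample*`, `*octets*`, `safe-group-size` and `read-utf-8-file` are made up for this example and are not part of the library.

```lisp
;; Assumption: the library is already loaded, for example with
;; (asdf:load-system "trivial-utf-8").

;; Build "h<e-acute>llo" with CODE-CHAR so that the example does not
;; depend on the encoding of the source file.
(defparameter *sample*
  (concatenate 'string "h" (string (code-char #xE9)) "llo"))

;; Encode the string. U+00E9 takes two bytes (#xC3 #xA9) in UTF-8, so
;; the five characters become the six bytes 104 195 169 108 108 111.
(defparameter *octets*
  (trivial-utf-8:string-to-utf-8-bytes *sample*))

;; The byte length can be computed without building the array.
(trivial-utf-8:utf-8-byte-length *sample*)           ; 6

;; 195 (#xC3) is the lead byte of a two-byte sequence.
(trivial-utf-8:utf-8-group-size (aref *octets* 1))   ; 2

;; Decode everything, or only the two bytes that encode U+00E9.
(trivial-utf-8:utf-8-bytes-to-string *octets*)
(trivial-utf-8:utf-8-bytes-to-string *octets* :start 1 :end 3)

;; Hypothetical helper: return NIL instead of signaling an error when
;; BYTE cannot start a UTF-8 encoded character.
(defun safe-group-size (byte)
  (handler-case (trivial-utf-8:utf-8-group-size byte)
    (trivial-utf-8:utf-8-decoding-error () nil)))

;; Hypothetical helper: read a whole file as UTF-8. READ-UTF-8-STRING
;; expects a byte stream, hence the (UNSIGNED-BYTE 8) element type.
(defun read-utf-8-file (path)
  (with-open-file (in path :element-type '(unsigned-byte 8))
    (trivial-utf-8:read-utf-8-string in :stop-at-eof t)))
```

Passing `:stop-at-eof t` in `read-utf-8-file` makes the end of the file terminate the string normally. Malformed input may still signal UTF-8-DECODING-ERROR, which a caller can handle in the same way as `safe-group-size` does.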