trivial-utf-8
2023-10-21
A small library for doing UTF-8-based input and output.
Author: Marijn Haverbeke <marijnh@gmail.com>
Maintainer: Gábor Melis <mega@retes.hu>
License: ZLIB
# Trivial UTF-8 Manual
###### \[in package TRIVIAL-UTF-8\]
## TRIVIAL-UTF-8 ASDF System
- Description: A small library for doing UTF-8-based input and output.
- Licence: ZLIB
- Author: Marijn Haverbeke <marijnh@gmail.com>
- Maintainer: Gábor Melis <mega@retes.hu>
- Homepage: [https://common-lisp.net/project/trivial-utf-8/](https://common-lisp.net/project/trivial-utf-8/)
- Bug tracker: [https://gitlab.common-lisp.net/trivial-utf-8/trivial-utf-8/-/issues](https://gitlab.common-lisp.net/trivial-utf-8/trivial-utf-8/-/issues)
- Source control: [GIT](https://gitlab.common-lisp.net/trivial-utf-8/trivial-utf-8.git)
## Introduction
Trivial UTF-8 is a small library for doing UTF-8-based input and
output on a Lisp implementation that already supports Unicode,
meaning that CHAR-CODE and CODE-CHAR deal with Unicode character codes.
The rationale for the existence of this library is that while
Unicode-enabled implementations usually do provide some kind of
interface for dealing with character encodings, these interfaces
differ between implementations and are typically not very flexible.
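Because the library relies on CHAR-CODE and CODE-CHAR returning Unicode code points, it can be useful to sanity-check an implementation first. The following is a rough, hypothetical check (UNICODE-CHAR-CODES-P is not part of the library): it only verifies that U+20AC round-trips through CODE-CHAR and CHAR-CODE and that CHAR-CODE-LIMIT is large enough for characters outside the Basic Multilingual Plane. It is a heuristic, not a proof that the mapping is Unicode.

```lisp
;;; Hypothetical helper, not part of Trivial UTF-8: a rough check that
;;; this Lisp maps characters to Unicode code points.
(defun unicode-char-codes-p ()
  (let ((euro (code-char #x20AC)))       ; U+20AC EURO SIGN
    (and euro                            ; CODE-CHAR may return NIL
         (= (char-code euro) #x20AC)     ; the code survives a round trip
         ;; Characters outside the BMP need codes up to #x10FFFF.
         (> char-code-limit #x10FFFF))))
```

On an implementation with full Unicode characters, such as SBCL, `(unicode-char-codes-p)` should return true.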
The [Babel][babel] library solves a similar problem while
supporting more encodings. Trivial UTF-8 was written before Babel
existed, but for new projects you might be better off going with
Babel. The one advantage Trivial UTF-8 has is that it does not depend
on any other libraries.
[babel]: https://common-lisp.net/project/babel/
## Links
Here is the [official repository][trivial-utf-8-repo] and the
[HTML documentation][trivial-utf-8-doc] for the latest version.
[trivial-utf-8-repo]: https://gitlab.common-lisp.net/trivial-utf-8/trivial-utf-8
[trivial-utf-8-doc]: http://melisgl.github.io/mgl-pax-world/trivial-utf-8-manual.html
## Reference
- [function] UTF-8-BYTE-LENGTH STRING
Calculate the number of bytes needed to encode STRING in UTF-8.
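The byte count follows directly from each character's code point. Here is a minimal sketch of that calculation, assuming CHAR-CODE returns Unicode code points; SKETCH-UTF-8-BYTE-LENGTH is a hypothetical name and this is not the library's actual source.

```lisp
;;; Hypothetical sketch, not Trivial UTF-8's implementation: sum the
;;; UTF-8 length of each character, based on its code point range.
(defun sketch-utf-8-byte-length (string)
  (loop for char across string
        for code = (char-code char)
        sum (cond ((< code #x80) 1)       ; U+0000..U+007F (ASCII)
                  ((< code #x800) 2)      ; U+0080..U+07FF
                  ((< code #x10000) 3)    ; U+0800..U+FFFF
                  (t 4))))                ; U+10000..U+10FFFF
```

For example, a string of `a`, U+00E9 and U+20AC needs 1 + 2 + 3 = 6 bytes.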
- [function] STRING-TO-UTF-8-BYTES STRING &KEY NULL-TERMINATE
Convert STRING into an array of unsigned bytes containing its UTF-8
representation. If NULL-TERMINATE, add an extra 0 byte at the end.
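To illustrate what "its UTF-8 representation" means at the bit level, here is a minimal encoder sketch. It is a hypothetical stand-in (SKETCH-STRING-TO-UTF-8-BYTES is not the library's function), assumes Unicode CHAR-CODE values, and returns a simple `(UNSIGNED-BYTE 8)` vector; the library's exact return type is not stated above.

```lisp
;;; Hypothetical sketch of UTF-8 encoding, not the library's source.
(defun sketch-string-to-utf-8-bytes (string &key null-terminate)
  (let ((bytes (make-array 0 :element-type '(unsigned-byte 8)
                             :adjustable t :fill-pointer 0)))
    (flet ((emit (octet) (vector-push-extend octet bytes)))
      (loop for char across string
            for code = (char-code char)
            do (cond ((< code #x80)             ; 0xxxxxxx
                      (emit code))
                     ((< code #x800)            ; 110xxxxx 10xxxxxx
                      (emit (logior #xC0 (ash code -6)))
                      (emit (logior #x80 (logand code #x3F))))
                     ((< code #x10000)          ; 1110xxxx + 2 continuation bytes
                      (emit (logior #xE0 (ash code -12)))
                      (emit (logior #x80 (logand (ash code -6) #x3F)))
                      (emit (logior #x80 (logand code #x3F))))
                     (t                         ; 11110xxx + 3 continuation bytes
                      (emit (logior #xF0 (ash code -18)))
                      (emit (logior #x80 (logand (ash code -12) #x3F)))
                      (emit (logior #x80 (logand (ash code -6) #x3F)))
                      (emit (logior #x80 (logand code #x3F))))))
      ;; NULL-TERMINATE appends a single 0 byte.
      (when null-terminate
        (emit 0)))
    ;; Copy the active elements into a simple octet vector.
    (coerce bytes '(simple-array (unsigned-byte 8) (*)))))
```

For example, U+20AC encodes as the three bytes `#xE2 #x82 #xAC`.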
- [function] UTF-8-GROUP-SIZE BYTE
Determine the number of bytes that make up the encoding of the
character whose encoding starts with BYTE. May signal UTF-8-DECODING-ERROR.
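In UTF-8 the length of a sequence is encoded in the high bits of its first byte. The sketch below shows one way to read it off; SKETCH-UTF-8-GROUP-SIZE and the condition SKETCH-DECODING-ERROR are hypothetical stand-ins for UTF-8-GROUP-SIZE and UTF-8-DECODING-ERROR, and treating continuation bytes (`10xxxxxx`) as errors is an assumption about what counts as an invalid start byte.

```lisp
;;; Hypothetical sketch, not the library's source. Like the documented
;;; UTF-8-DECODING-ERROR, the condition is a SIMPLE-ERROR.
(define-condition sketch-decoding-error (simple-error) ())

(defun sketch-utf-8-group-size (byte)
  (cond ((< byte #x80) 1)                  ; 0xxxxxxx
        ((= (logand byte #xE0) #xC0) 2)    ; 110xxxxx
        ((= (logand byte #xF0) #xE0) 3)    ; 1110xxxx
        ((= (logand byte #xF8) #xF0) 4)    ; 11110xxx
        ;; Continuation bytes (10xxxxxx) and #xF8..#xFF cannot start
        ;; a character.
        (t (error 'sketch-decoding-error
                  :format-control "Invalid UTF-8 start byte: ~D"
                  :format-arguments (list byte)))))
```

Callers can catch the error with `handler-case`, as the test-style usage `(handler-case (sketch-utf-8-group-size #x82) (sketch-decoding-error () :invalid))` would do.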
- [function] UTF-8-BYTES-TO-STRING BYTES &KEY (START 0) (END (LENGTH BYTES))
Convert the subsequence of the array BYTES between START and END,
which contains UTF-8 encoded characters, to a STRING. The element
type of BYTES may be anything as long as it can be `COERCE`d into
an `(UNSIGNED-BYTE 8)` array. May signal UTF-8-DECODING-ERROR.
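Decoding reverses the encoding: the lead byte gives the sequence length and the top payload bits, and each continuation byte adds six more bits. Below is a minimal, lenient decoder sketch under that scheme. The names SKETCH-UTF-8-BYTES-TO-STRING, SKETCH-UTF-8-ERROR and SKETCH-UTF-8-FAIL are hypothetical; unlike a strict decoder, it does not reject overlong encodings or surrogate code points, and it is not the library's source.

```lisp
;;; Hypothetical, lenient UTF-8 decoder sketch.
(define-condition sketch-utf-8-error (simple-error) ())

(defun sketch-utf-8-fail (control &rest args)
  (error 'sketch-utf-8-error :format-control control :format-arguments args))

(defun sketch-utf-8-bytes-to-string (bytes &key (start 0) (end (length bytes)))
  ;; Accept any vector whose elements can be COERCEd to octets.
  (let ((octets (coerce bytes '(vector (unsigned-byte 8))))
        (i start))
    (with-output-to-string (out)
      (loop while (< i end)
            do (let ((lead (aref octets i)))
                 (multiple-value-bind (size code)
                     ;; Sequence length and high-order payload bits.
                     (cond ((< lead #x80) (values 1 lead))
                           ((= (logand lead #xE0) #xC0) (values 2 (logand lead #x1F)))
                           ((= (logand lead #xF0) #xE0) (values 3 (logand lead #x0F)))
                           ((= (logand lead #xF8) #xF0) (values 4 (logand lead #x07)))
                           (t (sketch-utf-8-fail "Invalid lead byte ~D at index ~D."
                                                 lead i)))
                   (when (> (+ i size) end)
                     (sketch-utf-8-fail "Truncated sequence at index ~D." i))
                   ;; Each continuation byte must be 10xxxxxx and adds
                   ;; six payload bits.
                   (loop for j from (1+ i) below (+ i size)
                         do (let ((b (aref octets j)))
                              (unless (= (logand b #xC0) #x80)
                                (sketch-utf-8-fail "Bad continuation byte ~D at index ~D."
                                                   b j))
                              (setf code (logior (ash code 6) (logand b #x3F)))))
                   (write-char (code-char code) out)
                   (incf i size)))))))
```

For instance, decoding `#(97 98 99)` with `:start 1 :end 2` should yield `"b"`.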
- [function] READ-UTF-8-STRING INPUT &KEY NULL-TERMINATED STOP-AT-EOF (CHAR-LENGTH -1) (BYTE-LENGTH -1)
Read UTF-8 encoded data from INPUT, a byte stream, and construct a
string from the characters found. If NULL-TERMINATED is true, stop
reading at a null character. If STOP-AT-EOF is true, stop at end of
file instead of signalling an END-OF-FILE error. The CHAR-LENGTH and
BYTE-LENGTH parameters limit the maximum number of characters or
bytes to read, where -1 means no limit. May signal
UTF-8-DECODING-ERROR.
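The interplay of NULL-TERMINATED, STOP-AT-EOF and CHAR-LENGTH can be sketched as a loop over READ-BYTE. This is a hypothetical, simplified reader (SKETCH-READ-UTF-8-STRING is not the library's function): it omits BYTE-LENGTH, does not validate continuation bytes, and signals a plain ERROR where the real function would signal UTF-8-DECODING-ERROR.

```lisp
;;; Hypothetical sketch of a stream-reading loop; BYTE-LENGTH omitted.
(defun sketch-read-utf-8-string (input &key null-terminated stop-at-eof
                                            (char-length -1))
  (with-output-to-string (out)
    (loop with count = 0
          until (= count char-length)          ; -1 never matches: no limit
          do (let ((lead (read-byte input (not stop-at-eof) nil)))
               ;; With STOP-AT-EOF, READ-BYTE returns NIL at end of file
               ;; instead of signalling END-OF-FILE.
               (when (null lead) (loop-finish))
               (when (and null-terminated (zerop lead)) (loop-finish))
               (multiple-value-bind (size code)
                   (cond ((< lead #x80) (values 1 lead))
                         ((= (logand lead #xE0) #xC0) (values 2 (logand lead #x1F)))
                         ((= (logand lead #xF0) #xE0) (values 3 (logand lead #x0F)))
                         ((= (logand lead #xF8) #xF0) (values 4 (logand lead #x07)))
                         (t (error "Invalid UTF-8 lead byte ~D" lead)))
                 ;; EOF in the middle of a character always signals.
                 (loop repeat (1- size)
                       do (setf code (logior (ash code 6)
                                             (logand (read-byte input) #x3F))))
                 (write-char (code-char code) out)
                 (incf count))))))
```

INPUT must be opened with an element type such as `(unsigned-byte 8)`, for example via `with-open-file` on a binary file.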
- [condition] UTF-8-DECODING-ERROR SIMPLE-ERROR
* * *
###### \[generated by [MGL-PAX](https://github.com/melisgl/mgl-pax)\]