decompress
2023-06-18
No Description
Upstream URL
Author
Maintainer
License
1Overview
:PROPERTIES::CUSTOM_ID: overview:END:The title speaks for itself. I originally wrote the Deflate part of this library as a fun exercise in implementing something purely from the specification, but it ended up good enough to publish. Patching existing libraries would have been difficult due to different API promises.
More formats may be added later; obvious candidates are xz and zstd. A compressor counterpart is highly unlikely. A download can be found here; there is also a Github mirror if you're into that.
1.1Features
:PROPERTIES::CUSTOM_ID: features:END:- Supported formats: Deflate, zlib, gzip, bzip2.
- Safety: All edge cases are properly detected and explicitlyhandled; malformed data is rejected early. The interface is designed to behard to misuse; in particular, defaults try to provide the strongestguarantees and built-in integrity checks are run without user intervention.Memory usage is at most linear (and usually constant) with respect to inputsize. No FFI is used anywhere.
- No overreads by default: If the compressed data is well-formed and takenfrom a byte stream, then no byte beyond the end of the compressed data will beread, allowing this library to be used as a part of complicated streamprocessing for other formats. This can be explicitly relaxed in the name ofperformance; see the
allow-overreads-p
option to decompressand make-decompression-stream. - Performance: To my knowledge, this is the second fastest Deflate decoder andthe fastest bzip2 decoder in pure Common Lisp. Under SBCL, file-to-filedecompression usually hits 1.5x-1.7x the runtime of GNU gzip, 2.2x-3.3x theruntime of zlib, and 1.3x-1.5x the runtime of bzip2. It is some 1.7x-2.4xfaster than chipz for Deflate and 1.3x-1.7x faster for bzip2.See the performance section for more details.
1.2Quickstart
:PROPERTIES::CUSTOM_ID: quickstart:END:See the reference for more details.
2Comparison to other libraries
:PROPERTIES::CUSTOM_ID: comparison:END:The other libraries under consideration are chipz, which is probably the most popular library; an implementation by Pierre Mai that is compact and notably used by Quicklisp; and 3bz, which is notably used by pngload.
semz.decompress-1.1.0 | chipz-20230418 | deflate-1.0.1 (Mai) | 3bz-20230408 | |
---|---|---|---|---|
/ | < | |||
supported formats | deflate, zlib, gzip, bzip2 | deflate, zlib, gzip, bzip2 | deflate, zlib, gzip | deflate, zlib, gzip |
dependencies | alexandria, trivial‑gray‑streams | alexandria, trivial‑features, nibbles, babel, cffi¹, mmap¹ | ||
license | MIT | BSD‑3 | MIT + BSD‑3's no-advertising clause | MIT |
input sources | vector, stream | simple specialized vector, list, stream, pathname | stream | (s. s.) vector, stream, foreign memory |
output destinations | new vector | new vector, provided (s. s.) vector, stream, pathname | stream | (s. s.) vector |
Gray stream wrapper | yes | only around other streams | no | no |
checksum verification | automatic | automatic | automatic if enabled, disabled by default | automatic |
edge case handling | very strict | lenient | lenient, some oversights | strict, no library error type |
partial input | library error | library error | end-of-file error |
status flag |
trailing data | ignored, not read by default | ignored, may be read | ignored, never read | ignored; not read with stream input |
preset zlib dictionaries | supported | unsupported | unsupported | unsupported |
bzip2 block randomization | supported | unsupported | n/a | n/a |
¹ Optional depending on implementation.
2.1Edge case handling
:PROPERTIES::CUSTOM_ID: edge-cases:END:2.1.1Deflate
:PROPERTIES::CUSTOM_ID: edge-cases-deflate:END:The Deflate spec has a few rough edges which we deal with as follows.
- Some decompressors allow back references to go beyond the beginning of theoutput data, pretending that the octets out of bounds were all zeroes. Thisone always signals an error.
- Huffman trees are uniquely determined by the associated sequence of codelengths, but not every such sequence corresponds to a tree - some leave codesunassigned and some make codes ambiguous. The spec isn't clear on what to dowith such sequences (outside of two special cases in §3.2.7). While one canargue that leaving codes unassigned is fine as long as they don't appear, wealways signal an error as soon as a sequence doesn't correspond to a(complete) Huffman tree.
- According to §3.2.7, the situation where only a single distance code is usedmay be represented by having only one code length of one. We enforce that theunique distance code is always encoded as a zero bit and never as a one bit.
- A dynamic Deflate block can omit the encoding for the end-of-block marker. Iam convinced that this is an oversight in the standard since the onlysituation where an encoder would want to output such a block is if it knows inadvance the input is infinite and that the miniscule inefficiency of encodingan EOB would be worse than the inefficiency that arises from being unable tochange the Huffman tree (no EOB means no new dynamic block).
However, there's little reason to ban this; since Deflate already allows blocks of arbitrary length, nothing stops a stream from being infinite anyway and a finite input is going to run into EOF either way.
- Dynamic Deflate blocks can provide encodings for the illegal literal/lengthcodes 286 and 287 as well as the illegal distance codes 30 and 31, althoughthese may never occur in the data. We allow them in blocks since otherwise thefixed Deflate block code wouldn't be expressible dynamically.
A more detailed comparison of edge case handling across libraries can be found
below. An error is an error of some library-defined type that could be handled
by a library user; a basic error is something like a failed assertion or
(error "foo")
, where the problem is explicitly detected but can't be
reasonably handled by a user; an internal error is something like a type error
or an array bounds violation that is signalled by the implementation (but not
necessarily reliably so).
Deviations from zlib behavior are boldfaced for convenience; note that deviation is not necessarily a bad thing in itself. We consider internal errors deviation since they may cause behavior changes under some implementations.
edge case | zlib (zpipe) | semz | 3bz | Mai | chipz |
---|---|---|---|---|---|
/ | < | ||||
reserved block type | error | error | basic error | error | error |
uncompressed zero-length block | handled | handled | handled | handled | handled |
uncompressed block with wrong checksum | error | error | basic error | error | error |
uncompressed zero-length block with wrong checksum | error | error | basic error | error | outputs data |
reference goes beyond previous output | error | error | basic error | outputs data | outputs data |
uses illegal length code 286/287 | error | error | basic error | error | error |
uses illegal distance code 30/31 | error | error | basic error | longer range than 29 | error |
dynamic block provides encodings for 286/287 & 30/31 | error | outputs data | internal error | outputs data | outputs data |
dynamic block with one distance code (§3.2.7) | handled | handled | handled | handled | handled |
dynamic block with one distance code, uses the unassigned code | error | error | basic error | error | error |
dynamic block without distance codes (§3.2.7) | handled | handled | handled | handled | handled |
dynamic block without distance codes, tries to use len code | error | error | basic error | error | error |
Huffman tree with too many items | error | error | basic error | outputs data | outputs data |
Huffman tree with too few items | error | error | basic error | outputs data | outputs data |
Huffman tree with too few items, uses an unassigned code | n/a | n/a | n/a | error | error |
dynamic block has code lengths expand across len/dist boundary | handled | handled | handled | handled | handled |
dynamic block has code lengths expand out of bounds | error | error | basic error | internal error | error |
dynamic block starts by repeating the last code length | error | error | basic error | internal error | error |
reference stays in bounds but goes beyond zlib window size | outputs data | error | outputs data | outputs data | outputs data |
zlib preset dictionary required but none provided | error | error | basic error | error | outputs data |
wrong Adler-32 checksum | error | error | basic error | outputs data by default | error |
wrong gzip magic bytes | error | error | basic error | error | error |
gzip data with wrong CRC-32 checksum | error | error | basic error | outputs data by default | error |
gzip data with right CRC-32 checksum but wrong modular length | error | error | outputs data | outputs data | outputs data |
gzip data with header checksum | handled | handled | handled | handled | handled |
gzip data with incorrect header checksum | error | error | basic error | outputs data | outputs data |
gzip extra fields | handled | handled | basic error | handled | basic error |
inconsistent length fields in extra fields section | outputs data | error | n/a | outputs data | n/a |
2.1.2Bzip2
:PROPERTIES::CUSTOM_ID: edge-cases-bzip2:END:Bzip2 has no spec, so we mostly try to match the reference implementation. There is one notable exception: The reference implementation accepts underfull Huffman trees and only complains once unassigned codes are used; we reject such trees right off the bat since it simplifies the code.
I am not aware of any compressor that outputs such trees; Joe Tsai claims they exist, but provides no names. There isn't a good reason why a compressor would output such trees since bzip2 cannot have alphabets of size one, and whenever you have a branch with only one child, it's more efficient to just give the unique child a code that's one bit shorter. The resulting reshuffling of codes has negligible overhead (none if you only assign them at the end). I suppose this case is something to think about once someone runs into it while using this library.
Chipz is essentially a translation of the reference implementation, but doesn't bother to support the block randomization mechanism that has been deprecated for a good two decades. We do because our approach makes it very easy to add. You can find a test file here; it consists of 100MB of highest quality zero octets. The testfile in lbzip2 doesn't actually test block randomization since it is too short for any effects to show up.
Other than this, the format does not allow for many edge cases to begin with since most of its transformations simply cannot have invalid inputs. Just like the reference implementation, we do not care about wasteful encodings such as e.g. defining six Huffman trees despite only using three.
2.2Performance funny numbers
:PROPERTIES::CUSTOM_ID: performance:END:2.2.1Summary
:PROPERTIES::CUSTOM_ID: performance:END:We're usually the fastest with a significant improvement over chipz; under SBCL, 3bz is slightly faster for gzip data and noticeably faster for zlib data. Disabling overreads hurts a lot for Deflate but not much for bzip2.
2.2.2Details
:PROPERTIES::CUSTOM_ID: performance-details:END:The performance measurements below are taken on an older x86-64 machine, taking the best of five attempts, using each library's idiomatic method of file-to-file decompression:
- For semz.decompress, this means
with-open-file
+ Alexandria'scopy-stream
.We provide measurements with and without overreads enabled. - For chipz, this means
decompress
with pathname arguments. - For Mai's Deflate, this means
with-open-file
+ theinflate-*-stream
functions. We provide measurements with and without checksums to reflect afair comparison and the default respectively. - For 3bz, we use this code. Note that this involves manual buffering. We don'tprovide a measurement with the significantly slower stream input interfacesince it wouldn't be very informative.
I tried to keep the benchmark files varied since decompression benchmarks
(especially for Deflate, not so much for bzip2) are highly input-dependent, but
take the results with a grain of salt regardless. As a rough baseline,
measurements with zlib-1.2.13, GNU gzip-1.12, and bzip2-1.0.8 are provided,
respectively using the canonical zpipe
example program, gunzip
, and
bunzip2
.
file | input size | output size | zlib | gzip | bzip2 | notes |
---|---|---|---|---|---|---|
/ | < | > | < | > | ||
<r> | <r> | <r> | <r> | <r> | ||
openjdk‑17.0.6_p10.tar.gz | 105,221,267B | 672,163,840B | 2.618s | 4.122s | Big tarball with high compression ratio. | |
openh264‑2.3.1.tar.gz | 60,290,897B | 73,205,760B | 0.280s | 0.554s | Big tarball with low compression ratio. | |
gimp‑2.10.32.tar.bz2 | 31,397,425B | 207,185,920B | 6.687s | Big bzip2 tarball. | ||
ccl‑1.12.1‑linuxx86.tar.gz | 20,872,508B | 80,752,640B | 0.427s | 0.675s | A tarball with text and executable data. | |
sbcl‑2.3.3-source.tar.bz2 | 7,408,264B | 43,386,880B | 1.408s | It's SBCL. | ||
html_standard.html.gz | 1,872,784B | 12,803,512B | 0.053s | 0.081s | Gzipped HTML as you'd find online. | |
html_standard.html.bz2 | 1,335,667B | 12,803,512B | 0.373s | Bzip2 version of the above file. | ||
pepper.z | 2,132,583B | 51,268,000B | 0.148s | Pixel data of a ![]() |
||
pepper.bz2 | 1,616,871B | 51,268,000B | 0.512s | Bzip2 version of the above file. | ||
mario.z | 888,470B | 3,053,378B | 0.018s | Pixel data of a ![]() |
||
mario.bz2 | 692,856B | 3,053,378B | 0.110s | Bzip2 version of the above file. |
2.2.3SBCL-2.3.3
:PROPERTIES::CUSTOM_ID: performance-sbcl:END:SBCL is unsurprisingly the fastest of the tested implementations. We primarily optimize for it; notably we use a different implementation of the various CRCs.
file | 3bz | semz (O) | semz | chipz | Mai | Mai (C) |
---|---|---|---|---|---|---|
/ | < | |||||
<r> | <r> | <r> | <r> | <r> | <r> | |
openjdk-17.0.6_p10.tar.gz | 5.665s | 6.257s | 8.079s | 13.84s | 12.17s | 14.36s |
openh264-2.3.1.tar.gz | 0.878s | 0.910s | 1.349s | 1.538s | 3.032s | 3.271s |
gimp-2.10.32.tar.bz2 | 9.909s | 10.45s | 14.31s | |||
ccl-1.12.1-linuxx86.tar.gz | 0.957s | 1.081s | 1.441s | 2.166s | 2.308s | 2.567s |
sbcl-2.3.3-source.tar.bz2 | 2.010s | 2.154s | 3.519s | |||
html_standard.html.gz | 0.104s | 0.117s | 0.148s | 0.259s | 0.221s | 0.260s |
html_standard.html.bz2 | 0.576s | 0.599s | 0.764s | |||
pepper.z | 0.173s | 0.264s | 0.302s | 0.781s | 0.413s | 0.707s |
pepper.bz2 | 0.704s | 0.734s | 1.040s | |||
mario.z | 0.033s | 0.039s | 0.055s | 0.092s | 0.097s | 0.114s |
mario.bz2 | 0.134s | 0.147s | 0.259s |
2.2.4CCL-1.12.1
:PROPERTIES::CUSTOM_ID: performance-ccl:END:CCL has notoriously slow bit operations, resulting in an overall 6.5x runtime for Deflate decompression compared to SBCL and less pronounced differences between libraries. This is apparently being taken care of currently and I'd expect performance to change drastically once these fixes land. Bzip2 runtime is about 3x that of SBCL, which is pretty typical.
file | 3bz | semz (O) | semz | chipz | Mai | Mai (C) |
---|---|---|---|---|---|---|
/ | < | |||||
<r> | <r> | <r> | <r> | <r> | <r> | |
openjdk-17.0.6_p10.tar.gz | 54.03s | 36.34s | 38.86s | 64.15s | 56.73s | 62.89s |
openh264-2.3.1.tar.gz | 7.060s | 6.372s | 6.860s | 9.886s | 10.00s | 10.65s |
gimp-2.10.32.tar.bz2 | 31.99s | 33.04s | 71.84s | |||
ccl-1.12.1-linuxx86.tar.gz | 8.727s | 6.950s | 7.040s | 10.15s | 9.494s | 10.20s |
sbcl-2.3.3-source.tar.bz2 | 6.439s | 6.675s | 17.91s | |||
html_standard.html.gz | 1.035s | 0.712s | 0.726s | 1.216s | 1.054s | 1.164s |
html_standard.html.bz2 | 1.720s | 1.774s | 3.713s | |||
pepper.z | 1.922s | 1.300s | 1.329s | 2.317s | 2.599s | 5.348s |
pepper.bz2 | 2.306s | 2.368s | 4.656s | |||
mario.z | 0.298s | 0.279s | 0.286s | 0.366s | 0.378s | 0.547s |
mario.bz2 | 0.461s | 0.487s | 1.295s |
2.2.5ECL-21.2.1
:PROPERTIES::CUSTOM_ID: performance-ecl:END:Deflate decompression sits at some 40x-50x the runtime of SBCL, which is rough, but still usable for smaller data. Bzip2 ends up at a nicer 20x-30x runtime.
file | 3bz | semz (O) | semz | chipz | Mai | Mai (C) |
---|---|---|---|---|---|---|
/ | < | |||||
<r> | <r> | <r> | <r> | <r> | ||
ccl-1.12.1-linuxx86.tar.gz | 129.6s | 46.38s | 63.75s | 81.26s | 89.83s | 91.14s |
sbcl-2.3.3-source.tar.bz2 | 45.80s | 51.20s | 76.42s | |||
html_standard.html.gz | 18.23s | 4.477s | 5.986s | 8.852s | 8.606s | 8.826s |
html_standard.html.bz2 | 11.12s | 12.11s | 16.59s | |||
pepper.z | 41.67s | 7.495s | 9.221s | 13.81s | 15.23s | 21.56s |
pepper.bz2 | 15.32s | 16.53s | 24.24s | |||
mario.z | 3.906s | 1.987s | 2.703s | 3.036s | 3.847s | 4.223s |
mario.bz2 | 3.900s | 4.480s | 5.643s |
2.2.6ABCL-1.9.1
:PROPERTIES::CUSTOM_ID: performance-abcl:END:ABCL can't deal with the bit diddling involved in Deflate all that well and is
significantly slower than even ECL; you're likely better off just using
java.util.zip
instead. Our bzip2 performance is not that far off the ECL
measurements however. It should be noted that ABCL has a crazy overhead on
Lisp-side buffering, which is avoided by semz.decompress through ABCL-specific
declarations and by Mai's Deflate through not buffering at all. Note also that
3bz cannot rely on mmap here.
file | 3bz | semz (O) | semz | chipz | Mai | Mai (C) |
---|---|---|---|---|---|---|
/ | < | |||||
<r> | <r> | <r> | <r> | <r> | <r> | |
html_standard.html.gz | 53.89s | 24.01s | 26.76s | 55.19s | 36.54s | 36.19s |
html_standard.html.bz2 | 17.12s | 19.72s | 102.8s | |||
pepper.z | 82.32s | 81.38s | 85.53s | 93.42s | 97.79s | 98.54s |
pepper.bz2 | 47.75s | 51.13s | 120.7s | |||
mario.z | 24.04s | 6.347s | 8.035s | 25.71s | 12.34s | 12.36s |
mario.bz2 | 4.789s | 6.095s | 40.65s |
3Reference
:PROPERTIES::CUSTOM_ID: reference:END:In what follows, an octet vector is a one-dimensional array containing
octets, i.e. integers x with 0 ≤ x ≤ 255. Octet vectors that are passed to the
library need not be specialized or simple; octet vectors that are returned by
the library are always both. When we speak of bounding index designators, we
use the term with the same meaning as in Common Lisp, but additionally allow a
starting index of nil
to denote the beginning. Whenever a function takes
optional bounding index designators, the default is to denote the entire
sequence.
3.1Function: decompress
:PROPERTIES::CUSTOM_ID: fn-decompress:END:decompress format input &key start end output-size allow-overreads-p => decompressed-buffer, header
format
: A format specifier. See list-supported-formats.input
: Either abinary-input-stream
or an octet vector.start
,end
: Bounding index designators forinput
if the latter is avector. Ignored ifinput
is a stream.output-size
: A non-negative integer ornil
. The default isnil
.allow-overreads-p
: A generalized boolean. The default isnil
.decompressed-buffer
: An octet vector.header
: A property list with keyword keys. See decompression-stream-header.
The most convenient interface to this library. Returns a fresh octet vector that
contains the decompressed form of the format
-compressed data in input
.
output-size
, if not nil
, should be the predicted size of the decompressed
data. If correct, this prevents unnecessary copying at the end.
If allow-overreads-p
is true and input
is a stream, bytes beyond the end of
the compressed data might end up being read during decompression; enabling this
option tends to significantly improve performance on stream inputs.
In addition to the options listed above, decompress
can
take format-specific options.
3.2Condition: decompression-error
:PROPERTIES::CUSTOM_ID: c-decompression-error:END:Direct superclasses: simple-error
General superclass for errors related to decompression. While the error message gives detailed information about the exact error, we do not distinguish between e.g. failed checksums, out of bounds references or malformed Huffman trees on the type level since this information is basically useless to a programmatic user and usually just an artifact of corruption.
Gross user errors such as passing in invalid bounding index designators or
returning invalid dictionaries from a dictionary function do not signal a
decompression-error
.
3.3Condition: eof
:PROPERTIES::CUSTOM_ID: c-eof:END:Direct superclasses: decompression-error
Signalled when the input stream/buffer is exhausted. Notably implies that the
data up until this point was not detectably malformed. Note that even when the
input is a stream, it is this condition which is signalled, not end-of-file
.
3.4Condition: unrecognized-zlib-dictionary
:PROPERTIES::CUSTOM_ID: c-unrecognized-zlib-dictionary:END:Direct superclasses: decompression-error
Signalled when zlib decompression involves a preset dictionary whose checksum isn't recognized by the dictionary function.
Note: This condition is only signalled when preset dictionaries are enabled. See format-specific options.
3.5Function: unrecognized-zlib-dictionary-checksum
:PROPERTIES::CUSTOM_ID: fn-unrecognized-zlib-dictionary-checksum:END:unrecognized-zlib-dictionary-checksum uzd => checksum
uzd
: An unrecognized-zlib-dictionary condition.checksum
: An(unsigned-byte 32)
.
Returns the unrecognized checksum that was encountered during zlib decompression, in the form adler-32 outputs. See format-specific options.
3.6Function: list-supported-formats
:PROPERTIES::CUSTOM_ID: fn-list-supported-formats:END:list-supported-formats => list
list
: A list of symbols.
Returns a list of symbols that are valid format specifiers, i.e. can be used
as format
arguments to make-decompression-stream and decompress to specify a
compression format.
3.7Class: decompression-stream
:PROPERTIES::CUSTOM_ID: c-decompression-stream:END:Direct superclasses: fundamental-binary-input-stream
Gray stream wrapper that can be used for complicated stream processing. The end of file is considered reached when the underlying compressed data ends and all associated uncompressed data has been read.
3.8Function: make-decompression-stream
:PROPERTIES::CUSTOM_ID: fn-make-decompression-stream:END:make-decompression-stream format input &key start end allow-overreads-p => decompression-stream
format
: A format specifier. See list-supported-formats.input
: Either abinary-input-stream
or an octet vector.start
,end
: Bounding index designators forinput
if the latter is avector. Ignored ifinput
is a stream.allow-overreads-p
: A generalized boolean. The default isnil
.decompression-stream
: A decompression-stream.
The most general interface to this library. Returns a decompression-stream
whence the decompressed form of the format
-compressed data in input
can be
read. input
should not be modified or read from until decompression-stream
has reached end of file. If input
is a stream, then the caller is responsible
for closing it.
Calling this function will read header information from input
, but no
decompression will be performed until data is read from decompression-stream
.
If allow-overreads-p
is true and input
is a stream, bytes beyond the end of
the compressed data might end up being read during stream creation and
decompression; enabling this option tends to significantly improve performance
on stream inputs.
In addition to the options listed above, make-decompression-stream
can
take format-specific options.
3.9Function: decompression-stream-format
:PROPERTIES::CUSTOM_ID: fn-decompression-stream-format:END:decompression-stream-format stream => format
stream
: A decompression-stream.format
: A format specifier.
Returns the format
argument used to create stream
. format
is always an
element of the list returned by list-supported-formats.
3.10Function: decompression-stream-header
:PROPERTIES::CUSTOM_ID: fn-decompression-stream-header:END:decompression-stream-header stream => header
stream
: A decompression-stream.header
: A property list with keyword keys.
Returns the header metadata of the data associated to stream
. Depending on the
format, the possible entries are as below.
3.10.1Deflate
:PROPERTIES::CUSTOM_ID: header-deflate:END:Deflate data doesn't have a header.
3.10.2Zlib
:PROPERTIES::CUSTOM_ID: header-zlib:END:window-size
: The declared window size, in bytes.level
: The "decompression level", a rough measure of whether speed or sizewere prioritized during compression. Goes from 0 to 3, with 0 meaning fastestand 3 meaning smallest.dictionary
: The Adler-32 checksum (as returned by adler-32) of the presetdictionary, if applicable, ornil
.
3.10.3Gzip
:PROPERTIES::CUSTOM_ID: header-gzip:END:We do not bother converting IDs and enumerations into symbols because this doesn't meaningfully reduce complexity - the uses for the affected fields are so niche that someone who wants to use them is probably more familiar with the raw data than anything we come up with.
textp
: Boolean. If true, the data is supposed to be text and might requireline ending conversion.extra-fields
: Alist with two-character-string keys and octet vector values.See gzip spec. Note that keys may be repeated; they have the same order theyhad in the file.filename
: String ornil
. Filename of the original file, if provided.comment
: String ornil
. Comment field, if provided.modification-time
: Integer. Modification time of the original file or timeof compression, given in Unix time.extra-flags
: Integer. See gzip spec.operating-system
: Integer. See gzip spec.
3.10.4Bzip2
:PROPERTIES::CUSTOM_ID: header-bzip2:END:block-size
: Integer. The BWT block size in bytes.
3.11Format-specific options
:PROPERTIES::CUSTOM_ID: format-specific-options:END:The following keyword arguments can be passed to make-decompression-stream and decompress for specific formats. They are ignored for all other formats.
window-size
(deflate): A non-negative integer. Determines how far backreferences can reach; references that go back further result ina decompression-error. The default is 2^{15}, which results in no restrictionswhatsoever.prefix
,prefix-start
,prefix-end
(deflate): Eithernil
or an octetvector with optional bounding index designators. If notnil
, makes theoctets inprefix
available to back references as if they were previousdecompressor output. The default isnil
.dictionary
(zlib): A function ornil
. Implements preset dictionaries. Ifnil
(the default), preset dictionaries are disabled and signal a genericdecompression-error when encountered. The easiest way to use this option isvia make-simple-zlib-dictionary.In general, a function must take a single
(unsigned-byte 32)
, representing an Adler-32 checksum as returned by adler-32, and return three valuesprefix
,prefix-start
,prefix-end
that are used like the options of the same name for Deflate. Ifprefix
isnil
, an unrecognized-zlib-dictionary condition is signalled.
3.12Function: make-simple-zlib-dictionary
:PROPERTIES::CUSTOM_ID: fn-make-simple-zlib-dictionary:END:make-simple-zlib-dictionary buffers => dictionary
buffers
: A sequence of octet vectors.dictionary
: A function.
Returns a function suitable to be passed as a dictionary
argument for zlib
decompression which recognizes every octet vector in the sequence buffers
and
no others. The vectors should not be modified afterwards. An error is signalled
if multiple vectors with distinct contents have the same checksum.
See format-specific options for details and how to make more complicated
dictionaries.
3.13Function: adler-32
:PROPERTIES::CUSTOM_ID: fn-adler-32:END:adler-32 data &key start end => checksum
data
: An octet vector.start
,end
: Bounding index designators fordata
.checksum
: An(unsigned-byte 32)
.
Returns the Adler-32 checksum of the given data, in canonical
s2~ × 2^{16} + ~s1
form. This function is provided to help with setting up
more complicated zlib dictionaries; see format-specific options.