bitio
2022-02-20
A wrapper for octet streams that enable bit level streams.
1BITIO: A Common Lisp library for processing octet streams as bit streams.
This library can currently only provide READING stream of bits. Later, it will support WRITING a stream of bits. Under the conditions of reading/writing bits that are multiples of 8 when also being octet-aligned in the underlying stream, bitio will have good performance. It is recommended to use FAST-IO with BITIO, but it is not required.
Reading bits as a bit stream is a funny business and requires a bit of demonstration (no pun intended) of the edge cases and structural format of how the bits are laid out on disk, in memory, and in multi-part integers.
Before getting to the nitty gritty about the API and how it works, here is a quick example with no explanation about how it works, only what it does.
2A fast example:
;; Suppose we want to read a binary octet stream where we must read in order:
;; 32 bits as a signed integer in little endian format, then
;; 16 bits as a signed integer in big endian format, then
;; 3 bits as an unsigned integer, then
;; 8 bits as an unsigned integer, then
;; 5 bits as an unsigned integer, then
;; read a number of octets denoted by the 8-bit number,
;; for a total of 8 + N bytes read.
;; Usually, to implement the 3/8/5 bit reads, you'd read 2 bytes, then do a pile
;; of bit masking/shifting/anding/etc in order to retrieve the values you
;; desire. You would also implicitly assume some bit orderings while doing the
;; masking. Here is a possible (of several, actually) rendering of the above in
;; bitio:
(with-open-file (fin (asdf:system-relative-pathname :bitio "binfile")
:element-type '(unsigned-byte 8))
;; Wrap fin stream with a bitio stream.
(let ((bitio (make-bitio fin #'read-byte #'read-sequence)))
(let ((32-bits (bitio:read-integer bitio :unsignedp nil))
(16-bits (bitio:read-integer bitio :num-bytes 2 :byte-endian :be
:unsignedp nil))
(3-bits (bitio:read-integer bitio :num-bytes 1 :bits-per-byte 3))
(8-bits (bitio:read-integer bitio :num-bytes 1))
(5-bits (bitio:read-integer bitio :num-bytes 1 :bits-per-byte 5))
(seq (make-array 8-bit :element-type '(unsigned-byte 8)))
(num-read (bitio:read-bytes bitio seq :be 8)))
(format t "Do something with the read data values....~%"))))
3Bit Stream Representation
3.1Canonical Form of a Set of Bits
BITIO uses unsigned integers to represent a set of ordered raw bits in CLmemory.The following representation is both the "written on paper" representation and also the in-memory representation of the bits in CL memory. For N bits in an ordered set of bits, we have:
MSBit | LSBit | ||||
---|---|---|---|---|---|
2^{N-1} | 2^{N-2} | 2^{N-3} | ... | 2^{1} | 2^{0} |
0 | 1 | 0 | ... | 1 | 1 |
Suppose we have an 8-bit ordered bit set represented by the unsigned integer
right most 1.In BITIO, the unsigned integer representing the raw bits and the count of those bits are often found together in the API. This allows it to know that we are dealing with a bit set of N bits, even if all of those bits are 0.
The ordering aspect arises in that we can only append/take bits from the MSBit or the LSBit side of the bit set in canonical form. We cannot take any bits from the middle.
It is usually the case that when inserting bits into the Canonical Form, the bits are inserted on the LSBit side of the result and shifted left 1 bit for every bit that needs to be inserted into the result.
3.2Octet Stream Source for Bits: Function MAKE-BITIO
BITIO is a stream that explicity wraps another stream. Let's call this otherstream S. S must have the type (UNSIGNED-BYTE 8) and hence will supply 8-bitunsigned integers.Here is an abstract representation of S where the Octet + 0 column is the octet about to be read from the octet stream, then, Octet + 1 is the next octet read, and so on, and Remaining Octets is all of the rest of the ordered octets in stream S, as expected.
Octet + 0 | Octet + 1 | Octet + 2 | Remaining Octets |
---|---|---|---|
... |
BITIO is agnostic about the streams it wraps. Upon construction of a BITIO instance you must supply the stream being wrapped and the functions for reading and writing that are associated with the wrapped stream. These functions must have the same ordinary lambda lists as CL's READ-BYTE and READ-SEQUENCE, respectively.
A more useful manner to think about the octet stream is that each (UNSIGNED-BYTE 8) octet is automatically in a canonical form for an 8-bit unsigned integer. It is infact the form that all modern OS's and hardware will read an octet stream from memory, network, or storage. The MSBit is the leftmost bit, and the LSBit is the rightmost bit.
Octet + 0 | Octet + 1 | Octet + 2 | Remaining Octets |
---|---|---|---|
... |
From this convenient representation, we now describe how to read individual bits from it.
To wrap a previously created octet stream, one calls this BITIO function:
;; Wrap an octet stream with a BITIO stream.
(bitio:make-bitio octet-stream bitio/read-octet bitio/read-sequence &rest initargs)
which returns a BITIO instance from which 1 or more individual bits may be read.
Here is an example using regular CLHS streams:
(with-open-file (fin (asdf:system-relative-pathname :bitio "binfile")
:element-type '(unsigned-byte 8))
;; Wrap fin stream with a bitio stream. Pass appropriate function to
;; read unsigned 8-bit integers from the stream.
(let ((bitio (make-bitio fin #'read-byte #'read-sequence)))
(format t "Read some bits here...~%")))
Here is an example of wrapping a FAST-IO stream for a file:
(defun wrap-fast-read-sequence (vec buf &key (start 0) (end nil))
"Must wrap this so it has the same signature as clhs' READ-SEQUENCE."
(fast-io:fast-read-sequence vec buf start end))
(with-open-file (fin (asdf:system-relative-pathname :bitio "binfile")
:element-type '(unsigned-byte 8))
(fast-io:with-fast-input (fin-fast nil fin)
;; Wrap the fin stream with a bitio stream. Notice we pass the appropriate
;; unsigned 8-bit reader function for this stream type.
(let ((bitio (make-bitio fin-fast
#'fast-io:fast-read-byte
#'wrap-fast-read-sequence)))
(format t "Read some bits here...~%"))))
And last, but not least, here we wrap FAST-IO to read bytes from a buffer:
(defun wrap-fast-read-sequence (vec buf &key (start 0) (end nil))
"Must wrap this so it has the same signature as clhs' READ-SEQUENCE."
(fast-io:fast-read-sequence vec buf start end))
(fast-io:with-fast-input (fiobuf (vector #xbb #x11 #x0d #x44))
(let ((bitio (make-bitio fiobuf
#'fast-io:fast-read-byte
#'wrap-fast-read-sequence)))
(format t "Read some bits here...~%")))
3.3Reading from BITIO
Before talking about the various ways we can read bits from the wrapped octetstream, we must label them so we can accurately talk about each bit.Here is an example of labeling the canonical form bits in the octet stream. I've removed the CL #b prefix, so the individual bits align with their identifier.
Octet + 0 | Octet + 1 | Octet + 2 | Remaining Octets |
---|---|---|---|
00110011 |
10101011 |
01001101 |
... |
abcdefgh |
ijklmnop |
qrstuvwx |
... |
To be explicit, this is the mapping of IDs to bits in the octet stream:
Bit ID | Bit Value |
---|---|
a |
0 |
b |
0 |
c |
1 |
d |
1 |
e |
0 |
f |
0 |
g |
1 |
h |
1 |
i |
1 |
j |
0 |
k |
1 |
l |
0 |
m |
1 |
n |
0 |
o |
1 |
p |
1 |
q |
0 |
r |
1 |
s |
0 |
t |
0 |
u |
1 |
v |
1 |
w |
0 |
x |
1 |
3.3.1Bit Reading: Function READ-BITS
When reading individual bits from the BITIO stream, we must specify the numberof bits we intend to read and from which side of the canonical form of theoctets from which they are read. The bits are returned in a canonical form withthe first bit read being in the MSBit of the result and the last bit read beingin the LSBit of the result.The function to read bits from a BITIO stream is:
;; Read N bits from the BITIO stream.
(bitio:read-bits bitio bit-read-count bit-endian
&optional (eof-error-p t) (eof-value nil))
NOTE: This function doesn't understand properties of integers. It only reads raw bits from the underlying octet stream. Under certain conditions, it is meaningful to interpret the numbers this function returns as unsigned integers, but in general, if you want to read integers explicitly, there is a function for that described later.
The arguments are:
Argument | Meaning |
---|---|
BITIO | A BITIO instance |
BIT-READ-COUNT | Number of bits to read |
BIT-ENDIAN | :BE for big-endian |
:LE for litte-endian | |
Indicates from which end to take bits | |
EOF-ERROR-P | Same an in READ |
EOF-VALUE | Same as in READ |
The return is a values of the bits in canoncal form and the number of bits read. In the case of a short/EOF read and you're using EOF-ERROR-P with a NIL value, you may get less than the number of bytes you expected to read.
Now let's be more clear about what the BIT-ENDIAN argument actually means when reading the bits.
3.3.1.1Bit Big Endian Reads
Suppose we have wrapped this octet stream:Octet + 0 | Octet + 1 | Octet + 2 | Remaining Octets |
---|---|---|---|
00110011 |
10101011 |
01001101 |
... |
abcdefgh |
ijklmnop |
qrstuvwx |
... |
Then, we call this function:
(bitio:read-bits bitio 5 :be)
Then, for EACH bit of the 5 bits, we strip one bit from the MSBit side of the Octet + 0 octet, and shift them into canonical form.
We first read the 'a' bit, then the 'b' bit, then the 'c' bit, and so on with 'd', and 'e'. Each bit goes into the 2^{0} position of the result with the previous bits shifted to the left. Leaving the MSBit of the result (which is in canonical form) being bit 'a' and the LSBit of the result being bit 'e'.
The return values of the above function will be:
#b00110
;;abcde
5
Now, the BITIO stream will look like this:
Octet + 0 | Octet + 1 | Octet + 2 | Remaining Octets |
---|---|---|---|
-----011 |
10101011 |
01001101 |
... |
-----fgh |
ijklmnop |
qrstuvwx |
... |
NOTE: The - character represents bits that have been stripped off of the bit stream, and are now unavailable for further reading.
Suppose we continue reading 3 more bits with :be setting:
(bitio:read-bits bitio 3 :be)
We'll read 'f' first, then 'g', then 'h'. 'f' goes into the 2^{0} part of the result, then the next bit causes a shift left of the result, and so in, until we return:
#b011
;;fgh
3
At this point, the BITIO stream will look like this:
Octet + 0 | Octet + 1 | Octet + 2 | Remaining Octets |
---|---|---|---|
-------- |
10101011 |
01001101 |
... |
-------- |
ijklmnop |
qrstuvwx |
... |
which simplifies to:
Octet + 0 | Octet + 1 | Octet + 2 | Remaining Octets |
---|---|---|---|
10101011 |
01001101 |
......... | ... |
ijklmnop |
qrstuvwx |
......... | ... |
3.3.1.2Bit Little Endian Reads
In little endian reads, we take individual bits from the LSBit side of the octetand corral them into Canonical Form. This can result in some non-intuitive bitsets.Let's start with the original BITIO stream:
Octet + 0 | Octet + 1 | Octet + 2 | Remaining Octets |
---|---|---|---|
00110011 |
10101011 |
01001101 |
... |
abcdefgh |
ijklmnop |
qrstuvwx |
... |
Then, we call this function:
(bitio:read-bits bitio 5 :le)
Here, we read the individual bits from the LSBit side of Octet + 0 and store them into Canonical Form.
So, we read bits 'h', 'g', 'f', 'e', and 'd' and store them into canonical form like:
MSBit | LSBit | |||
---|---|---|---|---|
2^{4} | 2^{3} | 2^{2} | 2^{1} | 2^{0} |
h | g | f | e | d |
1 | 1 | 0 | 0 | 1 |
The final returned values are:
#b11001
;;hgfed
5
Then, the BITIO stream is in this state:
Octet + 0 | Octet + 1 | Octet + 2 | Remaining Octets |
---|---|---|---|
001----- |
10101011 |
01001101 |
... |
abc----- |
ijklmnop |
qrstuvwx |
... |
Notice carefully, that bits 'a', 'b', and 'c' are available to be read from Octet + 0.
Suppose we read those bits, and a few more with this call:
(bitio:read-bits bitio 7 :le)
We will read the bits in this order: 'c', 'b', 'a', 'p', 'o', 'n', 'm' and put them into the Canonical form of cbaponm.
Then, we return these values:
#b1001101
;;cbaponm
7
which leaves the stream in this state:
Octet + 0 | Octet + 1 | Octet + 2 | Remaining Octets |
---|---|---|---|
-------- |
1010---- |
01001101 |
... |
-------- |
ijkl---- |
qrstuvwx |
... |
which simplifies to:
Octet + 0 | Octet + 1 | Octet + 2 | Remaining Octets |
---|---|---|---|
1010---- |
01001101 |
........ | ... |
ijkl---- |
qrstuvwx |
........ | ... |
3.3.1.3Mixed Bit Endian Reads
It is fully possible to intermix big bit-endianness and little bit-endiannessreads. Let's do an example to see how this works.First start with the BITIO stream:
Octet + 0 | Octet + 1 | Octet + 2 | Remaining Octets |
---|---|---|---|
00110011 |
10101011 |
01001101 |
... |
abcdefgh |
ijklmnop |
qrstuvwx |
... |
Then, we call this function:
(bitio:read-bits bitio 3 :le)
And get back these results:
#b110
;;hgf
3
leaving the BITIO stream in this configuration:
Octet + 0 | Octet + 1 | Octet + 2 | Remaining Octets |
---|---|---|---|
00110--- |
10101011 |
01001101 |
... |
abcde--- |
ijklmnop |
qrstuvwx |
... |
Then, we switch bit endianness and read 3 bits. These three bits are read from the MSBit side of the Octet + 0 value, so, starting at bit 'a'.
(bitio:read-bits bitio 3 :be)
which returns these values:
#b001
;;abc
3
and leaves the BITIO stream in this state:
Octet + 0 | Octet + 1 | Octet + 2 | Remaining Octets |
---|---|---|---|
---10--- |
10101011 |
01001101 |
... |
---de--- |
ijklmnop |
qrstuvwx |
... |
Notice how the 'd' and 'e' bits are left to be read!
Let's read them in an :le manner and some additional bits too and see what happens:
(bitio:read-bits bitio 6 :le)
We read bits in this order: 'e' 'd' 'p' 'o' 'n' 'm'
And these are the values we get back:
#b011101
;;edponm
6
And now the BITIO stream is in this state:
Octet + 0 | Octet + 1 | Octet + 2 | Remaining Octets |
---|---|---|---|
-------- |
1010---- |
01001101 |
... |
-------- |
ijkl---- |
qrstuvwx |
... |
which simplifies to:
Octet + 0 | Octet + 1 | Octet + 2 | Remaining Octets |
---|---|---|---|
1010---- |
01001101 |
........ | ... |
ijkl---- |
qrstuvwx |
........ | ... |
One can easily achieve some pretty complex arbitrary bit reads from the underlying octet stream with the function read-bits.
3.3.2Integer Reading: Function READ-INTEGER
Integers (both signed and unsigned) are interpretations of raw bits stored in acanonical form with certain constraints and in a certain structure.The constraint in question are the parameters under which sign extension happens. Sign extension in languages like C are easy to perform since there are assembly level operations to perform this for each hardware-sized quantity in which the bit pattern is stored. In Common Lisp, with arbitrarily-sized integers, there is more work to accomodate sign extension rules. To explain deeper for CL, an unsigned value is considered to have an infinite number of zero bits prefixing it. A signed value that is negative happens to have an infinite number of 1 bits prefixing it. There is a little bit of math to enable this idea of infinite prefixes of zeros or ones that we must accomplish in CL.
The structure in question is the ordering of the multiple bytes that constitute a multi-part integer (here, defined here as M N-bit unsigned chunks). This is the usual understanding of Byte Endianess with respect to multi-byte integers.
For the purposes of BITIO, the bytes that constitute a single integer can be N-bits long, but they must ALL be N-bits long. The bit level endianness of those bytes themselves can also be little or big, but that setting must be true for ALL bytes read on behalf of an integer. Then, all of the bytes can be treated as little or big endian in terms of how they are placed into the final integer form.
NOTE: To be explicit, the term byte is defined to be an unsigned N-bit quantity, as opposed to its traditional definition of an unsigned 8-bit quantity.
It is also the case that you can intermix calls that change the endianness of either the bytes being read, or the endianness of the byte ordering--along with not being octet-aligned, and this function will do the right thing.
So, without further ado, we introduce a new function called:
(bitio:read-integer bitio
&key
(bit-endian :be)
(byte-endian :le)
(num-bytes 4)
(bits-per-byte 8)
(unsignedp T))
The arguments are:
Argument | Meaning |
---|---|
BITIO | A BITIO instance |
BIT-ENDIAN | :BE for big-endian |
:LE for litte-endian | |
Indicates bit endianness of all read bytes | |
BYTE-ENDIAN | :BE for big-endian |
:LE for little-endian | |
Indicates byte level ordering in the integer | |
NUM-BYTES | How many bytes will be read for this integer |
BITS-PER-BYTE | The bit-width of each byte |
UNSIGNEDP | Should we treat the integer as unsigned |
This function is the meat and potatoes for reading integers out of a BITIO stream. The read integers need not be octet-aligned, and the bytes constituting them need not be 8-bits wide.
Let's do an example of this call:
First, we show a BITIO stream (with hexadecimal view added):
Octet + 0 | Octet + 1 | Octet + 2 | Octet + 3 | Remaining Octets |
---|---|---|---|---|
#x33 |
#xAB |
#x4D |
#xF0 |
........ |
00110011 |
10101011 |
01001101 |
11110000 |
........ |
abcdefgh |
ijklmnop |
qrstuvwx |
yzABCDEF |
........ |
Then, we perform this call:
(bitio:read-integer bitio :unsignedp NIL)
Here is what happens. This description is semantically what happens, but not necessarily algorithmically what happens.
- Since BITS-PER-BYTE is 8, each byte is going to be 8 bits long.
- Since BIT-ENDIAN is :be, we read the bits in a left to right order for _eachbyte_.
- Since NUM-BYTES is 4, we read 4 8-bit unsigned values.
- Since BYTE-ENDIAN is :le, we'll pack the bytes into the integer in littleendian order.
- Since UNSIGNEDP is NIL, we will treat the value as signed.
So, the first thing we do is read the 4 bytes in this order:
and then pack them into the integer such that the leftmost byte is the least significant byte in the integer:
And then, since UNSIGNEDP is NIL, we convert this unsigned value, using the knowledge of the total number of bits we need to represent this number (* num-bytes bits-per-byte), and that the MSBit in this integer is 1 which makes it unsigned.
To get this decimal value: -263345357
We can check our work like this:
CL-USER> (ldb (byte 32 0) -263345357) 4031621939 (32 bits, #xF04DAB33)
which allows us to recover the original unsigned integer (before processing as a signed value) after reading it from the octet stream.
3.3.3Extended example with read-integer
Here we write out the code that will read a BITIO wrapped stream that is reading a FLAC binary file. The parser is expected to read a FRAME_HEADER which is defined here. A number in the left column like <14> means to read 14 bits from the binary stream. <4> means read 4 bits, <1> mean read one bit, and so on.
https://xiph.org/flac/format.html#frame_header
NOTE: All numbers are in big-endian format in FLAC unless otherwise noted. Even though bitio:read-integer defaults to :le for byte-endian, it only matters if reading more than 1 byte to create the integer. So, in those places we manually specify the ordering.
NOTE: In a future revision of BITIO, I may revisit this issue and store things like default settings in the BITIO instance itself.
The point of this example is to see how we avoid the perilous bit-masking that this code would normally have to do to read the above binary format. Where we do extract bits, it is following the specification in a logical and meaningful manner.
(defun parse-frame-header (bitio)
"Parse a FRAME_HEADER and return a structure with that information in it."
(let* ((sync-code (bitio:read-bits bitio 14 :be))
(reserved-0 (bitio:read-bits bitio 1 :be))
(blocking-strategy (bitio:read-bits bitio 1 :be))
(inter-channel-block-size
(bitio:read-integer bitio :num-bytes 1 :bits-per-byte 4))
(sample-rate (bitio:read-integer bitio :num-bytes 1 :bits-per-byte 4))
(channel-assignment
(bitio:read-integer bitio :num-bytes 1 :bits-per-byte 4))
(sample-size-in-bits
(bitio:read-integer bitio :num-bytes 1 :bits-per-byte 3))
(reserved-1 (bitio:read-bits bitio 1 :be))
(coded-frame-or-sample
(if (eql blocking-strategy 1)
;; variable blocksize
(parse-utf8 bitio 36)
;; fixed blocksize
(parse-utf8 bitio 31)))
(blocksize-value
(when (eql #b011 (ldb (byte 3 1) inter-channel-block-size))
(if (zerop (ldb (byte 1 0) inter-channel-block-size))
(bitio:read-integer :num-bytes 1)
(bitio:read-integer :num-bytes 2 :byte-endian :be))))
(sample-rate-value
(let ((trigger (ldb (byte 2 2) sample-rate))
(kind (ldb (byte 2 0) sample-rate)))
(when (eql #b11 trigger)
(cond
((eql kind #b00)
(bitio:read-integer bitio :num-bytes 1))
((eql kind #b01)
(bitio:read-integer bitio :num-bytes 2 :byte-endian :be))
((eql kind #b10)
(* 10 (bitio:read-integer bitio :num-bytes 2
:byte-endian :be)))
((eql kind #b11)
(error
"invalid parse of frame-header: mimic sync code"))))))
(crc-8 (bitio:read-integer bitio :num-bytes 1)))
(unless (eql sync-code #b11111111111110)
(error "invalid parse of frame-header: bad sync code"))
;; Then pack it up and send it off.
(make-frame-header blocking-strategy
inter-channel-block-size
sample-rate
channel-assignment
sample-size-in-bits
coded-frame-or-sample
blocksize-value
sample-rate-value
crc-8)))
(defun parse-utf8 (bitio num-bits)
(let ((bit-set (bitio:read-bits bitio num-bits :be)))
;; Next function defined elsewhere.
(convert-to-utf8 bit-set)))
The above represents a pretty complex use of BITIO to save a lot of work. It allows a natural parsing (and ease of debugging) of the format of the FRAME_HEADER in the FLAC binary format.
3.4Writing to BITIO
Writing bits to the stream is not implemented at this time. It will beimplemented in a future revision of BITIO.3.4.1Bit Writing
3.4.2Integer Writing
3.5API Summary
3.5.1Type BITIO
TBD3.5.2Function MAKE-BITIO
TBD3.5.3Function READ-BITS
TBD3.5.4Function READ-ONE-BYTE
TBD3.5.5Function READ-BYTES
TBD3.5.6Function READ-INTEGER
TBD3.5.7Function OCTET-READ-BOUNDARY-P
TBD3.6Known Bugs & Omissions
- There is no equivalent for WITH-OPEN-FILE for BITIO yet.
- You cannot CLOSE a BITIO yet.
- You cannot write a bit stream yet.