CONSPACK implementation for Common Lisp
- Decoder now handles more uses of circular reference.
- Properties now require
WITH-PROPERTIESif used outside an
DECODE, see below.
CONSPACK was inspired by MessagePack, and by the general lack of features among prominent serial/wire formats:
JSON isn't terrible, but can become rather large, and is potentially susceptible to READ exploits (as per the recently-fixed "10e99999999" bug in SBCL).
BSON (binary JSON) doesn't really solve much; though it encodes numbers, it's not particularly smaller or more featureful than JSON.
MessagePack is small, but lacks significant features; it can essentially encode arrays or maps of numbers, and any interpretation beyond that is up to the receiver.
Protobufs and Thrift are static.
It should be noted that, significantly, none of these support references. Of course, references can be implemented at a higher layer (e.g., JSPON), but this requires implementing an entire additional layer of abstraction and escaping, including rewalking the parsed object hierarchy and looking for specific signatures, which can be error-prone and hurts performance.
Additionally, none of these appear to have much in the way of security, and communicating with an untrusted peer is probably not recommended.
CONSPACK, on the other hand, attempts to be a more robust solution:
Richer set of data types, differentiating between arrays, lists, maps, typed-maps (for encoding classes/structures etc), numbers, strings, symbols, and a few more.
Very compact representation that can be smaller than MessagePack.
In-stream references, including optional forward references, which can allow for shared or circular data structures. Additionally, remote references allow the receiver the flexibility to parse and return its own objects without further passes on the output.
Security, including byte-counting for (estimated) maximum output size, and the elimination of circular data structures.
Speed. Using fast-io encoding and decoding can be many times faster than alternatives, even while tracking references (faster still without!).
See SPEC for complete details on encoding.
cl-conspack is simple to use:
(encode '(1 2 3)) ;; => #(40 4 16 1 16 2 16 3 0) (decode (encode '(1 2 3))) ;; => (1 2 3) ;; Smaller if the element-type is known: (encode (fast-io:octets-from '(1 2 3))) ;; => #(36 3 20 1 2 3)
CLOS and general objects
Conspack provides the ability to serialize and deserialize objects of any kind.
The easiest way, for the common case:
(conspack:defencoding my-class slot-1 slot-2 slot-3)
This expands to the more flexible way, which specializes
(defmethod conspack:encode-object append ((object my-class) &key &allow-other-keys) (conspack:slots-to-alist (object) slot-1 slot-2 slot-3 ...)) (defmethod conspack:decode-object-initialize progn ((object my-class) class alist &key &allow-other-keys) (declare (ignore class)) (alist-to-slots (alist object) slot-1 slot-2 slot-3))
ENCODE-OBJECT should specialize on the object and return an alist.
The alist returned will be checked for circularity if
is in use.
DECODE-OBJECT-ALLOCATE should specialize on
(eql 'class-name), and
produce an object based on the class and alist.
DECODE-OBJECT-INITIALIZE should specialize on the object (which has
been produced by
DECODE-OBJECT-ALLOCATE), and initializes it.
This two step process is necessary to handle circularity correctly.
As you can see, this does not require objects be in any particular format, or that you store any particular slots or values. It does not specify how you restore an object.
But for the "normal" case,
provided to build and restore from alists, and
define all of this in one simple form.
Circularity and References
Circularity tracking is not on by default, you can enable it for a
particular block of
decodes by using
(tracking-refs () (decode (encode CIRCULAR-OBJECT)))
"Remote" references are application-level references. You may encode a reference using an arbitrary object as a descriptor:
(encode (r-ref '((:url . "http://..."))))
When decoding, you may provide a function to handle these:
(with-remote-refs (lambda (x) (decode-url x)) (decode OBJECT))
If you have a relatively small static set of symbols you will always use for a particular encoding/decoding, you may want to use indexes. These allow symbols to be very-tightly-packed: for up to 15 symbols, a single byte can encode the symbol! For up to 256, two bytes, and so on.
(cpk:with-index (specifier-1 specifier-2 specifier-3) (cpk:encode '(specifier-1 specifier-2 specifier-3))) ;; => #(40 4 176 177 178 0) ;; Contrast this with: (cpk:encode '(specifier-1 specifier-2 specifier-3)) ;; #(40 4 130 64 11 83 80 69 67 73 70 73 69 82 45 49 129 64 16 67 79 ;; 77 77 79 78 45 76 73 83 80 45 85 83 69 82 130 64 11 83 80 69 67 73 ;; 70 73 69 82 45 50 129 64 16 67 79 77 77 79 78 45 76 73 83 80 45 85 ;; 83 69 82 130 64 11 83 80 69 67 73 70 73 69 82 45 51 129 64 16 67 79 ;; 77 77 79 78 45 76 73 83 80 45 85 83 69 82 0)
(This is a somewhat excessive example, since long non-keyword symbols are used. Shorter keyword symbols would be relatively shorter, but this is the general case.)
For more "realistic" use, you may define an index and refer to it:
(define-index index-name symbol-1 symbol-2 ...) (with-named-index 'index-name (encode ...))
For instance, you may define multiple indexes for multiple different format versions, read the version, and use the appropriate index:
(define-index version-1 ...) (define-index version-2 ...) (let ((version (decode-stream s))) (with-named-index version ;; Decode the rest of the stream appropriately. You may want to ;; do more checking on VERSION if security is required... (decode-stream s)))
Note that using
tracking-refs will also help encode symbols
efficiently, but not quite as efficiently:
- The full string for the symbol (and if necessary, package), will be encoded at least once, when first encountered
- Refs are tracked in-order, and may lead to longer tags than a comparable index would use
tracking-refs is a perfectly suitable option, especially if
flexibility is desired, since all symbol information is encoded, and
nothing special is needed for decoding.
(Properties now require a
WITH-PROPERTIES block in some
circumstances, see below.)
Properties are a way to specify additional information about an object
that may be useful at decode-time. For instance, while hash tables
are supported as maps, there are no bits to specify the
parameter, so decoding a hash table of strings would produce a useless
object. In this case, the
:test property is set when encoding and
checked when decoding hash tables.
You may specify arbitrary properties for arbitrary objects; the only
restriction is the objects must test by
(conspack:with-properties () (let ((object (make-instance ...))) (setf (property object :foo) 'bar) (property object :foo))) ;; => BAR
This sets the
:foo property to the symbol
bar, and it is encoded
along with the object. Note this will increase the object size, by
the amount required to store a map of symbols-to-values.
When decoding, you can access properties about an object via
(defmethod decode-object-initialize (...) (let ((prop (getf *current-properties* NAME))) ...))
You may remove them with
Properties are now only available within a
block. This has a number of benefits, including some thread safety,
and ensuring properties don't stick around forever.
DECODE have implicit
WITH-PROPERTIES blocks: you
don't need to specify
WITH-PROPERTIES if you use properties inside
DECODE-OBJECT, or encode and decode any objects
that have implicit properties. You only need this if you wish to
access properties outside of the encode or decode (e.g.,
preassigning properties to be encoded).
Allocation Limits and Security
Conspack provides some level of "security" by approximately limiting the amount of bytes allocated when reading objects.
By default, because format sizes are prespecified statically, it's possible to specify extremely large allocations for e.g. arrays with only a few bytes. Obviously, this is not suitable for untrusted conspack data.
The solution is simply to cap allocations:
(with-conspack-security (:max-bytes 200000) (decode ...))
Since actual allocation sizes are rather difficult to get in most lisps, this approximates the allocation based on how big each object might be, e.g.:
pointer-size * array-size
Each object header is tallied against the limit just prior to its decoding; if the object would exceed the allowed bytes, decoding halts with an error.
Further options may be added in the future.
By default, Conspack does not intern symbols, on the assumption that the producer and consumer have agreed on what symbols to use beforehand. If the decoder finds a symbol that has not already been interned, it will ignore the symbol's package and make an uninterned symbol instead.
with-interning macro can be used if the decoder should instead
(with-interning () (decode ...))
Interning symbols from untrusted data could lead to denial-of-service attacks via interning long-lived symbols in memory, so be careful.
Since conspack is a binary format, it's rather difficult for humans to
read just looking at the stream of bytes. Thus an
is provided. This is mostly useful for debugging the format; however
it may be of interest otherwise and certainly may be helpful when
creating other implementations.
(explain (encode '(1 2 3))) ;; => ((:LIST 4 ((:NUMBER :INT8 1) (:NUMBER :INT8 2) (:NUMBER :INT8 3) (:BOOLEAN NIL))) END-OF-FILE)