cl-protobufs
2024-10-12
cl-protobufs is an implementation of Google protocol buffers for Common Lisp.
Installation
- Install protoc

  Common Lisp code for a given .proto file is generated by a plug-in for protoc, the protocol buffer compiler. The plug-in is written in C++ and requires the full version of Google's protocol buffer code to be installed in order to build, not just the precompiled protoc binaries. We also require Google's ABSL C++ library to be installed.

  Depending on your system, you may be able to install these libraries through apt (or your system's package manager). If you need to install from source you can see the example in our continuous integration tests.

  Make sure the protoc binary is on your PATH.

- Build the Lisp protoc plugin

  We use CMake to install the Lisp protoc plugin.

      $ cd cl-protobufs/protoc
      $ cmake . -DCMAKE_CXX_STANDARD=17
      $ cmake --build . --target install --parallel 16

  Make sure the installation directory is on your PATH.
Generating the Lisp Code from Proto files
There are two ways of doing this, either using protoc or ASDF.
Using ASDF to generate Lisp code
If you add :defsystem-depends-on (:cl-protobufs.asdf) to your defsystem, ASDF can generate Lisp code directly from your .proto files. For each .proto file add a component of type :protobuf-source-file with a :proto-pathname. You may also need to specify :proto-search-path to help the protoc compiler find protos imported by your .proto file. The pathnames can be relative with respect to the pathname of the system you are building.
Several examples can be found in cl-protobufs.asd.
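As a rough sketch of the component layout described here, a system definition might look like the following. The system name, file names, and directory layout are hypothetical, and the exact form expected by :proto-search-path (written here as a list of directories) is an assumption; cl-protobufs.asd remains the authoritative reference.

```lisp
;; Hypothetical system definition; only the option names come from this README.
(defsystem "my-project"
  :defsystem-depends-on (:cl-protobufs.asdf)
  :depends-on (:cl-protobufs)
  :components
  ((:protobuf-source-file "my-messages"
    ;; Relative to the directory containing this .asd file.
    :proto-pathname "protos/my-messages.proto"
    ;; Assumed form: directories searched for imported .proto files.
    :proto-search-path ("protos/"))
   ;; Ordinary Lisp source that uses the generated messages.
   (:file "main" :depends-on ("my-messages"))))
```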
Using protoc to Generate Lisp Code
To test your build, try generating Lisp code from the cl-protobufs/tests/case-preservation.proto file with the following command. Note that the command may differ slightly depending on what directory you're in and where you installed protoc-gen-cl-pb. In this case we assume you're in the directory containing the cl-protobufs directory. The reason will become clear in a moment.

    $ protoc --plugin=protoc-gen-cl-pb=/usr/local/bin/protoc-gen-cl-pb \
        --cl-pb_out=output-file=case-preservation.lisp:/tmp \
        cl-protobufs/tests/case-preservation.proto

This command should generate a file named case-preservation.lisp in the /tmp/ directory.
When a .proto file imports another .proto file, protoc needs to know how to find the imported file. It does this by looking for the file relative to the values passed to it with the --proto_path option (or the -I short option). To see an example of this, you can try generating Lisp code for cl-protobufs/tests/extend.proto. Still in the same directory, run the following command:

    $ protoc --plugin=protoc-gen-cl-pb=/usr/local/bin/protoc-gen-cl-pb \
        --cl-pb_out=output-file=extend.lisp:/tmp --proto_path=cl-protobufs/tests \
        cl-protobufs/tests/extend.proto

The file /tmp/extend.lisp should be generated. Note that the .lisp file for each imported file also needs to be generated separately.
ASDF
Build and run the tests with ASDF:
- Install Quicklisp and make sure to add it to your Lisp implementation's init file.
- Install ASDF if it isn't part of your Lisp implementation.
- Create a link to cl-protobufs so that Quicklisp will use the local version:

      $ cd ~/quicklisp/local-projects
      $ ln -s .../path/to/cl-protobufs

- Start Lisp and evaluate (ql:quickload :cl-protobufs).
- Load and run the tests:

      cl-user> (asdf:test-system :cl-protobufs)
Submitting changes to cl-protobufs
- Create a pull request through GitHub as usual.
- Sign the Google CLA. This only needs to be done once for all Google projects, and it must be done before your pull request can be approved.
- Add someone in the Googlers team as a reviewer.
- When the reviewer is satisfied they will add the Ready for Google label.
- The pull request will later be merged.
Examples
The files example/math.lisp and example/math-test.lisp give a simple example of creating a proto structure, populating its fields, serializing, and then deserializing. Looking over these files is a good way to get a quick feel for the protobuf API, which is described in detail below.
The file math.proto has two messages: AddNumbersRequest and AddNumbersResponse.
The prefix cl-protobufs. is automatically added to the package name specified by package math;, resulting in cl-protobufs.math as the full package name for the generated code. This is done to avoid conflicts with existing packages.
The full name of the Lisp type for the AddNumbersRequest message is cl-protobufs.math:add-numbers-request.
Generated Code Guide
This section explains the code generated from a .proto file by protoc-gen-cl-pb, the Common Lisp plugin for protoc. See the "protoc" directory in this distribution for the plugin code.
Note that protoc-gen-cl-pb transforms protobuf names like MyMessage or my_field to names that are more Lisp-like, such as my-message and my-field.
The code generated by protoc-gen-cl-pb uses macros to define the generated API. Protocol buffer messages should be defined in .proto files instead of invoking these macros directly. Internal details that are not in the API documented below may change incompatibly in the future.
Packages
The generated code for each .proto file lives in a package derived from the package statement.

    package abc;

The generated Lisp package for the above is cl-protobufs.abc. The prefix "cl-protobufs." is added in order to avoid conflicts with another Lisp package named "abc". If you prefer to use a shorter package name we recommend using :local-nicknames as we do in many files in this library. Example:

    (defpackage #:my.project
      (:use #:common-lisp)
      (:local-nicknames (#:abc #:cl-protobufs.abc)))  ; Referenced as abc:

You may have multiple .proto files use the same package if desired. The package exports the symbols described in the sections below.
Groups (proto2 only)
Groups are a deprecated way of defining a nested message and a field in a single declaration:

    syntax = "proto2";
    package abc;
    message Foo {
      optional group Bar = 1 {
        optional string a = 1;
        optional int32 b = 2;
      }
    }

This is treated exactly the same way as defining a nested message named Bar and a field named bar:

    syntax = "proto2";
    package abc;
    message Foo {
      message Bar {
        optional string a = 1;
        optional int32 b = 2;
      }
      optional Bar bar = 1;
    }

See the following sections for details on how to access nested messages and fields from Lisp.
Messages (proto2)
This section uses the following protocol buffer message as an example:

    syntax = "proto2";
    package abc;
    message DateRange {
      optional string min_date = 1;
      optional string max_date = 2;
    }

Construct a date-range message:

    (make-date-range :min-date "2020-05-27" :max-date "2020-05-28")

Set the value of the max-date field on an already-constructed range message:

    (setf (date-range.max-date range) "2022-07-29")

Get the value of the min-date field from the range message:

    (date-range.min-date range)

If the field was explicitly set, that value is returned. Otherwise, a default value is returned: the default value specified for this field in the .proto file, if any, or a type-specific default value. Type-specific default values are as follows:
protobuf type | default value |
---|---|
numerics | zero of the appropriate type |
strings | the empty string |
messages | nil |
groups | nil |
enums | the first value listed in the .proto file |
booleans | nil |
repeated fields | the empty list |
symbols | nil |
Note that with nested messages and long message names, field accessor names can get pretty long. If speed is not an issue it is also possible to access fields via the cl-protobufs:field generic function, which is an alternative (slower, but often more concise) way to read a protobuf field's value:

    (cl-protobufs:field range 'min-date)

Check whether the min-date field has been set on range:

    (date-range.has-min-date range)

(Returns t if the min-date field has been set, otherwise nil.)
Clear the value of the min-date field on range:

    (date-range.clear-min-date range)

(After the above call, (date-range.has-min-date range) returns nil and (date-range.min-date range) returns the default value.)
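The set, has, clear, and default-value behavior can be exercised together. The following is a minimal sketch, assuming the code generated for the DateRange message has been loaded and that its symbols are exported from cl-protobufs.abc; the function name date-range-defaults-demo is made up for illustration.

```lisp
;; Assumes the generated code for DateRange (package abc) is already loaded.
;; DATE-RANGE-DEFAULTS-DEMO is a hypothetical helper, not part of the library.
(defun date-range-defaults-demo ()
  "Return (has-before value-before has-after value-after) for min-date."
  (let* ((range (cl-protobufs.abc:make-date-range :min-date "2020-05-27"))
         ;; Observe the field while it is explicitly set.
         (has-before (cl-protobufs.abc:date-range.has-min-date range))
         (value-before (cl-protobufs.abc:date-range.min-date range)))
    ;; Clearing should reset the field to its type-specific default.
    (cl-protobufs.abc:date-range.clear-min-date range)
    (list has-before
          value-before
          (cl-protobufs.abc:date-range.has-min-date range)
          (cl-protobufs.abc:date-range.min-date range))))
```

Going by the description in this section, the first two elements should be t and "2020-05-27", the third nil, and the fourth the empty-string default for string fields.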
Messages (proto3)
This section uses the following protocol buffer message as an example:

    syntax = "proto3";
    message Event {
      int32 day = 1;
      int32 month = 2;
      int32 year = 3;
      repeated string invitees = 4;
    }

The generated code for proto3 messages is similar to proto2 messages. The only difference is the introduction of fields with no specified label, which are known as "singular" fields. For singular fields, the state of being unset and the state of being set to the default value for the type are indistinguishable. So has-* functions, such as (event.has-day msg), are not defined.
The has-* functions for repeated fields are defined. They return true if and only if the field has been manually set and has not been cleared since.
This library supports optional fields in proto3 messages. These fields have the same semantics and generated code as proto2 optional fields.
Maps
This section uses the following protocol buffer message as an example:

    message Dictionary {
      map<int32,string> map_field = 1;
    }

This creates an associative map with keys of type int32 and values of type string. In general, the key type can be any scalar type except float and double. The value type can be any protobuf type.
For a message dict of type Dictionary, the following functions are created to access the map:
*-gethash returns the value associated with a key. The call below returns the value associated with 2 in the map-field field in dict. If there is no value explicitly set, this function returns the default value of the value type, in this case the empty string.

    (dictionary.map-field-gethash 2 dict)

*-gethash can be used with setf to set entries as well. This associates 1 with the value "one" in the map-field field in dict:

    (setf (dictionary.map-field-gethash 1 dict) "one")

*-remhash removes any entry with key 1 in the map-field field in dict:

    (dictionary.map-field-remhash 1 dict)

Like the other fields, these functions are aliased by methods which are slower but more concise. Examples of the methods are: (map-field-gethash 2 dict), (setf (map-field-gethash 1 dict) "one"), and (map-field-remhash 1 dict). These have the same functionality as the above three functions respectively.
These functions are type checked, and interfacing with the map with these functions alone will guarantee that (de)serialization functions as well as the (dictionary.has-map-field dict) function will work properly. The underlying hash table may be accessed directly via (dictionary.map-field dict), but doing so may result in undefined behavior.
Enums

    enum DayOfWeek {
      DAY_UNDEFINED = 0;
      MON = 1;
      TUE = 2;
      WED = 3;
      ...
    }

The above enum defines the Lisp type day-of-week, like this:

    (deftype day-of-week () '(member :day-undefined :mon :tue :wed ...))

Each enum value is represented by a keyword symbol which is mapped to/from its numeric equivalent during serialization and deserialization.
Convert a keyword symbol to its numeric value:

    (defun day-of-week-to-int (name) ...)

(Example: (day-of-week-to-int :mon) => 1)
Convert a number to its symbolic name:

    (defun int-to-day-of-week (num) ...)

(Example: (int-to-day-of-week 1) => :MON)
Each numeric enum value is also bound to a constant by the same name but with "+" on each side:

    (defconstant +mon+ 1)
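To make the shape of this API concrete, here is a hand-written stand-in that behaves like the definitions described above for the four values listed. It is only a sketch in plain Common Lisp: the real definitions come from the generated code and may be implemented differently, the *day-of-week-values* table is a hypothetical helper, and the handling of unrecognized values is not reproduced.

```lisp
;; Hand-written stand-in for illustration only; the protoc plugin
;; generates the real definitions.
(deftype day-of-week ()
  '(member :day-undefined :mon :tue :wed))

;; Hypothetical lookup table pairing each keyword with its number.
(defparameter *day-of-week-values*
  '((:day-undefined . 0) (:mon . 1) (:tue . 2) (:wed . 3)))

(defun day-of-week-to-int (name)
  "Return the numeric value for the keyword NAME, or NIL if unknown."
  (cdr (assoc name *day-of-week-values*)))

(defun int-to-day-of-week (num)
  "Return the keyword for the number NUM, or NIL if unknown."
  (car (rassoc num *day-of-week-values*)))
```

With these definitions, (day-of-week-to-int :mon) evaluates to 1 and (int-to-day-of-week 1) evaluates to :MON, matching the examples in this section.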
Note that most enums should have an "undefined" or "unset" value numbered 0 so that message fields using this enum type have a reasonable default value that is distinguishable from valid values. (It probably wouldn't make sense for Monday to be the default day.)
Name conflicts with other enum constants can easily happen if they all have a value named "undefined", so in this case we named the "undefined" value with a DAY_ prefix. For this reason it is also common to nest an enum inside the message that uses it.
When an enum is defined inside of a message instead of at top level in the .proto file, the message name is prepended to the name. For example, if DayOfWeek had been defined inside of a Schedule message it would result in these definitions:

    (deftype schedule.day-of-week () '(member :day-undefined :mon :tue :wed ...))
    (schedule.day-of-week-to-int :mon) => 1
    (int-to-schedule.day-of-week 1) => :MON
    (defconstant +schedule.day-undefined+ 0)  ; may not need the DAY_ prefix now.
    (defconstant +schedule.mon+ 1)
    ...
Enum Backward Compatibility
For backward compatibility, unrecognized enum values are retained during deserialization and are output again when serialized. This allows a client that acts as a pass-through for the enum data to function correctly even if it uses a different version of the proto than the systems it is communicating with.
Message Schema V1:

    enum DayOfWeek {
      DAY_UNDEFINED = 0;
      MON = 1;
      TUE = 2;
      WED = 3;
    }
    message DayIWillWork {
      optional DayOfWeek workday = 1;
    }

Message Schema V2:

    enum DayOfWeek {
      DAY_UNDEFINED = 0;
      MON = 1;
      TUE = 2;
      WED = 3;
      THUR = 4;
    }
    message DayIWillWork {
      optional DayOfWeek workday = 1;
    }

If we send a V2 message:

    DayIWillWork { workday: THUR }

to a V1 system it will save the fact that the enum it received is 4. Calling (day-i-will-work.workday v2-proto) will return :%undefined-4. Reserialization will add the workday enum value to the serialized protobuf message, and deserialization on a V2 system will properly add the new :thur enum value to the new protocol buffer message.
Trying to call (setf (day-i-will-work.workday v2-proto) :%undefined-4) will signal an error on a V1 or V2 system since :%undefined-4 isn't a known enum value.
Oneof
This section uses the following protobuf message as an example:

    message Person {
      optional string name = 1;
      oneof AgeOneof {
        int32 age = 2;
        string birthdate = 3;
      }
    }

To access fields inside a oneof, use the standard accessors outlined above. These fields have the semantics of proto2 optional fields, so has-* functions are created. For example:

    (setf (person.age bob) 5)

...will set the age field of a Person object bob to 5.
Defining a oneof also creates two special functions:
*-oneof-case will return the Lisp symbol corresponding to the field which is currently set. So, if we set age to 5, then this will return the symbol AGE. If no field is set, this function will return nil.

    (person.age-oneof-case bob)

If we set the age field on our bob object, then:

    (person.has-age bob) => t
    (person.has-birthdate bob) => nil

clear-* clears all fields inside of the oneof. To clear the oneof age-oneof:

    (person.clear-age-oneof bob)
Repeated Fields
We use the following protocol buffer message as an example in this section:

    message RepeatedProto {
      repeated int32 my_int_list = 1;
      repeated int32 my_int_vector = 2 [(lisp_container) = VECTOR];
    }

This creates a message with two fields.
The field my_int_list stores a list of integers. The default value is the empty list, i.e. nil.
The field my_int_vector stores a vector of integers. The default value is an empty vector which is extendable with a fill pointer.
The APIs for the list and vector repeated fields are the same. There is a minor difference when pushing onto the different types of repeated field.
push-* pushes a value onto the corresponding list or vector field.
This pushes the integer 1 onto the my_int_list field in the RepeatedProto:

    (repeated-proto.push-my-int-list 1 my-message)

(Since we push onto a list, this will push onto the front of the list.)
This pushes the integer 1 onto the my_int_vector field in the RepeatedProto:

    (repeated-proto.push-my-int-vector 1 my-message)

(Since we push onto a vector, this will push onto the back of the vector.)
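The difference in ordering is the same one seen with the standard Common Lisp operators push and vector-push-extend. The sketch below uses only those standard operators on a plain list and a plain extendable vector; it does not call the generated push-* functions, and whether the generated code uses these operators internally is an assumption. The function name push-order-demo is made up for illustration.

```lisp
;; Plain Common Lisp; no generated code is involved.
(defun push-order-demo ()
  "Push 1, 2, 3 onto a list and an extendable vector; return both as lists."
  (let ((as-list '())
        (as-vector (make-array 0 :adjustable t :fill-pointer 0)))
    (dolist (n '(1 2 3))
      (push n as-list)                    ; new element goes to the front
      (vector-push-extend n as-vector))   ; new element goes to the back
    (list as-list (coerce as-vector 'list))))
```

Calling (push-order-demo) returns ((3 2 1) (1 2 3)): the list ends up in reverse insertion order, while the vector keeps insertion order.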
The has-* functions on a repeated field return true if there are any elements in the sequence:

    (repeated-proto.has-my-int-list my-message)
    (repeated-proto.has-my-int-vector my-message)

The length-of-* function returns the number of elements in the repeated field:

    (repeated-proto.length-of-my-int-list my-message)
    (repeated-proto.length-of-my-int-vector my-message)

The nth-* function returns the element at position n in the repeated field:

    (repeated-proto.nth-my-int-list n my-message)
    (repeated-proto.nth-my-int-vector n my-message)

(If the repeated field has length less than n, we signal an error.)
The clear-* function clears the repeated field of all elements:

    (repeated-proto.clear-my-int-list my-message)
    (repeated-proto.clear-my-int-vector my-message)
Symbols
A string field may be annotated as a symbol field, which will cause it to be represented in Lisp as an interned symbol rather than a string. Example:

    import "third_party/lisp/cl_protobufs/proto2-descriptor-extensions.proto";
    message Foo {
      optional string symbol = 1 [(lisp_type) = "CL:SYMBOL"];
    }

When converting from text mode, we uppercase the string, and if it does not contain a colon we intern it as a keyword symbol, except that we special case "T" and "NIL" to refer to the corresponding Lisp symbols. If the string contains a colon at the beginning, then we also intern it as a keyword symbol, but if it contains a colon elsewhere in the string, the portion preceding the colon is interpreted as a package name. Thus, the following lines are equivalent:

    symbol: "foo"
    symbol: "FOO"
    symbol: "keyword:foo"

as are

    symbol: "t"
    symbol: "common-lisp:t"

but note that these are different:

    symbol: "t"
    symbol: ":t"

Multiple colons are not allowed, nor are the single-quote, double-quote, and backslash characters.
Options
TODO
Services
This section describes the generated code API for a protobuf service in a proto file. You must have a corresponding RPC library as well; cl-protobufs just generates the methods.
The gRPC library, or any library containing the following form:

    (setq cl-protobufs:*rpc-call-function* 'start-call)

can be used as the underlying RPC mechanism. We will show examples with the expectation that you are using gRPC.
The following example service definition is used throughout this section.

    package math;

    message AddNumbersRequest {
      optional int32 number1 = 1;
      optional int32 number2 = 2;
    }

    message AddNumbersResponse {
      optional int32 sum = 1;
    }

    service MyService {
      rpc AddNumbers(AddNumbersRequest) returns (AddNumbersResponse) {}
    }

The cl-protobufs protoc plugin generates two packages:
- cl-protobufs.math
- cl-protobufs.math-rpc

The package cl-protobufs.math contains the add-numbers-request and add-numbers-response protocol buffer messages.
Client
The package cl-protobufs.math-rpc contains a stub for call-add-numbers. A message can be sent to a server implementing the MyService service with:

    (grpc:with-insecure-channel
        (channel (concatenate 'string hostname ":" (write-to-string port-number)))
      (let* ((request (cl-protobufs.math:make-add-numbers-request
                       :number1 1
                       :number2 2))
             (response (cl-protobufs.math-rpc:call-add-numbers channel request)))
        ...))
Server
There is currently no known supported open framework for implementing the server portion of Protocol Buffer services in Lisp.

    (defgeneric add-numbers-impl (channel request rpc))  ; request is an add-numbers-request

A generic function is generated for each RPC in the service definition. The name is the concatenation of the protobuf method name (in its Lisp form) and the string "-impl".
To implement the service, define a method for each generic function. The method must return the type declared in the .proto file. Example:

    (defmethod add-numbers-impl (channel (request add-numbers-request) rpc)
      (make-add-numbers-response
       :sum (+ (add-numbers-request.number1 request)
               (add-numbers-request.number2 request))))

The channel argument is supplied by the underlying RPC code and differs depending on which transport mechanism (HTTP, TCP, IPC, etc) is being used. The channel and rpc arguments can usually be ignored.
The cl-protobufs Package
This section documents the symbols exported from the cl-protobufs package.
message is the base type from which every generated protobuf message inherits:

    (defstruct message ...)

print-text-format prints a protocol buffer message to a stream. object is the protocol buffer message, group, or extension to print. stream is the stream to print to. pretty-print-p may be set to nil to minimize textual output by omitting most whitespace.

    (defun print-text-format (object &key (indent -2) (stream *standard-output*) (pretty-print-p t)))

parse-text-format parses a protocol buffer message written in text-format. type is the type of message to parse. stream is the stream to read from.

    (defun parse-text-format (type &key (stream *standard-input*)))
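The two text-format functions can be combined into a round trip through a string. This is a minimal sketch, assuming cl-protobufs is loaded and the functions accept the keyword arguments shown in the signatures above; text-format-round-trip is a hypothetical helper name, not part of the library.

```lisp
;; Assumes cl-protobufs is loaded. TEXT-FORMAT-ROUND-TRIP is a
;; hypothetical helper written for this example.
(defun text-format-round-trip (message type)
  "Print MESSAGE in text format, then parse that text back as TYPE.
Returns the parsed message and the intermediate text as two values."
  (let ((text (with-output-to-string (out)
                (cl-protobufs:print-text-format message :stream out))))
    (with-input-from-string (in text)
      (values (cl-protobufs:parse-text-format type :stream in)
              text))))
```

For instance, with the DateRange message from the proto2 section loaded, one could call (text-format-round-trip (cl-protobufs.abc:make-date-range :min-date "2020-05-27") 'cl-protobufs.abc:date-range) and compare the result with the original using proto-equal.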
is-initialized checks if object has all required fields set, and recursively all of its sub-objects have all of their required fields set. An error may be signaled if an attempt is made to serialize a protobuf object that is not initialized. Signals an error if object is not a protobuf message.

    (defun is-initialized (object))

proto-equal checks if two protobuf messages are equal. By default, two messages are equal if calling the getter on each field would retrieve the same value. This means that a message with a field explicitly set to the default value is considered equal to a message with that field not set.
If exact is true, consider the messages to be equal only if the same fields have been explicitly set.
message-1 and message-2 must both be protobuf messages.

    (defun proto-equal (message-1 message-2 &key (exact nil)))

clear resets the protobuf message to its initial state:

    (defgeneric clear (object))

has-field returns whether field has been explicitly set in object. field is the symbol naming the field in the proto message.

    (defun has-field (object field))
Serialization
byte-vector: a vector of unsigned-bytes. In serialization functions, this is often referred to as 'buffer'.

    (deftype byte-vector)

make-byte-vector: constructor to make a byte vector. size is the size of the underlying vector. adjustable is a boolean value determining whether the byte-vector can change size.

    (defun make-byte-vector (size &key adjustable))

serialize-to-bytes: creates a byte-vector and serializes a protobuf message to that byte-vector. object is the protobuf message instance to serialize. Optionally use type to specify the type of object to serialize.

    (defun serialize-to-bytes (object &optional (type (type-of object))))

serialize-to-stream: serialize object, a protobuf message, to stream. Optionally use type to specify the type of object to serialize.

    (defun serialize-to-stream (object stream &optional (type (type-of object))))

deserialize-from-bytes: deserialize a protobuf message, returning the newly created structure. type is the symbol naming the protobuf message to deserialize. buffer is the byte-vector containing the data to deserialize. start (inclusive) and end (exclusive) delimit the range of bytes to deserialize.

    (defun deserialize-from-bytes (type buffer &optional (start 0) (end (length buffer))))

deserialize-from-stream: deserialize an object of type type by reading bytes from stream. type is the symbol naming the protobuf message to deserialize.

    (defun deserialize-from-stream (type stream))
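A byte-level round trip ties serialize-to-bytes and deserialize-from-bytes together. The following is a minimal sketch, assuming cl-protobufs is loaded and both functions are exported from the cl-protobufs package as this section describes; bytes-round-trip is a hypothetical helper name.

```lisp
;; Assumes cl-protobufs is loaded. BYTES-ROUND-TRIP is a hypothetical
;; helper written for this example.
(defun bytes-round-trip (message type)
  "Serialize MESSAGE to a byte-vector and deserialize it again as TYPE.
Returns the new message and the number of serialized bytes as two values."
  (let ((buffer (cl-protobufs:serialize-to-bytes message)))
    (values (cl-protobufs:deserialize-from-bytes type buffer)
            (length buffer))))
```

Here type is the symbol naming the message, for example 'cl-protobufs.abc:date-range for the DateRange message used earlier. The returned message is a fresh structure, so proto-equal rather than eq is the appropriate comparison with the original.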
Well Known Types
Several functions are exported from the cl-protobufs.well-known-types package. A list of all well known types can be found in the official Protocol Buffers documentation.
unpack-any: takes an Any protobuf message any-message and turns it into the stored protobuf message, as long as the qualified-name given in the type-url corresponds to a loaded message type. The type-url must be of the form base-url/qualified-name.

    (defun unpack-any (any-message))

pack-any: creates an Any protobuf message given a protobuf message and a base-url.

    (defun pack-any (message &key (base-url "type.googleapis.com")))
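As a small illustration of how the two functions fit together, a message can be wrapped in an Any and unwrapped again. This is only a sketch: it assumes cl-protobufs and its well-known-types code are loaded and that pack-any and unpack-any are exported from cl-protobufs.well-known-types with the signatures shown above; any-round-trip is a hypothetical helper name.

```lisp
;; Assumes cl-protobufs and its well-known types are loaded.
;; ANY-ROUND-TRIP is a hypothetical helper written for this example.
(defun any-round-trip (message)
  "Pack MESSAGE into an Any using the default base-url, then unpack it."
  (let ((any (cl-protobufs.well-known-types:pack-any message)))
    ;; Unpacking relies on the message type named in the type-url
    ;; being loaded in the current image.
    (cl-protobufs.well-known-types:unpack-any any)))
```

The result should compare equal to the original message under proto-equal, provided the message type is loaded.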
TODO: examples
JSON Mapping
The cl-protobufs.json package exports functions to convert between protobuf objects and the canonical JSON encoding.
print-json: takes any protobuf message message and prints it as JSON. The parameters are:
- pretty-print-p: Indent the output by indent spaces and print newlines.
- stream: The Lisp stream to output to.
- camel-case-p: Print field names in camelCase. If nil, then print field names as they appear in the .proto file.
- numeric-enums-p: If true, print enum values by their number rather than their name.

    (defun print-json (message &key (pretty-print-p t) (stream *standard-output*) (camel-case-p t) numeric-enums-p))

parse-json: parses a JSON encoding and returns the parsed protobuf object. The parameters are:
- type: Either the Lisp type or the message-descriptor of the object to parse.
- stream: The stream to read from. By default, this is *standard-input*.
- ignore-unknown-fields-p: If true, silently ignore any unrecognized fields encountered when parsing. If nil, the parser will signal an error.

    (defun parse-json (type &key stream ignore-unknown-fields-p))
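The JSON functions can be paired in the same way as the other encodings. This is a minimal sketch, assuming cl-protobufs is loaded and that print-json and parse-json accept the :stream keyword as in the signatures above; json-round-trip is a hypothetical helper name.

```lisp
;; Assumes cl-protobufs is loaded. JSON-ROUND-TRIP is a hypothetical
;; helper written for this example.
(defun json-round-trip (message type)
  "Print MESSAGE as JSON, then parse that JSON back as TYPE.
Returns the parsed message and the JSON text as two values."
  (let ((json (with-output-to-string (out)
                (cl-protobufs.json:print-json message :stream out))))
    (with-input-from-string (in json)
      (values (cl-protobufs.json:parse-json type :stream in)
              json))))
```

The second return value is handy for inspecting how field names are rendered under the default camel-case-p setting.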
Known Deficiencies
This is a non-exhaustive list of ways in which cl-protobufs doesn't currently meet the Protocol Buffers spec.
- Groups are not supported within oneof fields.
- The [deprecated=true] field option is not supported.