cl-protobufs

2024-10-12

Upstream URL

github.com/qitab/cl-protobufs

Author

Robert Brown <robert.brown@gmail.com>

License

MIT
README

cl-protobufs

cl-protobufs is an implementation of Google protocol buffers for Common Lisp.

Installation

  1. Install protoc

    Common Lisp code for a given .proto file is generated by a plug-in for protoc, the protocol buffer compiler. The plug-in is written in C++ and requires the full version of Google's protocol buffer code to be installed in order to build, not just the precompiled protoc binaries. We also require Google's ABSL C++ library to be installed.

    If your system's package manager (apt, for example) provides these libraries, you can install them that way. If you need to install from source, see the example in our continuous integration tests.

    Make sure the protoc binary is on your PATH.

  2. Build the Lisp protoc plugin

    We use CMake to install the Lisp protoc plugin.

    $ cd cl-protobufs/protoc
    $ cmake . -DCMAKE_CXX_STANDARD=17
    $ cmake --build . --target install --parallel 16

    Make sure the installation directory is on your PATH.

Generating the Lisp Code from Proto files

There are two ways of doing this, either using protoc or ASDF.

Using ASDF to Generate Lisp Code

If you add :defsystem-depends-on (:cl-protobufs.asdf) to your defsystem, ASDF can generate Lisp code directly from your .proto files. For each .proto file add a component of type :protobuf-source-file with a :proto-pathname. You may also need to specify :proto-search-path to help the protoc compiler find protos imported by your .proto file. The pathnames can be relative with respect to the pathname of the system you are building.

Several examples can be found in cl-protobufs.asd.
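As a rough sketch of what such a system definition might look like: the system name my-project, the file names, the protos/ directory, and the :depends-on entry are all hypothetical. Only the :defsystem-depends-on entry, the :protobuf-source-file component type, and the :proto-pathname and :proto-search-path options come from the description above, and cl-protobufs.asd remains the authoritative reference for their exact form.

```lisp
;; Hypothetical system definition; names and paths are placeholders.
(defsystem :my-project
  :defsystem-depends-on (:cl-protobufs.asdf)
  ;; Assumption: the generated code needs cl-protobufs at load time.
  :depends-on (:cl-protobufs)
  :components
  ((:protobuf-source-file "my-messages"
    ;; Relative to the pathname of this system.
    :proto-pathname "protos/my-messages.proto"
    ;; Directories searched for .proto files imported by my-messages.proto.
    :proto-search-path ("protos/"))
   (:file "main" :depends-on ("my-messages"))))
```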

Using protoc to Generate Lisp Code

To test your build, try generating Lisp code from the cl-protobufs/tests/case-preservation.proto file with the following command. Note that the command may differ slightly depending on what directory you're in and where you installed protoc-gen-cl-pb. In this case we assume you're in the directory containing the cl-protobufs directory. The reason will become clear in a moment.

$ protoc --plugin=protoc-gen-cl-pb=/usr/local/bin/protoc-gen-cl-pb \
  --cl-pb_out=output-file=case-preservation.lisp:/tmp \
  cl-protobufs/tests/case-preservation.proto

This command should generate a file named case-preservation.lisp in the /tmp/ directory.

When a .proto file imports another .proto file, protoc needs to know how to find the imported file. It does this by looking for the file relative to the values passed to it with the --proto_path option (or the -I short option).

To see an example of this, you can try generating Lisp code for cl-protobufs/tests/extend.proto. Still in the same directory, run the following command:

$ protoc --plugin=protoc-gen-cl-pb=/usr/local/bin/protoc-gen-cl-pb \
  --cl-pb_out=output-file=extend.lisp:/tmp \
  --proto_path=cl-protobufs/tests \
  cl-protobufs/tests/extend.proto

The file /tmp/extend.lisp should be generated. Note that the .lisp file for each imported file also needs to be generated separately.

ASDF

Build and run the tests with ASDF:

  • Install Quicklisp and make sure to add it to your Lisp implementation's init file.

  • Install ASDF if it isn't part of your Lisp implementation.

  • Create a link to cl-protobufs so that Quicklisp will use the local version:

    $ cd ~/quicklisp/local-projects
    $ ln -s .../path/to/cl-protobufs

  • Start Lisp and evaluate (ql:quickload :cl-protobufs).

  • Load and run the tests:

    cl-user> (asdf:test-system :cl-protobufs)

Submitting changes to cl-protobufs

  1. Create a pull request as usual through GitHub.
  2. Sign the Google CLA. This is required before your pull request can be approved, and needs to be done only once for all Google projects.
  3. Add someone in the Googlers team as a reviewer.
  4. When the reviewer is satisfied they will add the Ready for Google label.
  5. The pull request will later be merged.

Examples

The files example/math.lisp and example/math-test.lisp give a simple example of creating a proto structure, populating its fields, serializing, and then deserializing. Looking over these files is a good way to get a quick feel for the protobuf API, which is described in detail below.

The file math.proto has two messages: AddNumbersRequest and AddNumbersResponse.

The prefix cl-protobufs. is automatically added to the package name specified by package math;, resulting in cl-protobufs.math as the full package name for the generated code. This is done to avoid conflicts with existing packages.

The full name of the Lisp type for the AddNumbersRequest message is cl-protobufs.math:add-numbers-request.

Generated Code Guide

This section explains the code generated from a .proto file by protoc-gen-cl-pb, the Common Lisp plugin for protoc. See the "protoc" directory in this distribution for the plugin code.

Note that protoc-gen-cl-pb transforms protobuf names like MyMessage or my_field to names that are more Lisp-like, such as my-message and my-field.
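To make the renaming concrete, here is a small self-contained approximation of it. The function proto-name-to-lisp-name is hypothetical and is not the plugin's actual code (the plugin is written in C++); it only covers the simple CamelCase and snake_case cases mentioned above, and names containing acronyms or digits may be handled differently by the real plugin.

```lisp
(defun proto-name-to-lisp-name (name)
  "Approximate the renaming described above: CamelCase and snake_case
become lower-case, hyphen-separated names. Illustration only."
  (with-output-to-string (out)
    (loop for previous = nil then ch
          for ch across name
          do (cond ((char= ch #\_)
                    ;; Underscores become hyphens.
                    (write-char #\- out))
                   ;; A capital directly after a lower-case letter
                   ;; starts a new word.
                   ((and previous (lower-case-p previous) (upper-case-p ch))
                    (write-char #\- out)
                    (write-char (char-downcase ch) out))
                   (t
                    (write-char (char-downcase ch) out))))))
```

For example, (proto-name-to-lisp-name "MyMessage") returns "my-message" and (proto-name-to-lisp-name "my_field") returns "my-field".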

The code generated by protoc-gen-cl-pb uses macros to define the generated API. Protocol buffer messages should be defined in .proto files instead of invoking these macros directly. Internal details that are not in the API documented below may change incompatibly in the future.

Packages

The generated code for each .proto file lives in a package derived from the package statement.

package abc;

The generated Lisp package for the above is cl-protobufs.abc. The prefix "cl-protobufs." is added in order to avoid conflicts with another Lisp package named "abc". If you prefer to use a shorter package name we recommend using :local-nicknames as we do in many files in this library. Example:

(defpackage #:my.project
  (:use #:common-lisp)
  (:local-nicknames (#:abc #:cl-protobufs.abc)))  ; Referenced as abc:

You may have multiple .proto files use the same package if desired. The package exports the symbols described in the sections below.

Groups (proto2 only)

Groups are a deprecated way of defining a nested message and a field in a single declaration:

syntax = "proto2";
package abc;
message Foo {
  optional group Bar = 1 {
    optional string a = 1;
    optional int32 b = 2;
  }
}

This is treated exactly the same way as defining a nested message named Bar and a field named bar:

syntax = "proto2";
package abc;
message Foo {
  message Bar {
    optional string a = 1;
    optional int32 b = 2;
  }
  optional Bar bar = 1;
}

See the following sections for details on how to access nested messages and fields from Lisp.

Messages (proto2)

This section uses the following protocol buffer messages as an example:

syntax = "proto2";

package abc;

message DateRange {
  optional string min_date = 1;
  optional string max_date = 2;
}

Construct a date-range message:

(make-date-range :min-date "2020-05-27" :max-date "2020-05-28")

Set the value of the max-date field on an already-constructed range message:

(setf (date-range.max-date range) "2022-07-29")

Get the value of the min-date field from the range message:

(date-range.min-date range)

If the field was explicitly set, that value is returned. Otherwise, a default value is returned: the default value specified for this field in the .proto file, if any, or a type-specific default value. Type-specific default values are as follows:

protobuf type     default value
numerics          zero of the appropriate type
strings           the empty string
messages          nil
groups            nil
enums             the first value listed in the .proto file
booleans          nil
repeated fields   the empty list
symbols           nil

Note that with nested messages and long message names, field accessor names can get pretty long. If speed is not an issue, fields can also be read via the cl-protobufs:field generic function, which is slower but often more concise:

(cl-protobufs:field range 'min-date)

Check whether the min-date field has been set on range:

(date-range.has-min-date range)

(Returns t if the min-date field has been set, otherwise nil.)

Clear the value of the min-date field on range:

(date-range.clear-min-date range)

(After the above call, (date-range.has-min-date range) returns nil and (date-range.min-date range) returns the default value.)

Messages (proto3)

This section uses the following protocol buffer message as an example:

syntax = "proto3";

message Event {
  int32 day = 1;
  int32 month = 2;
  int32 year = 3;
  repeated string invitees = 4;
}

The generated code for proto3 messages is similar to proto2 messages. The only difference is the introduction of fields with no specified label, which are known as "singular" fields. For singular fields, the state of being unset and the state of being set to the default value for the type are indistinguishable. So, has-* functions, such as (event.has-day msg) are not defined.

The has-* functions for repeated fields are defined. They return true if and only if the field has been manually set and has not been cleared since.

This library supports optional fields in proto3 messages. These fields have the same semantics and generated code as proto2 optional fields.

Maps

This section uses the following protocol buffer message as an example:

message Dictionary {
  map<int32,string> map_field = 1;
}

This creates an associative map with keys of type int32 and values of type string. In general, the key type can be any scalar type except float and double. The value type can be any protobuf type.

For a message dict of type Dictionary, the following functions are created to access the map:

*-gethash returns the value associated with 2 in the map-field field in dict. If there is no value explicitly set, this function returns the default value of the value type. In this case, the empty string.

(dictionary.map-field-gethash 2 dict)

*-gethash can be used with setf to set entries as well. This associates 1 with the value "one" in the map-field field in dict:

(setf (dictionary.map-field-gethash 1 dict) "one")

*-remhash removes any entry with key 1 in the map-field field in dict:

(dictionary.map-field-remhash 1 dict)

Like the other fields, these functions are aliased by methods which are slower but more concise. Examples of the methods are: (map-field-gethash 2 dict), (setf (map-field-gethash 1 dict) "one"), and (map-field-remhash 1 dict). These have the same functionality as the above 3 functions respectively.

These functions are type checked, and interfacing with the map with these functions alone will guarantee that (de)serialization functions as well as the (dictionary.has-map-field dict) function will work properly. The underlying hash table may be accessed directly via (dictionary.map-field dict), but doing so may result in undefined behavior.

Enums

enum DayOfWeek {
  DAY_UNDEFINED = 0;
  MON = 1;
  TUE = 2;
  WED = 3;
  ...
}

The above enum defines the Lisp type day-of-week, like this:

(deftype day-of-week () '(member :day-undefined :mon :tue :wed ...))

Each enum value is represented by a keyword symbol which is mapped to/from its numeric equivalent during serialization and deserialization.

Convert a keyword symbol to its numeric value:

(defun day-of-week-to-int (name) ...)

(Example: (day-of-week-to-int :mon) => 1)

Convert a number to its symbolic name:

(defun int-to-day-of-week (num) ...)

(Example: (int-to-day-of-week 1) => :MON)

Each numeric enum value is also bound to a constant by the same name but with "+" on each side:

(defconstant +mon+ 1)

Note that most enums should have an "undefined" or "unset" field with value 0 so that message fields using this enum type have a reasonable default value that is distinguishable from valid values. (It probably wouldn't make sense for Monday to be the default day.)

Name conflicts with other enum constants can easily happen if they all have a field named "undefined", so in this case we named the "undefined" field with a DAY_ prefix. For this reason it is also common to nest an enum inside the message that uses it.

When an enum is defined inside of a message instead of at top level in the .proto file, the message name is prepended to the name. For example, if DayOfWeek had been defined inside of a Schedule message it would result in these definitions:

(deftype schedule.day-of-week () '(member :day-undefined :mon :tue :wed ...))
(schedule.day-of-week-to-int :mon) => 1
(int-to-schedule.day-of-week 1) => :MON
(defconstant +schedule.day-undefined+ 0)  ; may not need the DAY_ prefix now.
(defconstant +schedule.mon+ 1)
...

Enum Backward Compatibility

For backward compatibility, unrecognized enum values are retained during deserialization and are output again when serialized. This allows a client that acts as a pass-through for the enum data to function correctly even if it uses a different version of the proto than the systems it is communicating with.

Message Schema V1:

enum DayOfWeek {
  DAY_UNDEFINED = 0;
  MON = 1;
  TUE = 2;
  WED = 3;
}
message DayIWillWork {
  optional DayOfWeek workday = 1;
}

Message Schema V2:

enum DayOfWeek {
  DAY_UNDEFINED = 0;
  MON = 1;
  TUE = 2;
  WED = 3;
  THUR = 4;
}
message DayIWillWork {
  optional DayOfWeek workday = 1;
}

If we send a V2 message:

DayIWillWork {
  workday: THUR
}

to a V1 system it will save the fact that the enum it received is 4. Calling (day-i-will-work.workday v2-proto) will return :%undefined-4. Reserialization will add the workday enum value to the serialized protobuf message, and deserialization on a V2 system will properly add the new :thur enum value to the new protocol buffer message.

Trying to call (setf (day-i-will-work.workday v2-proto) :%undefined-4) will signal an error on a V1 or V2 system since :%undefined-4 isn't a known enum value.

Oneof

This section uses the following protobuf message as an example:

message Person {
  optional string name = 1;
  oneof AgeOneof {
    int32 age = 2;
    string birthdate = 3;
  }
}

To access fields inside a oneof, use the standard accessors outlined above. These fields have the semantics of proto2 optional fields, so has-* functions are created. For example:

(setf (person.age bob) 5)

...will set the age field of a Person object bob to 5.

Defining a oneof also creates two special functions:

*-oneof-case returns the Lisp symbol corresponding to the field that is currently set. So, if we set age to 5, it returns the symbol AGE. If no field is set, it returns nil.

(person.age-oneof-case bob)

If we set the age field on our bob object, then:

(person.has-age bob) => t
(person.has-birthdate bob) => nil

To clear all fields inside of the oneof age-oneof:

(person.clear-age-oneof bob)

Repeated Fields

We use the following protocol buffer message as an example in this section:

message RepeatedProto {
  repeated int32 my_int_list = 1;
  repeated int32 my_int_vector = 2 [(lisp_container) = VECTOR];
}

This creates a message with two fields. The field my_int_list stores a list of integers. The default value is the empty list, i.e. nil. The field my_int_vector stores a vector of integers. The default value is an empty vector which is extendable with a fill pointer.

The APIs for the list and vector repeated fields are the same. There is a minor difference when pushing onto the different types of repeated field.

push-* pushes a value onto the corresponding list or vector field.

This pushes the integer 1 onto the my_int_list field in the RepeatedProto:

(repeated-proto.push-my-int-list 1 my-message)

(Since we push onto a list, this will push into the front of the list.)

This pushes the integer 1 onto the my_int_vector field in the RepeatedProto:

(repeated-proto.push-my-int-vector 1 my-message)

(Since we push onto a vector, this will push into the back of the vector.)
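The difference in ordering mirrors how the underlying Lisp containers behave. The following is a plain Common Lisp analogy, not generated cl-protobufs code, and the function name push-order-demo is made up for illustration. It pushes the same three values onto a list and onto an adjustable vector with a fill pointer, the kind of vector described above as the default for vector fields.

```lisp
;; Plain Common Lisp analogy; none of these are cl-protobufs functions.
(defun push-order-demo ()
  (let ((as-list '())
        (as-vector (make-array 0 :adjustable t :fill-pointer 0)))
    (dolist (value '(1 2 3))
      (push value as-list)                  ; new element becomes the first
      (vector-push-extend value as-vector)) ; new element becomes the last
    (list as-list (coerce as-vector 'list))))

(push-order-demo) ; => ((3 2 1) (1 2 3))
```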

The has-* functions on a repeated field return true if there is at least one element in the sequence:

(repeated-proto.has-my-int-list my-message)
(repeated-proto.has-my-int-vector my-message)

The length-of-* function returns the number of elements in the repeated field:

(repeated-proto.length-of-my-int-list my-message)
(repeated-proto.length-of-my-int-vector my-message)

The nth-* function returns the element at position n in the repeated field:

(repeated-proto.nth-my-int-list n my-message)
(repeated-proto.nth-my-int-vector n my-message)

(If the repeated field has length less than n, we signal an error.)

The clear-* function clears the repeated field of all elements:

(repeated-proto.clear-my-int-list my-message)
(repeated-proto.clear-my-int-vector my-message)

Symbols

A string field may be annotated as a symbol field, which will cause it to be represented in Lisp as an interned symbol rather than a string. Example:

import "third_party/lisp/cl_protobufs/proto2-descriptor-extensions.proto";

message Foo {
   optional string symbol = 1 [(lisp_type) = "CL:SYMBOL"];
}

When converting from text mode, we uppercase the string. If it does not contain a colon, we intern it as a keyword symbol, except that "T" and "NIL" are special-cased to refer to the corresponding Lisp symbols. If the string begins with a colon, we also intern it as a keyword symbol, but if it contains a colon elsewhere, the portion preceding the colon is interpreted as a package name. Thus, the following lines are equivalent:

symbol: "foo"
symbol: "FOO"
symbol: "keyword:foo"

as are

symbol: "t"
symbol: "common-lisp:t"

but note that these are different:

symbol: "t"
symbol: ":t"

Multiple colons are not allowed, nor are the single-quote, double-quote, and backslash characters.
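The rules above can be sketched in a few lines of standard Common Lisp. This is only an illustration of the described behavior, not the library's implementation: the function name text-to-symbol is hypothetical, and signaling an error for an unknown package is an assumption, since the text does not say what happens in that case.

```lisp
(defun text-to-symbol (text)
  "Sketch of the text-format symbol rules described above. Not library code."
  (let* ((name (string-upcase text))
         (colon (position #\: name)))
    ;; Multiple colons, quotes, and backslashes are not allowed.
    (when (or (> (count #\: name) 1)
              (find-if (lambda (ch) (member ch '(#\' #\" #\\))) name))
      (error "Invalid symbol text: ~S" text))
    (cond ((null colon)
           ;; No colon: keyword, except for the special cases T and NIL.
           (cond ((string= name "T") t)
                 ((string= name "NIL") nil)
                 (t (values (intern name :keyword)))))
          ((zerop colon)
           ;; Leading colon: always a keyword.
           (values (intern (subseq name 1) :keyword)))
          (t
           ;; Colon elsewhere: the prefix names a package.
           (let ((pkg (find-package (subseq name 0 colon))))
             (unless pkg ; assumption: unknown packages are an error
               (error "Unknown package in symbol text: ~S" text))
             (values (intern (subseq name (1+ colon)) pkg)))))))
```

With this sketch, "foo", "FOO", and "keyword:foo" all yield :FOO, "t" and "common-lisp:t" both yield T, and ":t" yields the keyword :T.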

Options

TODO

Services

This section describes the generated code API for a protobuf service in a proto file. You must have a corresponding RPC library as well; cl-protobufs just generates the methods.

The gRPC library, or any library containing the following form:

(setq cl-protobufs:*rpc-call-function* 'start-call)

can be used as the underlying RPC mechanism. We will show examples with the expectation that you are using gRPC.

The following example service definition is used throughout this section.

package math;

message AddNumbersRequest {
  optional int32 number1 = 1;
  optional int32 number2 = 2;
}

message AddNumbersResponse {
  optional int32 sum = 1;
}

service MyService {
  rpc AddNumbers(AddNumbersRequest) returns (AddNumbersResponse) {}
}

The cl-protobufs protoc plugin generates two packages:

  • cl-protobufs.math
  • cl-protobufs.math-rpc

The package cl-protobufs.math contains the add-numbers-request and add-numbers-response protocol buffer messages.

Client

The package cl-protobufs.math-rpc contains a stub for call-add-numbers. A message can be sent to a server implementing the MyService service with:

  (grpc:with-insecure-channel
      (channel (concatenate 'string hostname ":" (write-to-string port-number)))
    (let* ((request (cl-protobufs.math:make-add-numbers-request
                     :number-1 1 :number-2 2))
           (response (cl-protobufs.math-rpc:call-add-numbers channel request)))
      ...))

Server

There is currently no known supported open framework for implementing the server portion of Protocol Buffer services in Lisp.

(defgeneric add-numbers-impl (channel request rpc))

A generic function like the one above is generated for each RPC in the service definition. The name is the concatenation of the protobuf method name (in its Lisp form) and the string "-impl". Here the request argument is an add-numbers-request message.

To implement the service define a method for each generic function. The method must return the type declared in the .proto file. Example:

(defmethod add-numbers-impl (channel (request add-numbers-request) rpc)
  (make-add-numbers-response :sum (+ (add-numbers-request.number1 request)
                                     (add-numbers-request.number2 request))))

The channel argument is supplied by the underlying RPC code and differs depending on which transport mechanism (HTTP, TCP, IPC, etc) is being used. The channel and rpc arguments can usually be ignored.

The cl-protobufs Package

This section documents the symbols exported from the cl-protobufs package.

message is the base type from which every generated protobuf message inherits:

(defstruct message ...)

print-text-format prints a protocol buffer message to a stream. object is the protocol buffer message, group, or extension to print. stream is the stream to print to. pretty-print-p may be set to nil to minimize textual output by omitting most whitespace.

(defun print-text-format (object &key
                                 (indent -2)
                                 (stream *standard-output*)
                                 (pretty-print-p t)))

parse-text-format parses a protocol buffer message written in text-format. type is the type of message to parse. stream is the stream to read from.

(defun parse-text-format (type &key (stream *standard-input*)))

is-initialized checks if object has all required fields set, and recursively all of its sub-objects have all of their required fields set. An error may be signaled if an attempt is made to serialize a protobuf object that is not initialized. Signals an error if object is not a protobuf message.

(defun is-initialized (object))

proto-equal checks if two protobuf messages are equal. By default, two messages are equal if calling the getter on each field would retrieve the same value. This means that a message with a field explicitly set to the default value is considered equal to a message with that field not set. If exact is true, consider the messages to be equal only if the same fields have been explicitly set. message-1 and message-2 must both be protobuf messages.

(defun proto-equal (message-1 message-2 &key (exact nil)))

clear resets the protobuf message to its initial state:

(defgeneric clear (object))

has-field returns whether field has been explicitly set in object. field is the symbol naming the field in the proto message.

(defun has-field (object field))

Serialization

byte-vector: a vector of unsigned-bytes. In serialization functions, this is often referred to as 'buffer'.

(deftype byte-vector)

make-byte-vector: constructor to make a byte vector. size is the size of the underlying vector. adjustable is a boolean value determining whether the byte-vector can change size.

(defun make-byte-vector (size &key adjustable))

serialize-to-bytes creates a byte-vector and serializes a protobuf message to that byte-vector. The object is the protobuf message instance to serialize. Optionally use type to specify the type of object to serialize.

(defun serialize-to-bytes (object &optional (type (type-of object))))

serialize-to-stream: serialize object, a protobuf message, to stream. Optionally use type to specify the type of object to serialize.

(defun serialize-to-stream (object stream &optional (type (type-of object))))

deserialize-from-bytes: deserialize a protobuf message returning the newly created structure.

  • type is the symbol naming the protobuf message to deserialize.
  • buffer is the byte-vector containing the data to deserialize.
  • start (inclusive) and end (exclusive) delimit the range of bytes to deserialize.

(defun deserialize-from-bytes (type buffer &optional (start 0) (end (length buffer))))

deserialize-from-stream: deserialize an object of type type by reading bytes from stream. type is the symbol naming the protobuf message to deserialize.

(defun deserialize-from-stream (type stream))
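A minimal round trip through these functions might look like the sketch below. It assumes that the code generated for the DateRange message from the proto2 section has been loaded, so that the package cl-protobufs.abc exists. The function name round-trip-date-range is hypothetical.

```lisp
;; Assumes the generated code for the DateRange message (package
;; cl-protobufs.abc) has already been loaded.
(defun round-trip-date-range ()
  (let* ((range (cl-protobufs.abc:make-date-range :min-date "2020-05-27"
                                                  :max-date "2020-05-28"))
         ;; Serialize into a newly created byte-vector.
         (bytes (cl-protobufs:serialize-to-bytes range))
         ;; TYPE is the symbol naming the message; START and END default
         ;; to the whole buffer.
         (copy (cl-protobufs:deserialize-from-bytes
                'cl-protobufs.abc:date-range bytes)))
    ;; Return the copy and whether it compares equal to the original.
    (values copy (cl-protobufs:proto-equal range copy))))
```

The second return value should be true if the message survives the round trip unchanged.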

Well Known Types

Several functions are exported from the cl-protobufs.well-known-types package. A list of all well known types can be found in the official Protocol Buffers documentation.

unpack-any: takes an Any protobuf message any-message and turns it into the stored protobuf message, as long as the qualified-name given in the type-url corresponds to a loaded message type. The type-url must be of the form base-url/qualified-name.

(defun unpack-any (any-message))

pack-any: creates an Any protobuf message given a protobuf message and a base-url.

(defun pack-any (message &key (base-url "type.googleapis.com")))

TODO: examples

JSON Mapping

The cl-protobufs.json package exports functions to convert between protobuf objects and the canonical JSON encoding.

print-json: takes any protobuf message message and prints it as JSON. The parameters are:

  • pretty-print-p: Indent the output by indent spaces and print newlines.
  • stream: The Lisp stream to output to.
  • camel-case-p: Print field names in camelCase. If nil, then print field names as they appear in the .proto file.
  • numeric-enums-p: If true, print enum values by their number rather than their name.

(defun print-json (message &key (pretty-print-p t) (stream *standard-output*)
                             (camel-case-p t) numeric-enums-p))

parse-json: parses a JSON encoding and returns the parsed protobuf object. The parameters are:

  • type: Either the Lisp type or the message-descriptor of the object to parse.
  • stream: The stream to read from. By default, this is *standard-input*.
  • ignore-unknown-fields-p: If true, silently ignore any unrecognized fields encountered when parsing. If nil, the parser will signal an error.

(defun parse-json (type &key stream ignore-unknown-fields-p))
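The two functions can be combined using the standard string-stream macros. The helper below is a hypothetical sketch, not part of the library: json-round-trip is a made-up name, and it assumes cl-protobufs and the generated code for the message type in question are loaded.

```lisp
;; Hypothetical helper, assuming cl-protobufs and the relevant generated
;; code are loaded.
(defun json-round-trip (message type)
  "Print MESSAGE as JSON into a string, then parse that string back as TYPE."
  (let ((json (with-output-to-string (out)
                (cl-protobufs.json:print-json message :stream out))))
    (with-input-from-string (in json)
      (cl-protobufs.json:parse-json type :stream in))))
```

For instance, with the DateRange message from the proto2 section loaded, (json-round-trip (cl-protobufs.abc:make-date-range :min-date "2020-05-27") 'cl-protobufs.abc:date-range) would be expected to return a message that is proto-equal to the original.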

Known Deficiencies

This is a non-exhaustive list of ways in which cl-protobufs doesn't currently meet the Protocol Buffers spec.

  • Groups are not supported within oneof fields.
  • The [deprecated=true] field option is not supported.
