misc-extensions
2024-10-12
The GMap iteration macro, plus a few other useful macros.
Upstream URL
Author
License
Misc-Extensions
The Misc-Extensions system provides several macros that I like to use.
1. Macro nlet
Macro nlet
is an upward-compatible replacement for cl:let
that generalizes
let
, let*
, and multiple-value-bind
. It can be imported as let
,
shadowing cl:let
— this is how I have used it for years — but if that seems
too radical, the same macro can be imported as nlet
. For clarity, I will
refer to it as nlet
here.
There are two key ideas:
nlet
allows more than one variable in a binding clause, to bind to additional values returned by the init-form, as withmultiple-value-bind
.nlet
allows the binding clauses to be nested to indicate sequential binding: more deeply nested clauses are within the scopes of bindings made by less deeply nested clauses. Within a level, bindings are parallel, as incl:let
.
A simple example first:
(nlet ((a (foo)) ((b c (bar a)))) ...)
Here, first a
is bound to the (first) value of (foo)
, and then b
and c
are bound to the (first and second) values of (bar a)
, where the latter a
refers to the binding created by the first clause. (As with
multiple-value-bind
, if foo
returns more than one value, or bar
returns
more than two, the extra values are silently ignored; if fewer values are
returned than expected, nil
is supplied for the missing ones.)
A more complex example:
(nlet ((a b c (zot)) ((d (quux a c)) ((e f (mumble b d)) (g (mung a)))) ((h (frobozz c)) ((i (xyzzy h)))) (*print-level* 3)) ...)
First a
, b
, and c
are bound to the first three values of (zot)
, and in
parallel, *print-level*
is bound to 3; then d
and h
are bound; then e
,
f
, g
, and i
are bound.
As this example illustrates, all bindings at a given nesting level are done in
parallel, with all bindings at a deeper level following. Stylistically, it is
expected that init-forms in nested clauses will refer only to variables bound in
containing clauses. However, this is not enforced; for instance, the init-form
for g
could refer to h
, since the latter is bound one level out.
The macro correctly handles declare
forms at the beginning of the body,
emitting the declarations at the correct level within the expansion so that they
apply to the binding named.
I first wrote this macro in 1980, as I was developing a personal Lisp style that
made heavier use of functional programming than was, I think, common at the
time. I quickly found that multiple values were a great convenience for this
style, but the name multiple-value-bind
was annoyingly long. I think many
others have come to the same conclusion, as various other people have written
binding macros that handle multiple values less verbosely. (Also, the let
construct in the Dylan language can bind multiple variables, as can tuple
assignment in Python.)
The value of using nesting to indicate sequencing may be less clear; I'm sure
some readers are thinking, "just use let*
and be done with it". That
certainly is a viable choice. But I find it makes these expressions a little
more readable that I can tell exactly which init-forms are intended to reference
outer bindings, and which aren't; if they were in one big let*
, I would have
to read all the init-forms to tell that.
2. GMap
GMap is an iteration macro for Common Lisp — now one of many, but actually among
the oldest; I first wrote it in 1980, in Lisp Machine Lisp. It was conceived as
a generalization of mapcar
(hence the name). It is intended for cases when
mapcar
doesn't suffice because the things you want to map over aren't in
lists, or you need to collect the results of the mapping into something other
than a list.
That is, gmap
is probably the right thing to use when you are using iteration
to perform the same computation on each element of some collection, as opposed
to changing your state in some complicated way on every iteration of a loop.
It's conceptually reasonable to imagine all the iterations of a gmap
as
happening in parallel, just as you might with mapcar
. It also supports
arbitrary reductions of the results of the mapped function; more on this below.
GMap is explicitly intended only for iterations that fit this "map-reduce"
model. It is not trying to be a "universal" teration construct. People have
asked me what guarantees it offers concerning the ordering of the various
operations it performs, and the answer is none, other than those obviously
imposed by the data flow (a result can't be used until it is computed). I think
that the question was rooted in experience of people using loop
or other
iteration constructs and supplying variable update expressions with side
effects, so that there was "crosswise" data flow between the iterations. I
strongly advise that such side effects be avoided in gmap
calls. If you find
yourself wanting to use them, either there is a better way (more on this below),
or else gmap
simply isn't the right tool for the job. In short, you should
think of gmap
very much as a functional iteration construct (although the
implementation does use setq
internally).
In general, my philosophy about iteration in Lisp is that there are many ways to
do it, for the very good reason that there are many kinds of iterations, and one
should use the tool appropriate for the particular case one is confronted with.
So, for example, I almost never write a gmap
form with side effects. If I'm
just iterating over a list for effect, I'll use dolist
. For iterations with
cross-iteration data flow ("loop-carried dependences" is the compiler
developer's term) or where the iteration space is not well defined in advance
(e.g., iterating over lines being read from a file) I might use good old do
,
or I might even write the code tail-recursively, counting on the fact that most
CL implementations these days do tail-call optimization at least between
functions defined in a single labels
form.
All that said, occasions to use gmap
are not at all uncommon, especially if
you like to write your Lisp in a heavily functional style.
The difference between mapcar
and gmap
is that with gmap
, you explicitly
indicate what kinds of collections the elements come from and how to combine the
successive values of the mapped function into a result. For example, the
following two expressions are equivalent:
(mapcar #'foo this-list that-list)
and
(gmap (:result list) #'foo (:arg list this-list) (:arg list that-list))
[Side note to existing GMap users: this is the new, GMap 4.0 syntax. The older syntax is considered mildly deprecated, but will continue to be supported indefinitely. More on this below.]
The (:result list)
subform indicates that gmap
is to build a list; the
:arg
subforms tell it that this-list
and that-list
are in fact lists of
elements over which foo
is to be mapped. Other types are known besides
list
; for example, string
, if used as an argument type, causes its argument
to be viewed as a string; the values it supplies to the function being mapped
are the successive characters of the string.
Like mapcar
, gmap
accepts any number of argument specs; each one supplies
one argument (or more, in some cases) to each call to the function being mapped.
Also like mapcar
, gmap
terminates its iteration when any of the arguments
runs out of elements.
For a small collection of examples, look at test-new-syntax' in
tests.lisp'.
2.1. Argument types
The set of argument types is extensible. Thus you can adapt gmap
to other
kinds of data structures over which you would like to iterate. For details of
how to do this, see def-arg-type
in the source file, and study the existing
definitions.
An argument type can explicitly indicate that it supplies more than one argument to the function being mapped. That function must have one or more additional parameters at the corresponding point in its parameter list. For example:
(gmap (:result list) #'(lambda (x y z) (cons x (+ y z))) (:arg alist '((a . 47) (b . 72))) (:arg list '(2 6))) ==> ((A . 49) (B . 78))
Here x
and y
receive the pairs of the alist, and z
receives the elements
of the list.
2.1.1. Predefined argument types
-
constant
value: Yieldsvalue
on every iteration. -
list
list: Yields successive elements oflist
. -
improper-list
list: Yields the successive elements oflist
, which may be improper; any non-cons tail terminates the iteration. -
alist
alist: Yields, as two values, the successive pairs ofalist
. -
plist
plist: Yields, as two values, the successive pairs of elements of `plist'; that is, there is one iteration for each two elements. -
tails
list: Yields the successive tails (cdrs) oflist
, starting withlist
itself, which may be improper. -
index
start &optional stop &key incr fixnums?: Yields integers in the interval [start
,stop
) ifincr
(which defaults to 1) is positive; or in the interval [stop
,start
) ifincr
is negative. Specifically, in the upward case, the values begin withstart
and increase byincr
until >=stop
; in the downward case, the values begin withstart
-incr
and decrease byincr
until <stop
. All values are assumed to be fixnums unlessfixnums?
is a literalnil
.stop
can be omitted or a literalnil
to indicate an unbounded sequence.start
can be omitted to start at 0. -
index-inc
start stop &key incr fixnums?: ("Index, INClusive") Yields integers in the interval [start
,stop
]. Specifically, in the upward case (incr
> 0), the values begin withstart
and increase byincr
until >stop
; in the downward case, the values begin withstart
and decrease byincr
until <stop
. All values are assumed to be fixnums unlessfixnums?
is a literalnil
.stop
can be a literalnil
to indicate an unbounded sequence. -
exp
initial-value base: Yields an exponential sequence whose first value is initial-value, and whose value is multiplied by base on each iteration. -
vector
vec &key start stop incr: Yields elements of vectorvec
.start
andstop
may be supplied to select a subsequence ofvec
;incr
may be supplied (it must be positive) to select every second element etc. For performance, you may prefersimple-vector
. -
simple-vector
vec &key start stop incr: Yields elements of vectorvec
, which is assumed to be simple, and whose size is assumed to be a fixnum.start
andstop
may be supplied to select a subsequence ofvec
;incr
may be supplied (it must be positive) to select every second element etc. -
string
str &key start stop incr: Yields elements of stringstr
.start
andstop
may be supplied to select a subsequence ofvec
;incr
may be supplied (it must be positive) to select every second element etc. For performance, you may prefersimple-string
. -
simple-string
str &key start stop incr: Yields elements of stringstr
, which is assumed to be simple, and whose size is assumed to be a fixnum.start
andstop
may be supplied to select a subsequence ofstr
;incr
may be supplied (it must be positive) to select every second element etc.
2.2. Result types
GMap, unlike mapcar
, has the ability to perform arbitrary reductions on the
results returned by the function being mapped. So, cases where you might have
written
(reduce #'+ (mapcar ...))
can be replaced with a single gmap
call, which is also more efficient in that
it doesn't materialize the intermediate result list:
(gmap (:result sum) ...)
GMap takes the view that consing up a collection of function results is a kind of reduction — a slightly unusual view, perhaps, but not unreasonable. So it treats collecting the results and summing them, for example, as instances of the same pattern.
As with the argument types, the set of result types is extensible. For details
of how to do this, see def-result-type
in the source file, and study the
existing definitions.
A result type can explicitly indicate that it expects the function being mapped
to return multiple values, which it can turn into multiple arguments to a
reduction function. Also, there is the special result type values
, which
takes two or more result specs, and expects the function being mapped to return
the same number of values; it reduces each value according to the corresponding
result spec, then finally returns all the reduction results as multiple values.
For example:
(gmap (:result values list sum) #'(lambda (x y) (values x y)) (:arg alist '((a . 7) (b . 19)))) ==> (A B) ; first value 26 ; second value
Additionally, there is a :consume
feature that allows a single reduction to
consume multiple values from the function being mapped; see the source for
details. — Earlier, I promised a "better way" (search for that phrase) to
handle cases where you need cross-iteration data flow. The use of multiple
values and :consume
can solve many of these problems without dirtying your
code with side effects.
2.2.1. Predefined result types
-
list
&key filterp: Returns a list of the values, optionally filtered byfilterp
. -
alist
&key filterp: Consumes two values from the mapped function; returns an alist of the pairs. Note thatfilterp
, if supplied, must take two arguments. -
plist
&key filterp: Consumes two values from the mapped function; returns a plist of the pairs. Note thatfilterp
, if supplied, must take two arguments. -
append
&key filterp: Returns the result ofappend
ing the values, optionally filtered byfilterp
. -
nconc
&key filterp: Returns the result ofnconc
ing the values, optionally filtered byfilterp
. -
and
: If one of the values is false, terminates the iteration and returns false; otherwise returns the last value. Does not work as an operand ofvalues
. (Generalizescl:every
.) -
or
: If one of the values is true, terminates the iteration and returns it; otherwise, returns false. Does not work as an operand of:values
. (Generalizescl:some
.) -
sum
&key filterp: Returns the sum of the values, optionally filtered byfilterp
. -
count-if
: Returns the number of true values. -
max
&key filterp: Returns the maximum of the values, optionally filtered byfilterp
; ornil
if there are none. -
min
&key filterp: Returns the minimum of the values, optionally filtered byfilterp
; ornil
if there are none. -
vector
&key use-vector length fill-pointer adjustable filterp: Constructs a vector containing the results. Ifuse-vector
is supplied, the argument will be filled with the results and returned; iffill-pointer
is true andadjustable
is true, it must have a fill pointer and be adjustable, and values will be appended to it withvector-push-extend
; iffill-pointer
is true andadjustable
is false, it must have a fill pointer, and values will be appended to it withvector-push
; otherwise, the vector is assumed to be simple and must be large enough to hold the results. (Recall thatvector-push
has no effect if the vector is full.)If
use-vector
is not supplied, a vector will be constructed and returned; iflength
is supplied, returns a simple vector of the specified length (which must be sufficient to hold the results); otherwise, returns a simple vector of the correct length (but to do this, it must cons a temporary list).In any case, if
filterp
is supplied, it is a predicate of one argument, the value of the function being mapped, that says whether to include it in the result. -
string
&key use-string length fill-pointer adjustable filterp: Constructs a string containing the results. Ifuse-string
is supplied, the argument will be filled with the results and returned; iffill-pointer
is true andadjustable
is true, it must have a fill pointer and be adjustable, and values will be appended to it withvector-push-extend
; iffill-pointer
is true andadjustable
is false, it must have a fill pointer, and values will be appended to it withvector-push
; otherwise, the vector is assumed to be simple and must be large enough to hold the results. (Recall thatvector-push
has no effect if the vector is full.)If
use-string
is not supplied, a string will be constructed and returned; iflength
is supplied, returns a simple string of the specified length (which must be sufficient to hold the results); otherwise, returns a simple string of the correct length (but to do this, it must cons a temporary list).In any case, if
filterp
is supplied, it is a predicate of one argument, the value of the function being mapped, that says whether to include it in the result.
2.3. The Old Syntax
For most of GMap's existence, it has had a slightly different syntax from that
shown above. The :arg
and :result
keywords were not used; instead, the
argument and result types were defined as keywords themselves. For instance,
the first example above would look like this:
(gmap (:list) #'foo (:list this-list) (:list that-list))
The reason I made them keywords was so that uses of them, as in this example, wouldn't look like function calls; at least, the presence of the colons would presumably give the reader of the code, who might not be familiar with GMap, a clue that something out of the ordinary was going on. The problem with this, of course, is that it abandoned the modularity which is the entire point of the package system: there can be only one definition of a given keyword as an argument type or as a result type.
The 4.0 release fixes this by introducing :arg
and :result
to visually mark
these syntactic elements, and by changing all the predefined types to use
non-keyword symbols exported either from cl:
or from gmap:
. However, it's
important not to break existing code; so here's what I have done.
def-gmap-arg-type
and def-gmap-res-type
, if given a name that is not a
keyword symbol, now also define the keyword symbol with the same name; but
before they do that, they check that it is not already defined by a previous
call supplying a different non-keyword name, and if so, they signal an error.
With this change, the old syntax will continue to work, but collisions, where two systems try to define argument or result types with the same symbol-name, will be detected; previously, the one loaded last would "win".
3. Macro fn
For very small lambda expressions, the six characters taken by the word lambda
can be a significant fraction of the total. If the body doesn't reference all
the parameters, moreover, you'll want to add a (declare (ignore ...))
for the
unused ones, which adds to the verbosity. The fn
macro helps with both
annoyances. Obviously, its name is quite short. (Those willing to set foot on
the slippery slope of Unicode could use λ
, of course, but I've been afraid to
go there yet, lest my code wind up as an unintelligible mass of obscure
mathematical symbols and cat emojis.) And, you can use either a bare _
or a
name beginning with _
as a parameter name, to indicate an ignored parameter;
fn
automatically inserts the required ignore
declaration.
One catch, though, is that if you inadvertently write #'(fn ...)
, you will get
a probably-unintelligible compiler error. Just delete the #'
. (Lisp Machine
Lisp had something called a "lambda macro" that solved this problem, but the
feature didn't make it into Common Lisp.)
4. Lexical contexts
I still consider these experimental and probably not terribly useful, though I do
use one occasionally. If curiosity gets the better of you, have a look at
contexts.text
. Briefly, it's an alternate syntax for combinations of let
and labels
.
5. Global lexical variables
Macro deflex
var &optional val doc:
Declares var
as a global lexical variable, and if val
is supplied and
var
is not already bound, initializes it to val
. doc
, if supplied,
is taken as a documentation string. In some implementations (e.g. Scieneer),
locally rebinding the same name is not permitted; in most, it is permitted
but creates a new lexical variable, with no effect on the global one.
Macro deflex-reinit
is the same except that it reinitializes the variable if
it's already bound, like cl:defparameter
.
6. Interactive setq
Macro isetq
&rest var-val-pairs:
Some implementations, notably SBCL, issue a warning if you use setq
to set a
new global variable in the REPL. (I guess they want you to use defvar
first.)
isetq
prevents this warning, using the same trick deflex
uses to declare a
global lexical. isetq
is not recommended for use in programs.
7. "Reversed" function binding forms
It's not uncommon to use labels
to define a set of several mutually-recursive
functions, whose code can fill a screen or two, then have the body of the
labels
kick off the recursion with a one-liner that just calls one of the
functions. This strikes me as a little hard to read — you have to scroll all
the way to the bottom to find the initial values of the recursion variables —
and also a little wasteful of indentation space. So I introduced a macro
rlabels
, which puts the initial call first as a single subform, then takes the
function-binding clauses as its &body
parameter, saving 7 columns of
indentation.
There are also rflet
and rmacrolet
, but I don't think I've ever used them.