clj-re
2022-11-07
Implements Clojure-styled regexp operations such as `re-matches` and `re-find`.
clj-re
- clojure style regular expression functions
This package wraps cl-ppcre's regexp handling, which is nearly identical to java.util.regex.Pattern which in turn is used by Clojure, in a series of regexp supporting functions that attempt to behave like their Clojure namesakes.
It provides the following functions:
#:re-find
#:re-groups
#:re-matcher
#:re-matches
#:re-pattern
#:re-quote-replacement ;clojure.string/re-quote-replacement
#:re-replace ;clojure.string/replace, distinct from clojure.core/replace
#:re-replace-first ;clojure.string/replace-first
#:re-seq
#:re-split ;clojure.string/split
Successfully tested on sbcl
and clisp
.
Differences from Clojure
clojure.string
namespace functions
Clojure has several functions which are normally in the clojure.string
namespace
that do not reside in separate Common Lisp packages (they're all in the :clj-re
).
Following are the clojure.string functions and what we have called them here:
clojure.string/replace
=>re-replace
('re' prefix added)clojure.string/replace-first
=>re-replace-first
('re' prefix added)clojure.string/re-quote-replacement
=>re-quote-replacement
(name unchanged)clojure.string/split
=>re-split
('re' prefix added)
We could have left replace-first
alone, but with every other exported symbol
in this package prefixed with 're', it seemed like the consistent thing to do.
Optional support for clojure pattern literal syntax, i.e. #"pattern"
Using :named-readtables
http://melisgl.github.io/named-readtables,
the :clj-re
pacakge exports readtable
and readtable-mixin
readtables
that will return compiled cl-ppcre patterns when the pattern literal syntax is
used.
In order to make the literal use optional, where clojure functions would require a Pattern
object, you are free to use a string expressing a pattern, which will in turn be fed to
re-pattern
to obtain a pattern. This is hopefully a superset-compatible feature
compare to clojure, but if you wanted clojure's exceptions on invalid arguments or types
you won't get them here.
You can enable pattern literal syntax either by:
(named-readtables:in-readtable clj-re:readtable)
which augments the standard readtable with a dispatch function for #""
literals, or by using named readtable composition capabilities (e.g. merge or
fuse) with clj-re:readtable-mixin
, which contains only the dispatch function
for pattern literals, without any other reader macros.
The literal syntax makes it easier to ensure you have compiled cl-ppcre scanners
when the code is compiled and/or loaded, roughly the equivalent of
#.(re-pattern "pattern string")
.
The literal syntax also means you don't need to double-escape regular
expression constructs such as \d
, which must be expressed as \\d
on a conventional
Common Lisp string.
TIP: If you're not using pattern literals,
remember princ
is your friend for debugging escape-related pitfalls.
For example, which regexp matches "\\\\"
with \{n}
notation?
`"\\{2}"`
or
`"\\\\{2}"`
Princ makes it clearer. The answer is the second, but once you get a big old string full of
\\\\
sequences readability goes into the toilet. Clojure's regular expression literal
goes a long way to making regular expressions more readable.
Named capturing groups (a.k.a. registers) are not supported.
Clojure/java has them, cl-ppcre has them, this was purely laziness on my part since I never use them.
Usage
(ql:quickload :clj-re)
(use-package :clj-re)
(re-find "a*b" "aaab") => "aaab"
or to test
(ql:quickload :clj-re-test)
(clj-re-test:run-tests)
See unit tests for more examples.
Adjusting for the previously mentioned caveats:
- Renaming of functions from the clojure.string namespace.
- Replacing regexp literal syntax (#"") with doubly-escaped string regexps.
- Replacement of vector results with list results.
- Missing support for named capture groups.
You will hopefully find this sufficient for casual Clojureish regexp needs. If you're going to do performance/memory critical stuff, I suggest you learn to use cl-ppcre directly because issues like string sharing and pattern compilation may be important for your app.