cl-ppcre

API Reference

cl-ppcre

Perl-compatible regular expression library

CL-PPCRE

  • Variable *REGEX-CHAR-CODE-LIMIT*
    char-code-limit
    The upper exclusive bound on the char-codes of characters which can occur in character classes. Change this value BEFORE creating scanners if you don't need the (full) Unicode support of implementations like AllegroCL, CLISP, LispWorks, or SBCL.
  • Variable *USE-BMH-MATCHERS*
    nil
    Whether the scanners created by CREATE-SCANNER should use the (fast but large) Boyer-Moore-Horspool matchers.
  • Variable *OPTIMIZE-CHAR-CLASSES*
    nil
    Whether character classes should be compiled into look-ups into O(1) data structures. This is usually fast but will be costly in terms of scanner creation time and might be costly in terms of size if *REGEX-CHAR-CODE-LIMIT* is high. This value will be used as the :KIND keyword argument to CREATE-OPTIMIZED-TEST-FUNCTION - see there for the possible non-NIL values.
  • Variable *PROPERTY-RESOLVER*
    nil
    Should be NIL or a designator for a function which accepts strings and returns unary character test functions or NIL. This 'resolver' is intended to handle 'character properties' like \p{IsAlpha}. If *PROPERTY-RESOLVER* is NIL, then the parser will simply treat \p and \P as #\p and #\P as in older versions of CL-PPCRE.
  • Variable *ALLOW-QUOTING*
    nil
    Whether the parser should support Perl's \Q and \E.
  • Variable *ALLOW-NAMED-REGISTERS*
    nil
    Whether the parser should support AllegroCL's named-register syntax (?<name>regex) and the back-reference syntax \k<name>.
  • Condition PPCRE-ERROR  (SIMPLE-ERROR)
    All errors signaled by CL-PPCRE are of this type.
  • Condition PPCRE-SYNTAX-ERROR  (PPCRE-ERROR)
    Signaled if CL-PPCRE's parser encounters an error when trying to parse a regex string or to convert a parse tree into its internal representation.
  • Condition PPCRE-INVOCATION-ERROR  (PPCRE-ERROR)
    Signaled when CL-PPCRE functions are invoked with wrong arguments.
  • Function CREATE-OPTIMIZED-TEST-FUNCTION (test-function &key (start 0) (end *regex-char-code-limit*) (kind *optimize-char-classes*))
    Given a unary test function which is applicable to characters, returns a function which yields the same boolean results for all characters with character codes from START to (excluding) END. If KIND is NIL, TEST-FUNCTION will simply be returned. Otherwise, KIND should be one of:
      • :HASH-TABLE - builds a hash table representing all characters which satisfy the test and returns a closure which checks if a character is in that hash table
      • :CHARSET - instead of a hash table uses a "charset" which is a data structure using non-linear hashing and optimized to represent (sparse) sets of characters in a fast and space-efficient way (contributed by Nikodemus Siivola)
      • :CHARMAP - instead of a hash table uses a bit vector to represent the set of characters
    You can also use :HASH-TABLE* or :CHARSET* which are like :HASH-TABLE and :CHARSET but use the complement of the set if the set contains more than half of all characters between START and END. This saves space but needs an additional pass across all characters to create the data structure. There is no corresponding :CHARMAP* kind as the bit vectors are already created to cover the smallest possible interval which contains either the set or its complement.
  • Function PARSE-STRING (string)
    Translate the regex string STRING into a parse tree.
  • Generic-Function CREATE-SCANNER (regex &key case-insensitive-mode multi-line-mode single-line-mode extended-mode destructive)
    Accepts a regular expression, either as a parse tree or as a string, and returns two values: a scan closure which will scan strings for this regular expression, and a list mapping registers to their names (NIL stands for unnamed ones). The "mode" keyword arguments are equivalent to the imsx modifiers in Perl. If DESTRUCTIVE is not NIL, the function is allowed to destructively modify its first argument (but only if it's a parse tree).
  • Method CREATE-SCANNER ((regex-string string) &key case-insensitive-mode multi-line-mode single-line-mode extended-mode destructive)
  • Method CREATE-SCANNER ((parse-tree t) &key case-insensitive-mode multi-line-mode single-line-mode extended-mode destructive)
  • Generic-Function SCAN (regex target-string &key start end real-start-pos)
    Searches TARGET-STRING from START to END and tries to match REGEX. On success returns four values - the start of the match, the end of the match, and two arrays denoting the beginnings and ends of register matches. On failure returns NIL. REGEX can be a string which will be parsed according to Perl syntax, a parse tree, or a pre-compiled scanner created by CREATE-SCANNER. TARGET-STRING will be coerced to a simple string if it isn't one already. The REAL-START-POS parameter should be ignored - it exists only for internal purposes.
  • Method SCAN ((regex-string string) target-string &key (start 0) (end (length target-string)) ((real-start-pos *real-start-pos*) nil))
  • Method SCAN ((scanner function) target-string &key (start 0) (end (length target-string)) ((real-start-pos *real-start-pos*) nil))
  • Method SCAN ((parse-tree t) target-string &key (start 0) (end (length target-string)) ((real-start-pos *real-start-pos*) nil))
  • Function SCAN-TO-STRINGS (regex target-string &key (start 0) (end (length target-string)) sharedp)
    Like SCAN but returns substrings of TARGET-STRING instead of positions, i.e. this function returns two values on success: the whole match as a string plus an array of substrings (or NILs) corresponding to the matched registers. If SHAREDP is true, the substrings may share structure with TARGET-STRING.
  • Macro REGISTER-GROUPS-BIND (var-list (regex target-string &key start end sharedp) &body body)
    Executes BODY with the variables in VAR-LIST bound to the corresponding register groups after TARGET-STRING has been matched against REGEX, i.e. each variable is either bound to a string or to NIL. If there is no match, BODY is _not_ executed. For each element of VAR-LIST which is NIL there's no binding to the corresponding register group. The number of variables in VAR-LIST must not be greater than the number of register groups. If SHAREDP is true, the substrings may share structure with TARGET-STRING.
  • Macro DO-SCANS ((match-start match-end reg-starts reg-ends regex target-string &optional result-form &key start end) &body body &environment env)
    Iterates over TARGET-STRING and tries to match REGEX as often as possible evaluating BODY with MATCH-START, MATCH-END, REG-STARTS, and REG-ENDS bound to the four return values of each match in turn. After the last match, returns RESULT-FORM if provided or NIL otherwise. An implicit block named NIL surrounds DO-SCANS; RETURN may be used to terminate the loop immediately. If REGEX matches an empty string the scan is continued one position behind this match. BODY may start with declarations.
  • Macro DO-MATCHES ((match-start match-end regex target-string &optional result-form &key start end) &body body)
    Iterates over TARGET-STRING and tries to match REGEX as often as possible evaluating BODY with MATCH-START and MATCH-END bound to the start/end positions of each match in turn. After the last match, returns RESULT-FORM if provided or NIL otherwise. An implicit block named NIL surrounds DO-MATCHES; RETURN may be used to terminate the loop immediately. If REGEX matches an empty string the scan is continued one position behind this match. BODY may start with declarations.
  • Macro DO-MATCHES-AS-STRINGS ((match-var regex target-string &optional result-form &key start end sharedp) &body body)
    Iterates over TARGET-STRING and tries to match REGEX as often as possible evaluating BODY with MATCH-VAR bound to the substring of TARGET-STRING corresponding to each match in turn. After the last match, returns RESULT-FORM if provided or NIL otherwise. An implicit block named NIL surrounds DO-MATCHES-AS-STRINGS; RETURN may be used to terminate the loop immediately. If REGEX matches an empty string the scan is continued one position behind this match. If SHAREDP is true, the substrings may share structure with TARGET-STRING. BODY may start with declarations.
  • Macro DO-REGISTER-GROUPS (var-list (regex target-string &optional result-form &key start end sharedp) &body body)
    Iterates over TARGET-STRING and tries to match REGEX as often as possible evaluating BODY with the variables in VAR-LIST bound to the corresponding register groups for each match in turn, i.e. each variable is either bound to a string or to NIL. For each element of VAR-LIST which is NIL there's no binding to the corresponding register group. The number of variables in VAR-LIST must not be greater than the number of register groups. After the last match, returns RESULT-FORM if provided or NIL otherwise. An implicit block named NIL surrounds DO-REGISTER-GROUPS; RETURN may be used to terminate the loop immediately. If REGEX matches an empty string the scan is continued one position behind this match. If SHAREDP is true, the substrings may share structure with TARGET-STRING. BODY may start with declarations.
  • Function ALL-MATCHES (regex target-string &key (start 0) (end (length target-string)))
    Returns a list containing the start and end positions of all matches of REGEX against TARGET-STRING, i.e. if there are N matches the list contains (* 2 N) elements. If REGEX matches an empty string the scan is continued one position behind this match.
  • Function ALL-MATCHES-AS-STRINGS (regex target-string &key (start 0) (end (length target-string)) sharedp)
    Returns a list containing all substrings of TARGET-STRING which match REGEX. If REGEX matches an empty string the scan is continued one position behind this match. If SHAREDP is true, the substrings may share structure with TARGET-STRING.
  • Function SPLIT (regex target-string &key (start 0) (end (length target-string)) limit with-registers-p omit-unmatched-p sharedp)
    Matches REGEX against TARGET-STRING as often as possible and returns a list of the substrings between the matches. If WITH-REGISTERS-P is true, substrings corresponding to matched registers are inserted into the list as well. If OMIT-UNMATCHED-P is true, unmatched registers will simply be left out, otherwise they will show up as NIL. LIMIT limits the number of elements returned - registers aren't counted. If LIMIT is NIL (or 0 which is equivalent), trailing empty strings are removed from the result list. If REGEX matches an empty string the scan is continued one position behind this match. If SHAREDP is true, the substrings may share structure with TARGET-STRING.
  • Function REGEX-REPLACE (regex target-string replacement &key (start 0) (end (length target-string)) preserve-case simple-calls (element-type 'character))
    Tries to match TARGET-STRING between START and END against REGEX and replaces the first match with REPLACEMENT. Two values are returned: the modified string, and T if REGEX matched or NIL otherwise. REPLACEMENT can be one of:
      • a string, which may contain the special substrings "\&" for the whole match, "\`" for the part of TARGET-STRING before the match, "\'" for the part of TARGET-STRING after the match, and "\N" or "\{N}" for the Nth register where N is a positive integer
      • a function designator, in which case the match will be replaced with the result of calling the designated function with the arguments TARGET-STRING, START, END, MATCH-START, MATCH-END, REG-STARTS, and REG-ENDS (REG-STARTS and REG-ENDS are arrays holding the start and end positions of matched registers or NIL; the other arguments have the same meaning as in SCAN)
      • a list where each element is a string, one of the symbols :MATCH, :BEFORE-MATCH, or :AFTER-MATCH (corresponding to "\&", "\`", and "\'" above), an integer N (representing register (1+ N)), or a function designator
    If PRESERVE-CASE is true, the replacement will try to preserve the case (all upper case, all lower case, or capitalized) of the match. The result will always be a fresh string, even if REGEX doesn't match. ELEMENT-TYPE is the element type of the resulting string.
  • Function REGEX-REPLACE-ALL (regex target-string replacement &key (start 0) (end (length target-string)) preserve-case simple-calls (element-type 'character))
    Tries to match TARGET-STRING between START and END against REGEX and replaces all matches with REPLACEMENT. Two values are returned: the modified string, and T if REGEX matched or NIL otherwise. REPLACEMENT can be one of:
      • a string, which may contain the special substrings "\&" for the whole match, "\`" for the part of TARGET-STRING before the match, "\'" for the part of TARGET-STRING after the match, and "\N" or "\{N}" for the Nth register where N is a positive integer
      • a function designator, in which case each match will be replaced with the result of calling the designated function with the arguments TARGET-STRING, START, END, MATCH-START, MATCH-END, REG-STARTS, and REG-ENDS (REG-STARTS and REG-ENDS are arrays holding the start and end positions of matched registers or NIL; the other arguments have the same meaning as in SCAN)
      • a list where each element is a string, one of the symbols :MATCH, :BEFORE-MATCH, or :AFTER-MATCH (corresponding to "\&", "\`", and "\'" above), an integer N (representing register (1+ N)), or a function designator
    If PRESERVE-CASE is true, the replacement will try to preserve the case (all upper case, all lower case, or capitalized) of the match. The result will always be a fresh string, even if REGEX doesn't match. ELEMENT-TYPE is the element type of the resulting string.
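  • Function REGEX-APROPOS-LIST (regex &optional packages &key (case-insensitive t))
    Similar to the standard function APROPOS-LIST but returns a list of all symbols which match the regular expression REGEX. If CASE-INSENSITIVE is true and REGEX isn't already a scanner, a case-insensitive scanner is used.
  • Function REGEX-APROPOS (regex &optional packages &key (case-insensitive t))
    Similar to the standard function APROPOS but selects the symbols which match the regular expression REGEX and, like APROPOS, prints information about them instead of returning a list. If CASE-INSENSITIVE is true and REGEX isn't already a scanner, a case-insensitive scanner is used.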
  • Function QUOTE-META-CHARS (string &key (start 0) (end (length string)))
    Quote, i.e. prefix with #\\, all non-word characters in STRING.
  • Function PARSE-TREE-SYNONYM (symbol)
    Returns the parse tree the symbol SYMBOL is a synonym for. Returns NIL if SYMBOL wasn't yet defined to be a synonym.
  • Function (setf PARSE-TREE-SYNONYM) (new-parse-tree symbol)
    Defines SYMBOL to be a synonym for the parse tree NEW-PARSE-TREE.
  • Macro DEFINE-PARSE-TREE-SYNONYM (name parse-tree)
    Defines the symbol NAME to be a synonym for the parse tree PARSE-TREE. Both arguments are implicitly quoted, i.e. neither is evaluated.
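
The entries above can be tied together with a short usage sketch. It assumes CL-PPCRE is already loaded (for instance through Quicklisp) and uses sample strings invented for illustration. *WORD-SCANNER* is a hypothetical name, not part of the library.

```lisp
;; Assumption: CL-PPCRE is already loaded, e.g. with (ql:quickload :cl-ppcre).

;; SCAN returns positions: match start, match end, register starts, register ends.
(cl-ppcre:scan "(\\d+)-(\\d+)" "range 10-25")
;; => 6, 11, #(6 9), #(8 11)

;; SCAN-TO-STRINGS reports the same match as substrings.
(cl-ppcre:scan-to-strings "(\\d+)-(\\d+)" "range 10-25")
;; => "10-25", #("10" "25")

;; REGISTER-GROUPS-BIND binds one variable per register; the body only runs on a match.
(cl-ppcre:register-groups-bind (low high)
    ("(\\d+)-(\\d+)" "range 10-25")
  (list (parse-integer low) (parse-integer high)))
;; => (10 25)

;; A scanner built once with CREATE-SCANNER can be passed wherever REGEX is accepted.
;; *WORD-SCANNER* is a name made up for this example.
(defparameter *word-scanner*
  (cl-ppcre:create-scanner "[a-z]+" :case-insensitive-mode t))

(cl-ppcre:all-matches *word-scanner* "Foo bar 42 BAZ")
;; => (0 3 4 7 11 14)
(cl-ppcre:all-matches-as-strings *word-scanner* "Foo bar 42 BAZ")
;; => ("Foo" "bar" "BAZ")

;; DO-REGISTER-GROUPS visits every match; the form after the target string
;; is RESULT-FORM, evaluated once the last match has been processed.
(let ((pairs '()))
  (cl-ppcre:do-register-groups (key value)
      ("(\\w+)=(\\w+)" "a=1;b=2" (nreverse pairs))
    (push (cons key value) pairs)))
;; => (("a" . "1") ("b" . "2"))

;; SPLIT returns the pieces between matches, optionally with matched registers.
(cl-ppcre:split "\\s*,\\s*" "a, b ,c")
;; => ("a" "b" "c")
(cl-ppcre:split "(,)" "a,b" :with-registers-p t)
;; => ("a" "," "b")

;; String replacement: "\\1" and "\\2" refer to the first and second register.
(cl-ppcre:regex-replace-all "(\\w+)@(\\w+)" "bob@example" "\\2:\\1")
;; => "example:bob", T

;; List replacement: integers are zero-based, so 1 is the second register.
(cl-ppcre:regex-replace "(\\w+) (\\w+)" "hello world" '(1 " " 0))
;; => "world hello", T

;; QUOTE-META-CHARS makes a literal string safe to use as a regex.
(cl-ppcre:scan (cl-ppcre:quote-meta-chars "a.b") "axb a.b")
;; => 4, 7, #(), #()
```

Building the scanner once and reusing it avoids re-parsing the regex string on every call, which is the main reason to call CREATE-SCANNER directly.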

cl-ppcre-unicode

Perl-compatible regular expression library (Unicode)

CL-PPCRE-UNICODE

  • Function UNICODE-PROPERTY-RESOLVER (property-name)
    A property resolver which understands Unicode properties using CL-UNICODE's PROPERTY-TEST function. This resolver is automatically installed in *PROPERTY-RESOLVER* when the CL-PPCRE-UNICODE system is loaded.
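
The resolver contract that UNICODE-PROPERTY-RESOLVER fulfils (a function from a property name to a unary character test) can be illustrated without the Unicode system by a hand-written resolver. This is only a sketch: VOWEL-P, TOY-RESOLVER, *VOWEL-SCANNER*, and the property name "Vowel" are hypothetical and exist only for this example. The scanner is created explicitly inside the binding of *PROPERTY-RESOLVER*, on the assumption that properties are resolved while the scanner is being built.

```lisp
;; Assumption: CL-PPCRE is already loaded; CL-PPCRE-UNICODE is not needed here.

;; A unary character test, as a property resolver is expected to return.
(defun vowel-p (char)
  (and (find char "aeiouAEIOU") t))

;; A toy resolver: maps the made-up property name "Vowel" to VOWEL-P
;; and returns NIL for every other name.
(defun toy-resolver (name)
  (when (string= name "Vowel")
    #'vowel-p))

;; Bind the resolver while the scanner is created, so that \p{Vowel}
;; is handed to TOY-RESOLVER instead of being read as a literal #\p.
(defparameter *vowel-scanner*
  (let ((cl-ppcre:*property-resolver* #'toy-resolver))
    (cl-ppcre:create-scanner "\\p{Vowel}+")))

(cl-ppcre:scan-to-strings *vowel-scanner* "queue")
;; => "ueue", #()
```

With CL-PPCRE-UNICODE loaded, the same \p{...} syntax would instead be handled by UNICODE-PROPERTY-RESOLVER, and the accepted property names are then whatever CL-UNICODE's PROPERTY-TEST understands.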