dartsclemailaddress

2016-04-21

Email Address Parsing

This library provides a fully RFC 5322 compliant parser for email addresses. It also provides a few small helper functions that format email addresses so that they comply with RFC 5322.

This library has been tested under

  • SBCL
  • Clozure Common Lisp
  • LispWorks
  • ABCL

Package DARTS.LIB.EMAIL-ADDRESS

  • Variable: *allow-unicode*

If true, the parser functions accept arbitrary characters (with char-code > 127) in addition to what they accept otherwise. This affects the productions of ctext, atext, qtext, and dtext. In other words: something like

> `D?siree ??eldahl <d.??eldahl@secret-?skulap.com>`

becomes a valid email address. All base parser functions take a :allow-unicode keyword argument, whose default value is the value of this variable.

  • Variable: *allow-obsolete-syntax*

If true, enable support for a few of the obs-... productions in the RFC. This is disabled by default. Right now, enabling this option makes

> `R. L. Stephenson <r.l.stephenson@literature-and-coffee.cookies>`

a well-formed mailbox spec. Without this option enabled, the address must be written as (e.g.)

> `"R. L. Stephenson" <r.l.stephenson@literature-and-coffee.cookies>`

  • Function: parse-rfc5322-addr-spec string &key start end allow-unicode allow-trailing-junk => local-part domain error position

Parse string (or a subsequence of it) as an RFC 5322 addr-spec. If allow-unicode is true, characters outside the ASCII range (i.e., with codes > 127) are allowed virtually anywhere. See *ALLOW-UNICODE* for details; its value is also the default for this argument.

The values of start and end are bounding index designators for the part of string to work on.

If allow-trailing-garbage is false (the default), the parser makes sure that no unprocessed characters remain in the designated region of string after a full address has been parsed successfully. If the value is true, this function does not check for unprocessed characters; the caller may inspect the returned position value to determine whether the string was processed fully or whether unprocessed characters remain.

This function returns four values:

  • local-part is the value of the address's local part. If parsing fails before the local part is complete, this value is nil.

  • domain is the value of the address's domain part. If parsing fails before the domain is encountered, this value is nil.

  • error is nil if the string could be parsed successfully. Otherwise, it is a keyword symbol that indicates why the parser stopped.

  • position is an integer that identifies the first character in string that has not been processed by this function.
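
The four return values are most conveniently received with multiple-value-bind. The following is a minimal sketch, not part of the library: the helper name describe-addr-spec is hypothetical, and the code assumes that the system defining package DARTS.LIB.EMAIL-ADDRESS is already loaded and that the symbols documented here are exported.

```lisp
;; Assumption: the package DARTS.LIB.EMAIL-ADDRESS is already loaded.
;; DESCRIBE-ADDR-SPEC is a hypothetical helper, not a library function.
(defun describe-addr-spec (string)
  "Parse STRING as an addr-spec and return a short description of the outcome."
  (multiple-value-bind (local-part domain error position)
      (darts.lib.email-address:parse-rfc5322-addr-spec string)
    (if error
        ;; ERROR is a keyword symbol, POSITION the first unprocessed index.
        (format nil "rejected (~S) at position ~D" error position)
        (format nil "local part ~S, domain ~S" local-part domain))))

(describe-addr-spec "jdoe@example.com")
```

Checking error first is the important part: local-part and domain may be nil or only partially meaningful when parsing stops early.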

  • Function: parse-rfc5322-mailbox string &key start end allow-unicode allow-obsolete-syntax allow-trailing-junk => local-part domain display-name error position

Parse string (or a subsequence of it) as an RFC 5322 mailbox. If allow-unicode is true, characters outside the ASCII range (i.e., with codes > 127) are allowed virtually anywhere. See *ALLOW-UNICODE* for details; its value is also the default for this argument.

If allow-obsolete-syntax is false (the default), this function is very strict about the input it accepts. In particular, none of the obs- productions is recognized in any of the address components. Supplying a true value for this argument makes the parser more lenient, so that it accepts values that have historically been treated as well-formed addresses. See *ALLOW-OBSOLETE-SYNTAX* for details.

The values of start and end are bounding index designators for the part of string to work on.

If allow-trailing-garbage is false (the default), the parser makes sure that no unprocessed characters remain in the designated region of string after a full address has been parsed successfully. If the value is true, this function does not check for unprocessed characters; the caller may inspect the returned position value to determine whether the string was processed fully or whether unprocessed characters remain.

This function returns five values:

  • local-part is the value of the address's local part. If parsing fails before the local part is complete, this value is nil.

  • domain is the value of the address's domain part. If parsing fails before the domain is encountered, this value is nil.

  • display-name is the display name found, or nil if the address did not contain a display name part.

  • error is nil if the string could be parsed successfully. Otherwise, it is a keyword symbol that indicates why the parser stopped.

  • position is an integer that identifies the first character in string that has not been processed by this function.
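
The effect of allow-obsolete-syntax can be illustrated with the two spellings of the "R. L. Stephenson" mailbox shown under *allow-obsolete-syntax*. This is a sketch under the same assumptions as before (package already loaded, symbols exported); mailbox-parts and the two parameter names are hypothetical.

```lisp
;; Assumption: the package DARTS.LIB.EMAIL-ADDRESS is already loaded.
(defparameter *quoted-input*
  "\"R. L. Stephenson\" <r.l.stephenson@literature-and-coffee.cookies>")

(defparameter *unquoted-input*
  "R. L. Stephenson <r.l.stephenson@literature-and-coffee.cookies>")

;; Hypothetical helper: returns (local-part domain display-name) and NIL on
;; success, or NIL and the error keyword on failure.
(defun mailbox-parts (string &key (allow-obsolete-syntax
                                   darts.lib.email-address:*allow-obsolete-syntax*))
  (multiple-value-bind (local-part domain display-name error position)
      (darts.lib.email-address:parse-rfc5322-mailbox
       string :allow-obsolete-syntax allow-obsolete-syntax)
    (declare (ignore position))
    (if error
        (values nil error)
        (values (list local-part domain display-name) nil))))

(mailbox-parts *quoted-input*)                              ; strict syntax
(mailbox-parts *unquoted-input* :allow-obsolete-syntax t)   ; lenient syntax
```

According to the description of *allow-obsolete-syntax*, the unquoted form should be rejected unless the lenient mode is requested.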

  • Function: parse-rfc5322-mailbox-list string &key start end allow-unicode allow-obsolete-syntax => list error position

Parse string (or a subsequence of it) as a comma-separated list of RFC 5322 mailbox specifications. If allow-unicode is true, characters outside the ASCII range (i.e., with codes > 127) are allowed virtually anywhere. See *ALLOW-UNICODE* for details; its value is also the default for this argument.

If allow-obsolete-syntax is false (the default), this function is very strict about the input it accepts. In particular, none of the obs- productions is recognized in any of the address components. Supplying a true value for this argument makes the parser more lenient, so that it accepts values that have historically been treated as well-formed addresses. See *ALLOW-OBSOLETE-SYNTAX* for details.

The values of start and end are bounding index designators for the part of string to work on.

If allow-trailing-garbage is false (the default), the parser makes sure that no unprocessed characters remain in the designated region of string after a full address has been parsed successfully. If the value is true, this function does not check for unprocessed characters; the caller may inspect the returned position value to determine whether the string was processed fully or whether unprocessed characters remain.

This function returns three values:

  • list is a list of sublists of the form (local-part domain display-name), one sublist for each successfully parsed mailbox specification in the input string. The elements appear in the order in which they are found in the input.

  • error is nil if the string could be parsed successfully. Otherwise, it is a keyword symbol that indicates why the parser stopped.

  • position is an integer that identifies the first character in string that has not been processed by this function.
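
Because each element of the returned list has the form (local-part domain display-name), it can be destructured directly in a loop. The sketch below assumes the package is already loaded; list-recipients is a hypothetical helper, and the strings it builds are meant for display only (they are not escaped).

```lisp
;; Assumption: the package DARTS.LIB.EMAIL-ADDRESS is already loaded.
;; LIST-RECIPIENTS is a hypothetical helper, not a library function.
(defun list-recipients (string)
  "Return display strings for the mailboxes in STRING, plus ERROR and POSITION."
  (multiple-value-bind (entries error position)
      (darts.lib.email-address:parse-rfc5322-mailbox-list string)
    (values
     (loop for (local-part domain display-name) in entries
           ;; ~@[ ... ~] emits the display name only when it is non-NIL.
           collect (format nil "~A@~A~@[ (~A)~]" local-part domain display-name))
     error
     position)))

(list-recipients "a@example.com, b@example.org")
```

The second and third return values let the caller detect a partially parsed list: entries parsed before the failure are still present in the first value.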

  • Function: escape-local-part string &key start end => result

Ensures that string is properly escaped for use as the local part of an email address. If necessary, this function adds quotes and backslashes. Note that non-ASCII characters (codes > 127) are not special-cased by this function, i.e., they are implicitly allowed.

The values of start and end are bounding index designators for the part of string to work on.

  • Function: escape-display-name string &key start end => result

Ensures that string is properly escaped for use as the display name of a mailbox. If necessary, this function adds quotes and backslashes. Note that non-ASCII characters (codes > 127) are not special-cased by this function, i.e., they are implicitly allowed.

The values of start and end are bounding index designators for the part of string to work on.
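
The two escape functions can be combined to assemble a mailbox string by hand. This is only a sketch: format-mailbox is a hypothetical helper, it assumes that both functions return the escaped string as their result, that the package is already loaded, and that the domain passed in is already well-formed.

```lisp
;; Assumption: the package DARTS.LIB.EMAIL-ADDRESS is already loaded and
;; both escape functions return the escaped string.
;; FORMAT-MAILBOX is a hypothetical helper, not a library function.
(defun format-mailbox (display-name local-part domain)
  "Build a string of the form: display-name <local-part@domain>."
  (format nil "~A <~A@~A>"
          (darts.lib.email-address:escape-display-name display-name)
          (darts.lib.email-address:escape-local-part local-part)
          domain))

(format-mailbox "R. L. Stephenson" "john smith" "example.com")
```

If the escaping works as described, the resulting string should be accepted again by parse-rfc5322-mailbox without enabling the obsolete syntax.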

  • Structure: address

Instances of this structure represent email addresses. Basically, an address is a pair of "local part" and "domain". After construction, address instances are immutable.

This library defines a total ordering over all addresses, which is derived from the lexicographic orderings of the components. When comparing for order (i.e., using address<, address<=, address>= or address>), the domain part is always compared first. If that comparison is inconclusive (i.e., if both address instances have equal domains), the local parts are compared.

Regardless of whether the comparison is for order or for equality, the domain parts are always compared disregarding the letter case, and the local parts are always compared case-sensitively.
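
The ordering rule can be restated as a small stand-alone sketch. It does not use the library and is not its implementation: sketch-address< is a hypothetical function that works on plain strings and assumes the standard string-lessp (case-insensitive) and string< (case-sensitive) comparisons are an adequate model of the rule.

```lisp
;; Stand-alone illustration of the documented rule; not the library's code.
(defun sketch-address< (local1 domain1 local2 domain2)
  "True if address 1 sorts strictly before address 2."
  (cond
    ;; Domains are compared first, ignoring letter case.
    ((string-lessp domain1 domain2) t)
    ((string-lessp domain2 domain1) nil)
    ;; Equal domains: fall back to a case-sensitive local-part comparison.
    (t (and (string< local1 local2) t))))

(sketch-address< "zed" "a.example" "amy" "B.example")   ; domain decides
(sketch-address< "Amy" "example.com" "amy" "EXAMPLE.COM") ; local part decides
```

Comparing the domain first groups addresses of the same domain together when a list of addresses is sorted.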

  • Function: address object => address

Tries to coerce its argument object into an instance of structure class address, according to the following rules:

- if _object_ is already an instance of `address`, it is returned directly

- if _object_ is a string, it is parsed according to the RFC `mailbox`
  production, and the results are used to construct a new `address`. If a
  display name part is present in _object_, it will be ignored.

If this function cannot convert its argument into an address, it signals an error of type type-error.

  • Function: address-local-part object => string

Answers the string that is the local part of email address object.

  • Function: address-domain object => string

Answers the string that is the domain part of email address object.

  • Function: address-string object => string

Answers the fully escaped string representation of email address object. The string returned by this function may be parsed back into an address instance (e.g., by calling the address function), and the resulting address instance should be equivalent to object under address=.
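
The round trip described above can be checked directly. A minimal sketch, assuming the package is already loaded; round-trips-p is a hypothetical helper.

```lisp
;; Assumption: the package DARTS.LIB.EMAIL-ADDRESS is already loaded.
;; ROUND-TRIPS-P is a hypothetical helper, not a library function.
(defun round-trips-p (string)
  "True if STRING survives address -> address-string -> address unchanged."
  (let* ((original (darts.lib.email-address:address string))
         (text (darts.lib.email-address:address-string original))
         (copy (darts.lib.email-address:address text)))
    (darts.lib.email-address:address= original copy)))

(round-trips-p "jdoe@example.com")
```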

  • Function: address-hash object => result

Answers a hash code for address instance object.

  • Function: address= address1 address2 => result

Compares the addresses address1 and address2 and answers true if both represent the same email address, and false otherwise.

  • Function: address/= address1 address2 => result

Compares the addresses address1 and address2 and answers true if they represent different email addresses, and false otherwise.

  • Function: address< address1 address2 => result

Compares the addresses address1 and address2 and answers true if address1 is strictly less than address2. See the description of structure class address for details about address ordering.

  • Function: address<= address1 address2 => result

Compares the addresses address1 and address2 and answers true if address1 is less than or equal to address2. See the description of structure class address for details about address ordering.

  • Function: address>= address1 address2 => result

Compares the addresses address1 and address2 and answers true if address1 is greater than or equal to address2. See the description of structure class address for details about address ordering.

  • Function: address> address1 address2 => result

Compares the addresses address1 and address2 and answers true if address1 is strictly greater than address2. See the description of structure class address for details about address ordering.

  • Class: mailbox

A mailbox is basically an address combined with a display name. This class itself does not actually provide anything interesting. It merely exists for the purpose of type discrimination.

  • Class: basic-mailbox

This is a concrete implementation of the mailbox protocol. Instances have two slots, mailbox-address and mailbox-display-name.

  • Function: mailbox object => mailbox

Tries to coerce its argument object into an instance of class mailbox, according to the following rules:

- if _object_ is already an instance of `mailbox`, it is returned directly

- if _object_ is an `address`, a new `basic-mailbox` is created, whose address
  part is `object`, and whose display name is `nil`.

- if _object_ is a string, it is parsed according to the RFC `mailbox`
  production, and the results are used to construct a new `basic-mailbox`.

If this function cannot convert its argument into a mailbox, it signals an error of type type-error.
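
Since failed coercion is reported as a type-error, callers that prefer a nil result can wrap the call in handler-case. The sketch below assumes the package is already loaded; try-mailbox is a hypothetical helper.

```lisp
;; Assumption: the package DARTS.LIB.EMAIL-ADDRESS is already loaded.
;; TRY-MAILBOX is a hypothetical helper, not a library function.
(defun try-mailbox (object)
  "Coerce OBJECT to a mailbox, or return NIL if that is not possible."
  (handler-case (darts.lib.email-address:mailbox object)
    (type-error () nil)))

(try-mailbox "\"R. L. Stephenson\" <r.l.stephenson@literature-and-coffee.cookies>")
(try-mailbox 42) ; neither a mailbox, an address, nor a string
```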

  • Generic Function: mailboxp object => result

Tests whether object fulfills the mailbox protocol. This condition is always true by definition for instances of class mailbox and its subclasses. It may additionally be true for other objects.

  • Generic Function: mailbox-address object => address

Answers the address instance that describes the actual email address associated with mailbox object. This function is part of the core mailbox protocol and must be implemented by all objects that want to participate in that protocol.

  • Generic Function: mailbox-display-name object => result

Answers the display name associated with the given mailbox instance object. This function is part of the core mailbox protocol and must be implemented by all objects that want to participate in that protocol.

  • Generic Function: mailbox-local-part object => result

Answers the local part string of this mailbox's address. The default method simply extracts the address-local-part from the object returned by mailbox-address when applied to the given object.

  • Generic Function: mailbox-domain object => result

Answers the domain string of this mailbox's address. The default method simply extracts the address-domain from the object returned by mailbox-address when applied to the given object.

  • Generic Function: mailbox-string object => result

Constructs a string representation of the given mailbox instance. The result is required to be a well-formed RFC 5322 email address parsable using the mailbox production. The default method should be usable by almost all concrete mailbox implementations.
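
To illustrate the protocol, the following sketch defines a custom mailbox type that implements only the two core generic functions and relies on the default methods for everything else. The class contact and its accessors are hypothetical; the code assumes that the package is already loaded and that class mailbox can be subclassed with defclass.

```lisp
;; Assumption: the package DARTS.LIB.EMAIL-ADDRESS is already loaded and
;; MAILBOX is an ordinary class that may be subclassed.
;; CONTACT and its readers are hypothetical names used for illustration.
(defclass contact (darts.lib.email-address:mailbox)
  ((name  :initarg :name  :reader contact-name)
   (email :initarg :email :reader contact-email)))

;; Core protocol, part 1: the address behind the mailbox.
(defmethod darts.lib.email-address:mailbox-address ((object contact))
  (darts.lib.email-address:address (contact-email object)))

;; Core protocol, part 2: the display name.
(defmethod darts.lib.email-address:mailbox-display-name ((object contact))
  (contact-name object))

(defparameter *contact*
  (make-instance 'contact :name "J. Doe" :email "jdoe@example.com"))

;; These calls use the default methods described above.
(darts.lib.email-address:mailbox-domain *contact*)
(darts.lib.email-address:mailbox-string *contact*)
```

Deriving the address from a stored string on every call keeps the sketch short; a real implementation would more likely store the address instance once.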

Author
Dirk Esser
Maintainer
Dirk Eßer
License
MIT