
Encoding/end-of-line detection and external-format abstraction for Common Lisp.

The Library is a sphere whose exact center is any one of its hexagons and whose circumference is inaccessible. -- "The Library of Babel" by Jorge Luis Borges


  • encoding/end-of-line name abstraction
  • encoding/end-of-line detection
  • external-format abstraction
  • make an external-format for each implementation
  • make external-format from byte-array, stream and pathname (with auto-detection)
  • abstraction over the external-formats of babel and flexi-streams
  • support for many implementations (installable with CIM):
      • Embeddable Common Lisp
      • Steel Bank Common Lisp
      • Clozure CL
      • Armed Bear Common Lisp


Get and install via quicklisp:

CL-USER> (ql:quickload :inquisitor)


Encoding detection

To detect the encoding of a stream, use (inq:detect-encoding stream scheme). It returns an implementation-independent encoding name. For the scheme argument, see Encoding scheme.

For example:

CL-USER> (with-open-file (in #P"t/data/unicode/utf-8.txt"
                          :direction :input
                          :element-type '(unsigned-byte 8))
           (inq:detect-encoding in :jp))

You can see the list of available encodings:

CL-USER> inq:+available-encodings+
(:UTF-8 :UCS-2LE :UCS-2BE :UTF-16 :ISO-2022-JP :EUC-JP :CP932 :BIG5 :ISO-2022-TW
 :GB2312 :GB18030 :ISO-2022-CN :EUC-KR :JOHAB :ISO-2022-KR :ISO-8859-6 :CP1256
 :ISO-8859-7 :CP1253 :ISO-8859-8 :CP1255 :ISO-8859-9 :CP1254 :ISO-8859-5
 :KOI8-R :KOI8-U :CP866 :CP1251 :ISO-8859-2 :CP1250 :ISO-8859-13 :CP1257)

Encoding scheme

An encoding scheme is a hint for encoding detection.

Detecting an encoding universally is mostly impossible, because different encodings use the same byte sequences to represent different characters. Limiting the set of target encodings therefore makes detection more reliable.
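
As a small illustration of this ambiguity (not part of inquisitor), the sketch below records how three Cyrillic encodings read the single byte #xC0. The names *c0-readings* and read-c0 are made up for this example; the code points come from the published code tables of each encoding.

```lisp
;; Illustration only: one byte, three readings.
;; *c0-readings* and read-c0 are hypothetical names, not inquisitor API.
(defparameter *c0-readings*
  '((:cp1251     . #x0410)    ; CYRILLIC CAPITAL LETTER A
    (:koi8-r     . #x044E)    ; CYRILLIC SMALL LETTER YU
    (:iso-8859-5 . #x0420))   ; CYRILLIC CAPITAL LETTER ER
  "Unicode code point that byte #xC0 denotes in each encoding.")

(defun read-c0 (encoding)
  "Return the character byte #xC0 stands for in ENCODING, or NIL if unknown."
  (let ((entry (assoc encoding *c0-readings*)))
    (when entry
      (code-char (cdr entry)))))
```

(read-c0 :cp1251) and (read-c0 :koi8-r) return different characters, so the byte alone cannot tell a detector which encoding produced it.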

In inquisitor, languages are used to limit the encodings. Here a language is, roughly speaking, a writing system used somewhere in the world. Fixing the language fixes the set of possible characters, which makes encoding detection somewhat easier.

The supported schemes (languages) are as follows:

  • jp: Japanese
  • tw: Taiwanese
  • cn: Chinese
  • kr: Korean
  • ru: Russian (ISO-8859-5)
  • ar: Arabic (ISO-8859-6)
  • tr: Turkish (ISO-8859-9)
  • gr: Greek (ISO-8859-7)
  • hw: Hebrew (ISO-8859-8)
  • pl: Polish (ISO-8859-2)
  • bl: Baltic (ISO-8859-13)
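
To make the idea of limiting candidates concrete, here is a minimal sketch that assumes a table from scheme to candidate encodings. The table contents and the names *scheme-candidates* and candidates-for are hypothetical, picked for illustration from the encodings listed earlier; inquisitor's own tables may differ.

```lisp
;; Hypothetical table: which encodings a detector would consider per scheme.
;; This is an illustration, not inquisitor's internal data.
(defparameter *scheme-candidates*
  '((:jp . (:utf-8 :iso-2022-jp :euc-jp :cp932))
    (:ru . (:utf-8 :iso-8859-5 :koi8-r :koi8-u :cp866 :cp1251))
    (:pl . (:utf-8 :iso-8859-2 :cp1250)))
  "Assumed mapping from a scheme keyword to its candidate encodings.")

(defun candidates-for (scheme)
  "Return the candidate encodings for SCHEME, or signal an error if unknown."
  (or (cdr (assoc scheme *scheme-candidates*))
      (error "Unknown scheme: ~S" scheme)))
```

With the :jp scheme, a detector built on such a table would never have to distinguish CP932 from KOI8-R, because KOI8-R is not among the candidates.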

End-of-line type detection

If you want to know the end-of-line (line break) type, use (inq:detect-end-of-line stream). It returns an implementation-independent end-of-line name.

CL-USER> (with-open-file (in "t/data/ascii/ascii-crlf.txt"
                             :direction :input
                             :element-type '(unsigned-byte 8))
           (inquisitor:detect-end-of-line in))
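
For intuition, end-of-line detection on bytes can be sketched as a scan for the first CR or LF. The function below is a simplified stand-in, not inquisitor's implementation: the name guess-end-of-line and the result keywords :lf, :cr and :crlf are assumptions, and it only handles ASCII-compatible encodings.

```lisp
;; Simplified sketch, not inquisitor's implementation.
;; guess-end-of-line and its result keywords are assumed names.
(defun guess-end-of-line (octets)
  "Return :CRLF, :CR or :LF for the first line break in OCTETS, or NIL if none."
  (loop for i from 0 below (length octets)
        for byte = (aref octets i)
        do (cond ((= byte 10)            ; LF alone
                  (return :lf))
                 ((= byte 13)            ; CR: check whether LF follows
                  (return (if (and (< (1+ i) (length octets))
                                   (= (aref octets (1+ i)) 10))
                              :crlf
                              :cr))))))
```

Only the first line break is inspected, so a file with mixed line endings is classified by whichever style appears first.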


Implementation dependent/independent names

If you want to know the implementation-dependent name of an encoding or eol type, use (inq:dependent-name independent-name). The returned value can be used as an external-format, or as part of one.

CL-USER> (inq:dependent-name :cp932)
:WINDOWS-CP932  ; on ECL
:WINDOWS-31J  ; on CCL
:|X-MS932_0213|  ; on ABCL

If you want to know the implementation-independent name of an encoding or eol type, use (inq:independent-name dependent-name).


If you want to know whether eol is available on your implementation, use (inq:eol-available-p).

CL-USER> (inq:eol-available-p)
NIL  ; on SBCL

Make external-format

To make an external-format from implementation-independent names, use (inq:make-external-format enc eol).

The same code returns different values on SBCL and on CCL.


CL-USER> (let* ((file #P"t/data/ja/sjis.txt")
                (enc (inq:detect-encoding file :jp))
                (eol (inq:detect-end-of-line file)))
           (inq:make-external-format enc eol))



External-format detection

Inquisitor also provides external-format detection. It detects the encoding and the eol style, then makes an external-format from them. It works with vectors, byte streams and pathnames.

Let's see examples with CCL.

From vector
CL-USER> (inq:detect-external-format
          (encode-string-to-octets "??????????????
#<EXTERNAL-FORMAT :UTF-8/:UNIX #x30200046719D>
From stream
CL-USER> (with-open-file (in "t/data/unicode/utf-8.txt"
                             :direction :input
                             :element-type '(unsigned-byte 8))
           (inq:detect-external-format in :jp))
#<EXTERNAL-FORMAT :UTF-8/:UNIX #x30200046719D>
From pathname
CL-USER> (inq:detect-external-format #P"t/data/unicode/utf-8.txt" :jp)
#<EXTERNAL-FORMAT :UTF-8/:UNIX #x30200046719D>


Copyright (c) 2000-2007 Shiro Kawai
Copyright (c) 2007 Masayuki Onjo
Copyright (c) 2011 zqwell
Copyright (c) 2015 Shinichi Tanaka


Licensed under the MIT License.

Shinichi Tanaka