cl-duckdb

2024-10-12

CFFI wrapper around the DuckDB C API

Upstream URL

github.com/ak-coram/cl-duckdb

Author

Ákos Kiss <ak@coram.pub>

License

MIT License
README

1cl-duckdb

Build Status

Common Lisp CFFI wrapper around the DuckDB C API

1.1Dependencies

Currently the following Common Lisp implementations and operating systems are tested via CI. Android via Termux (ECL & SBCL) and some other BSDs are also known to work.

  • SBCL (Linux, FreeBSD, Windows, macOS, macOS on AArch64)
  • CCL (Linux, macOS)
  • ECL (Linux, FreeBSD, macOS, macOS on AArch64), see the ECL.org document for specifics

The following native libraries need to be installed in a location where CFFI can find them:

For example on Ubuntu or Debian (amd64):

  sudo apt-get install libffi-dev unzip
  # Download libduckdb-linux-amd64.zip from the C/C++ section of https://duckdb.org/docs/installation/
  sudo unzip ~/Downloads/libduckdb-linux-amd64.zip libduckdb.so -d /usr/lib/

1.2Installation

cl-duckdb can now be installed via Quicklisp from the Ultralisp distribution:

  ;; Install the ultralisp distribution if you don't have it already
  (ql-dist:install-dist "http://dist.ultralisp.org/" :prompt nil)
  ;; Load cl-duckdb
  (ql:quickload :duckdb)

1.3Packages

  • DUCKDB (nicknamed DDB): provides the high-level API.
  • DUCKDB-API: contains the low-level bindings to the DuckDB C API.

1.4Usage

1.4.1Connecting to a database

This library relies on the special variable DUCKDB:*CONNECTION* for a default database connection. Setting up a global default connection is recommmended for interactive REPL sessions:

  ;; Use an in-memory database as the default connection
  (ddb:initialize-default-connection)
  ;; Use a persistent database as the default connection
  (ddb:initialize-default-connection :path "my_database.ddb")
  ;; Clean up the default connection at the end of the session
  (duckdb:disconnect-default-connection)

For manual connection management most functions requiring a database connection also accept a connection object as a keyword argument (see DUCKDB:OPEN-DATABASE and DUCKDB:CONNECT for creating one).

To dynamically bind and automatically clean up a default connection, refer to DUCKDB:WITH-DEFAULT-CONNECTION and DUCKDB:WITH-TRANSIENT-CONNECTION instead.

1.4.2Basic example

  ;; Use an in-memory transient database
  (ddb:with-transient-connection
    ;; Create a new range table containing integers
    (ddb:run "CREATE TABLE range (i INTEGER PRIMARY KEY)"
             "CREATE SEQUENCE seq_range_i START 1")

    ;; Use a prepared statement to populate the table with a 1000 values
    (ddb:with-statement (statement "INSERT INTO range VALUES (nextval('seq_range_i'))")
      (dotimes (_ 1000) (ddb:perform statement)))

    ;; Solve Project Euler Problem 9
    (let* ((euler9-query (ddb:concat "SELECT a.i * b.i * c.i AS solution "
                                     "FROM range AS c "
                                     "JOIN range AS b ON b.i < c.i "
                                     "JOIN range AS a ON a.i < b.i "
                                     "WHERE a.i + b.i + c.i = ? "
                                     "AND a.i * a.i + b.i * b.i = c.i * c.i"))
           (parameters '(1000))
           (results (ddb:query euler9-query parameters)))
      (format t "PE9 Solution: ~a~%" (ddb:get-result results 'solution 0))))

1.4.3Interactive example: query remote Parquet data

The DUCKDB:Q (short for QUERY) and DUCKDB:FQ (short for FORMAT-QUERY) functions are provided as shorthands for interactive REPL use:

  (ddb:initialize-default-connection)
  (ddb:q "INSTALL httpfs") ; => (("Success" . #()))
  (let ((url "https://github.com/apache/parquet-mr/raw/master/parquet-hadoop/src/test/resources/test-file-with-no-column-indexes-1.parquet"))
    (ddb:fq "SELECT * FROM read_parquet(?) WHERE id < 10" url))
  ;; +----+------+--------------------------------+----------------------------------------+
  ;; | id | name | location                       | phoneNumbers                           |
  ;; +----+------+--------------------------------+----------------------------------------+
  ;; |  0 | p0   | NIL                            | ((phone ((number . 0) (kind . cell)))) |
  ;; |  1 | p1   | ((lon . 1.0d0) (lat . 2.0d0))  | ((phone ((number . 1) (kind . cell)))) |
  ;; |  2 | p2   | ((lon . 2.0d0) (lat))          | ((phone ((number . 2) (kind . cell)))) |
  ;; |  3 | p3   | NIL                            | ((phone ((number . 3) (kind . cell)))) |
  ;; |  4 | p4   | ((lon . 4.0d0) (lat . 8.0d0))  | ((phone ((number . 4) (kind . cell)))) |
  ;; |  5 | p5   | ((lon . 5.0d0) (lat))          | ((phone ((number . 5) (kind . cell)))) |
  ;; |  6 | p6   | NIL                            | ((phone ((number . 6) (kind . cell)))) |
  ;; |  7 | p7   | ((lon . 7.0d0) (lat . 14.0d0)) | ((phone ((number . 7) (kind . cell)))) |
  ;; |  8 | p8   | ((lon . 8.0d0) (lat))          | ((phone ((number . 8) (kind . cell)))) |
  ;; |  9 | p9   | NIL                            | ((phone ((number . 9) (kind . cell)))) |
  ;; +----+------+--------------------------------+----------------------------------------+
  ;; => NIL

1.4.4Sparks

There's some support for plotting query results directly in the REPL via cl-spark:

  (ddb:initialize-default-connection) ; => #<DUCKDB::CONNECTION {1014081EF3}>

  (ddb:bind-static-table
   'numbers `(("x" . (,(loop :for i :from 0d0 :by 0.2 :below pi :collect i)
                       :duckdb-double)))) ; => NIL

  (ddb:spark-query "SELECT x, sin(x) AS y, cos(x) AS z FROM numbers" nil '(x y z))
  ;; X ▁▁▁▂▂▃▃▄▄▅▅▆▆▇▇█
  ;; Y ▁▂▃▄▆▆▇▇█▇▇▆▅▄▃▁
  ;; Z █▇▇▇▆▆▅▅▄▃▃▂▁▁▁▁
  ;; => NIL

  (ddb:vspark-query "SELECT pow(2, x) AS y FROM numbers" nil nil 'y)
  ;; 1.0                    4.5                     8.0
  ;; ˫-----------------------+------------------------˧
  ;; ▏
  ;; █▏
  ;; ██▎
  ;; ███▋
  ;; █████▍
  ;; ███████▏
  ;; █████████▎
  ;; ███████████▋
  ;; ██████████████▌
  ;; █████████████████▊
  ;; █████████████████████▍
  ;; █████████████████████████▋
  ;; ██████████████████████████████▌
  ;; ████████████████████████████████████▎
  ;; ██████████████████████████████████████████▋
  ;; ██████████████████████████████████████████████████
  ;; => NIL

  (ddb:vspark-query "SELECT round(x, 2)::text AS x, sqrt(x) AS y FROM numbers" nil
                    'x 'y)
  ;;     0.0            0.8660254             1.7320508
  ;;     ˫---------------------+----------------------˧
  ;; 0.0 ▏
  ;; 0.2 ███████████▉
  ;; 0.4 ████████████████▊
  ;; 0.6 ████████████████████▋
  ;; 0.8 ███████████████████████▊
  ;; 1.0 ██████████████████████████▌
  ;; 1.2 █████████████████████████████▏
  ;; 1.4 ███████████████████████████████▍
  ;; 1.6 █████████████████████████████████▋
  ;; 1.8 ███████████████████████████████████▋
  ;; 2.0 █████████████████████████████████████▌
  ;; 2.2 ███████████████████████████████████████▍
  ;; 2.4 █████████████████████████████████████████▎
  ;; 2.6 ██████████████████████████████████████████▊
  ;; 2.8 ████████████████████████████████████████████▌
  ;; 3.0 ██████████████████████████████████████████████
  ;; => NIL

1.4.5Writing queries via SxQL

If you want to use a syntax based on s-expressions for your queries, then the SxQL library is an option:

  (ddb:initialize-default-connection)

  ;; Load SxQL
  (ql:quickload :sxql)
  (use-package :sxql)

  ;; Create a table
  (ddb:run (yield (create-table :numbers
                    ((i :type 'integer
                        :primary-key t)))))

  ;; Define utility function
  (defun query-sxql (q) (multiple-value-call #'ddb:query (yield q)))

  ;; Populate table with values
  (loop :for x :below 100
        :do (query-sxql (insert-into :numbers (set= :i x))))

  (query-sxql (select ((:as (:sum :i) :sum))
                (from :numbers)
                (where (:even :i))))
  ;; => (("sum" . #(4950)))

Please refer to the SxQL documentation for more examples.

1.4.6Appenders

Appenders are one of the ways of loading bulk data into DuckDB. They append rows to a single table of a database:

  (ddb:initialize-default-connection) ; => #<DUCKDB::CONNECTION {100B1088F3}>

  (ddb:run "CREATE TABLE roman_numerals (i INTEGER, value TEXT)") ; => NIL
  (ddb:with-appender (appender "roman_numerals")
    (loop :for i :from 1 :below 4999
          :do (ddb:append-row appender (list i (format nil "~:@R" i))))) ; => NIL
  (ddb:get-result (ddb:query "SELECT * FROM roman_numerals WHERE i = 1848" nil)
                  'value 0) ; => "MDCCCXXXXVIII"

1.4.7Querying Lisp vectors and lists as table columns

Currently only the following types are supported (the values are currently copied into DuckDB data chunks internally). Using a combination of vectors and list for different columns is possible, but each column should have the same length. Tables using Lisp data structures are not bound to a single connection and work across different ones.

1.4.7.1Specialized vectors

Common Lisp type DuckDB Type
bit BOOLEAN
(unsigned-byte 8) UTINYINT
(unsigned-byte 16) USMALLINT
(unsigned-byte 32) UINTEGER
(unsigned-byte 64) UBIGINT
(unsigned-byte 128) UHUGEINT
(signed-byte 8) TINYINT
(signed-byte 16) SMALLINT
(signed-byte 32) INTEGER
(signed-byte 64) BIGINT
(signed-byte 128) HUGEINT
single-float REAL
double-float DOUBLE

1.4.7.2Lists & unspecialized vectors

List columns or unspecialized vectors need to specify the DuckDB column type and can contain the following values:

  • Booleans (nil, t, :false, :true, :null)
  • Integers in range of the corresponding column type
  • Floating point numbers (single-float & double-float)
  • Strings
  • Date, time or datetime values
  • UUIDs
  • NIL values

1.4.7.3Examples

  (ddb:initialize-default-connection) ; => #<DUCKDB::CONNECTION {10074E8BE3}>

  ;; Use vectors as columns in a query:
  (let ((indexes (make-array '(10) :element-type '(unsigned-byte 8)
                                   :initial-contents '(1 2 3 4 5 6 7 8 9 10)))
        (primes (make-array '(10) :element-type '(unsigned-byte 8)
                                  :initial-contents '(2 3 5 7 11 13 17 19 23 29))))
    (ddb:with-static-table ('primes `((i . ,indexes)
                                      (p . ,primes)))
      (ddb:format-query "SELECT * FROM primes" nil)))
  ;; +----+----+
  ;; | i  | p  |
  ;; +----+----+
  ;; |  1 |  2 |
  ;; |  2 |  3 |
  ;; |  3 |  5 |
  ;; |  4 |  7 |
  ;; |  5 | 11 |
  ;; |  6 | 13 |
  ;; |  7 | 17 |
  ;; |  8 | 19 |
  ;; |  9 | 23 |
  ;; | 10 | 29 |
  ;; +----+----+
  ;; => NIL

  ;; DuckDB column types always have to be specified for lists (NIL
  ;; values are converted to NULL):
  (ddb:with-static-table ('integers `((i . (,(loop :for i :below 1000
                                                   :if (evenp i) :collect i
                                                     :else :collect nil)
                                            :duckdb-integer))))
    (ddb:query (ddb:concat "SELECT sum(i) AS sum "
                           ", COUNT(i) AS not_null_count "
                           "FROM integers")
               nil)) ; => (("sum" . #(249500)) ("not_null_count" . #(500)))

  (ddb:with-static-table ('lyrics `(("in the year" . (,(list (format nil "~R" 2525))
                                                      :duckdb-varchar))))
    (ddb:query "SELECT * FROM lyrics" nil))
  ;; => (("in the year" . #("two thousand five hundred twenty-five")))

  ;; If another table with the same name exists, you can use the
  ;; static_table table function directly:
  (ddb:run (ddb:concat "CREATE TABLE polysemy (\"That you have but slumbered here, "
                       "While these visions did appear\" VARCHAR)"))
  (ddb:with-static-table
      ('polysemy `(("If we shadows have offended, Think but this, and all is mended:"
                    . (() :duckdb-varchar))))
    (ddb:query (ddb:concat "SELECT A.*, B.* FROM static_table('polysemy') AS A "
                           "JOIN polysemy AS B ON true")
               nil))
  ;; => (("If we shadows have offended, Think but this, and all is mended:" . #())
  ;;     ("That you have but slumbered here, While these visions did appear" . #()))

  (ddb:with-static-table ('bools `((v . ((nil t :false :true :null)
                                         :duckdb-boolean))))
    (ddb:format-query "SELECT v, v IS NULL AS is_null FROM bools" nil))
  ;; +-----+---------+
  ;; | v   | is_null |
  ;; +-----+---------+
  ;; | NIL | NIL     |
  ;; | T   | NIL     |
  ;; | NIL | NIL     |
  ;; | T   | NIL     |
  ;; | NIL | T       |
  ;; +-----+---------+
  ;; => NIL

  ;; Static tables can be managed in the global scope using the
  ;; BIND-STATIC-TABLE, UNBIND-STATIC-TABLE and CLEAR-STATIC-TABLES
  ;; functions. Temporarily overriding a table definition via
  ;; WITH-STATIC-TABLE works as expected:
  (ddb:bind-static-table
   'alphabet
   `((c . (("α" "β" "γ" "δ") :duckdb-varchar)))) ; => NIL

  (labels ((get-characters ()
             (loop :with results := (ddb:query "SELECT c FROM alphabet" nil)
                   :for c :across (ddb:get-result results 'c)
                   :collect c)))
    (ddb:with-static-table ('alphabet `((c . (("Ⴀ" "Ⴁ" "Ⴂ" "Ⴃ")
                                              :duckdb-varchar))))
      (ddb:with-static-table ('alphabet `((c . (("ူ0" "ူ1" "ူ2" "ူ3" "ူ4")
                                                :duckdb-varchar))))
        (format t "Etruscan: ~{~a~^, ~}~%" (get-characters)))
      (format t "Asomtavruli: ~{~a~^, ~}~%" (get-characters)))
    (format t "Greek: ~{~a~^, ~}~%" (get-characters)))
  ;; Etruscan: ူ0, ူ1, ူ2, ူ3, ူ4
  ;; Asomtavruli: Ⴀ, Ⴁ, Ⴂ, Ⴃ
  ;; Greek: α, β, γ, δ
  ;; => NIL

  (ddb:unbind-static-table 'alphabet) ; => NIL
  (ddb:clear-static-tables) ; => NIL

1.5Type & Value conversions

DuckDB Type Common Lisp Type Note
NULL null nil (or :null for param. binding)
BOOLEAN boolean t, nil (or :true & :false for param. binding)
VARCHAR string
BLOB (vector (unsigned-byte 8))
REAL single-float
DOUBLE double-float
DECIMAL ratio Max width of 38
TINYINT integer
UTINYINT integer
SMALLINT integer
USMALLINT integer
INTEGER integer
UINTEGER integer
BIGINT integer
UBIGINT integer
HUGEINT integer
UHUGEINT integer
DATE local-time:date
TIMESTAMP local-time:timestamp Microsecond precision
TIME local-time-duration:duration Microsecond precision
INTERVAL periods:duration Microsecond precision
UUID frugal-uuid:uuid
ENUM types string
LIST types list
STRUCT types alist
UNION types
Maps to one of the member types (or nil)
BIT (BITSTRING) bit-vector
  • https://github.com/dlowe-net/local-time
  • https://github.com/enaeher/local-time-duration
  • https://github.com/jwiegley/periods
  • https://github.com/ak-coram/cl-frugal-uuid

1.5.1NIL as boolean FALSE vs NIL as NULL / custom return value for SQL NULL

  (ddb:initialize-default-connection)
  ;; => #<DUCKDB::CONNECTION {101CAC0A73}>

  ;; The boolean TRUE and FALSE values are mapped to T and NIL
  ;; respectively in Lisp, but SQL NULL is also mapped to NIL causing
  ;; some ambiguity:

  (ddb:query "SELECT TRUE AS x, FALSE AS y, NULL AS z" '())
  ;; => (("x" . #(T)) ("y" . #(NIL)) ("z" . #(NIL)))

  ;; When necessary it's possible to differentiate between FALSE and
  ;; NULL by simply using the IS NULL logical operator:

  (ddb:query "SELECT FALSE IS NOT NULL AS x, NULL IS NULL AS y" '())
  ;; => (("x" . #(T)) ("y" . #(T)))

  ;; When binding parameter values, NIL is bound as FALSE when DuckDB
  ;; can determine that the parameter type is boolean and as NULL
  ;; otherwise. This means that simple cases like the following work as
  ;; expected:

  (ddb:run "CREATE TABLE values (v BOOLEAN)"
           '("INSERT INTO values (v) VALUES (?)" (nil))) ; => NIL
  (ddb:query "SELECT v, v IS NOT NULL AS is_not_null FROM values" '())
  ;; => (("v" . #(NIL)) ("is_not_null" . #(T)))

  ;; In some cases DuckDB doesn't determine parameter types based on the
  ;; query and NIL is bound as NULL even for boolean parameters:

  (ddb:query "SELECT ?::boolean || '' IS NULL AS x" '(nil))
  ;; => (("x" . #(T)))

  ;; To differentiate between FALSE and NULL unambiguously when binding
  ;; boolean parameters, the keywords :FALSE and :NULL can be used. In
  ;; the query below the first parameter type is not determined by
  ;; DuckDB, so NIL would be bound AS NULL as seen in the similar
  ;; example directly above. The second parameter type is correctly
  ;; identified as boolean, so NIL would be bound as FALSE in this case.

  (ddb:query "SELECT ?::boolean || '' IS NOT NULL AS x, ?::boolean IS NULL AS y"
             '(:false :null)) ; => (("x" . #(T)) ("y" . #(T)))

  ;; For completeness the :TRUE keyword is also supported. When used as
  ;; a parameter value, it is equivalent to using T:

  (ddb:query "SELECT ? = ? AS x" '(:true t)) ; => (("x" . #(T)))

Alternatively the default return value for SQL NULL can be customized:

  (ddb:initialize-default-connection)
  ;; => #<DUCKDB::CONNECTION {10042C24C3}>

  ;; NIL is used by default
  (ddb:query "SELECT ? AS x" '(:null)) ; => (("x" . #(NIL)))

  ;; Represent SQL NULL values as :NULL in the result for only one call
  (ddb:query "SELECT ? AS x" '(:null)
             :sql-null-return-value :null) ; => (("x" . #(:NULL)))

  ;; Change the default value
  (setf ddb:*sql-null-return-value* :null) ; => :NULL
  (ddb:query "SELECT ? AS x" '(:null)) ; => (("x" . #(:NULL)))

1.6Development setup

  • Install Quicklisp
  • Clone this repository and add it as a local Quicklisp project, forexample:
  git clone git@github.com:ak-coram/cl-duckdb.git ~/Projects/cl-duckdb
  ln -s ~/Projects/cl-duckdb ~/quicklisp/local-projects/cl-duckdb
  • Start your favored REPL (e.g. sbcl) and load the library usingQuicklisp:
  (ql:quickload :duckdb)

1.6.1Running tests

  • Load the tests via Quicklisp:
  (ql:quickload :duckdb/test)
  ;; Using ASDF:
  (asdf:test-system :duckdb)
  ;; Using FiveAM directly:
  (fiveam:run! :duckdb)

1.6.2Running benchmarks

  • Load the benchmarks via Quicklisp:
  (ql:quickload :duckdb/benchmark)
  • Use ASDF or run the benchmarks directly:
  ;; Using ASDF:
  (asdf:test-system :duckdb/benchmark)
  ;; Running directly:
  (duckdb/benchmark:run-benchmarks)

1.7Legal

Dependencies (14)

  • bordeaux-threads
  • cffi
  • cl-ascii-table
  • cl-frugal-uuid
  • cl-ppcre
  • cl-spark
  • fiveam
  • float-features
  • let-plus
  • local-time
  • local-time-duration
  • periods
  • trivial-benchmark
  • uiop

Dependents (0)

    • GitHub
    • Quicklisp