cl-duckdb
2024-10-12
CFFI wrapper around the DuckDB C API
1cl-duckdb
Common Lisp CFFI wrapper around the DuckDB C API
1.1Dependencies
Currently the following Common Lisp implementations and operating systems are tested via CI. Android via Termux (ECL & SBCL) and some other BSDs are also known to work.
- SBCL (Linux, FreeBSD, Windows, macOS, macOS on AArch64)
- CCL (Linux, macOS)
- ECL (Linux, FreeBSD, macOS, macOS on AArch64), see the ECL.org document for specifics
The following native libraries need to be installed in a location where CFFI can find them:
For example on Ubuntu or Debian (amd64):
sudo apt-get install libffi-dev unzip
# Download libduckdb-linux-amd64.zip from the C/C++ section of https://duckdb.org/docs/installation/
sudo unzip ~/Downloads/libduckdb-linux-amd64.zip libduckdb.so -d /usr/lib/
1.2Installation
cl-duckdb can now be installed via Quicklisp from the Ultralisp distribution:
;; Install the ultralisp distribution if you don't have it already
(ql-dist:install-dist "http://dist.ultralisp.org/" :prompt nil)
;; Load cl-duckdb
(ql:quickload :duckdb)
1.3Packages
- DUCKDB (nicknamed DDB): provides the high-level API.
- DUCKDB-API: contains the low-level bindings to the DuckDB C API.
1.4Usage
1.4.1Connecting to a database
This library relies on the special variable DUCKDB:*CONNECTION* for a default database connection. Setting up a global default connection is recommmended for interactive REPL sessions:
;; Use an in-memory database as the default connection
(ddb:initialize-default-connection)
;; Use a persistent database as the default connection
(ddb:initialize-default-connection :path "my_database.ddb")
;; Clean up the default connection at the end of the session
(duckdb:disconnect-default-connection)
For manual connection management most functions requiring a database connection also accept a connection object as a keyword argument (see DUCKDB:OPEN-DATABASE and DUCKDB:CONNECT for creating one).
To dynamically bind and automatically clean up a default connection, refer to DUCKDB:WITH-DEFAULT-CONNECTION and DUCKDB:WITH-TRANSIENT-CONNECTION instead.
1.4.2Basic example
;; Use an in-memory transient database
(ddb:with-transient-connection
;; Create a new range table containing integers
(ddb:run "CREATE TABLE range (i INTEGER PRIMARY KEY)"
"CREATE SEQUENCE seq_range_i START 1")
;; Use a prepared statement to populate the table with a 1000 values
(ddb:with-statement (statement "INSERT INTO range VALUES (nextval('seq_range_i'))")
(dotimes (_ 1000) (ddb:perform statement)))
;; Solve Project Euler Problem 9
(let* ((euler9-query (ddb:concat "SELECT a.i * b.i * c.i AS solution "
"FROM range AS c "
"JOIN range AS b ON b.i < c.i "
"JOIN range AS a ON a.i < b.i "
"WHERE a.i + b.i + c.i = ? "
"AND a.i * a.i + b.i * b.i = c.i * c.i"))
(parameters '(1000))
(results (ddb:query euler9-query parameters)))
(format t "PE9 Solution: ~a~%" (ddb:get-result results 'solution 0))))
1.4.3Interactive example: query remote Parquet data
The DUCKDB:Q (short for QUERY) and DUCKDB:FQ (short for FORMAT-QUERY) functions are provided as shorthands for interactive REPL use:
(ddb:initialize-default-connection)
(ddb:q "INSTALL httpfs") ; => (("Success" . #()))
(let ((url "https://github.com/apache/parquet-mr/raw/master/parquet-hadoop/src/test/resources/test-file-with-no-column-indexes-1.parquet"))
(ddb:fq "SELECT * FROM read_parquet(?) WHERE id < 10" url))
;; +----+------+--------------------------------+----------------------------------------+
;; | id | name | location | phoneNumbers |
;; +----+------+--------------------------------+----------------------------------------+
;; | 0 | p0 | NIL | ((phone ((number . 0) (kind . cell)))) |
;; | 1 | p1 | ((lon . 1.0d0) (lat . 2.0d0)) | ((phone ((number . 1) (kind . cell)))) |
;; | 2 | p2 | ((lon . 2.0d0) (lat)) | ((phone ((number . 2) (kind . cell)))) |
;; | 3 | p3 | NIL | ((phone ((number . 3) (kind . cell)))) |
;; | 4 | p4 | ((lon . 4.0d0) (lat . 8.0d0)) | ((phone ((number . 4) (kind . cell)))) |
;; | 5 | p5 | ((lon . 5.0d0) (lat)) | ((phone ((number . 5) (kind . cell)))) |
;; | 6 | p6 | NIL | ((phone ((number . 6) (kind . cell)))) |
;; | 7 | p7 | ((lon . 7.0d0) (lat . 14.0d0)) | ((phone ((number . 7) (kind . cell)))) |
;; | 8 | p8 | ((lon . 8.0d0) (lat)) | ((phone ((number . 8) (kind . cell)))) |
;; | 9 | p9 | NIL | ((phone ((number . 9) (kind . cell)))) |
;; +----+------+--------------------------------+----------------------------------------+
;; => NIL
1.4.4Sparks
There's some support for plotting query results directly in the REPL via cl-spark:
(ddb:initialize-default-connection) ; => #<DUCKDB::CONNECTION {1014081EF3}>
(ddb:bind-static-table
'numbers `(("x" . (,(loop :for i :from 0d0 :by 0.2 :below pi :collect i)
:duckdb-double)))) ; => NIL
(ddb:spark-query "SELECT x, sin(x) AS y, cos(x) AS z FROM numbers" nil '(x y z))
;; X ▁▁▁▂▂▃▃▄▄▅▅▆▆▇▇█
;; Y ▁▂▃▄▆▆▇▇█▇▇▆▅▄▃▁
;; Z █▇▇▇▆▆▅▅▄▃▃▂▁▁▁▁
;; => NIL
(ddb:vspark-query "SELECT pow(2, x) AS y FROM numbers" nil nil 'y)
;; 1.0 4.5 8.0
;; ˫-----------------------+------------------------˧
;; ▏
;; █▏
;; ██▎
;; ███▋
;; █████▍
;; ███████▏
;; █████████▎
;; ███████████▋
;; ██████████████▌
;; █████████████████▊
;; █████████████████████▍
;; █████████████████████████▋
;; ██████████████████████████████▌
;; ████████████████████████████████████▎
;; ██████████████████████████████████████████▋
;; ██████████████████████████████████████████████████
;; => NIL
(ddb:vspark-query "SELECT round(x, 2)::text AS x, sqrt(x) AS y FROM numbers" nil
'x 'y)
;; 0.0 0.8660254 1.7320508
;; ˫---------------------+----------------------˧
;; 0.0 ▏
;; 0.2 ███████████▉
;; 0.4 ████████████████▊
;; 0.6 ████████████████████▋
;; 0.8 ███████████████████████▊
;; 1.0 ██████████████████████████▌
;; 1.2 █████████████████████████████▏
;; 1.4 ███████████████████████████████▍
;; 1.6 █████████████████████████████████▋
;; 1.8 ███████████████████████████████████▋
;; 2.0 █████████████████████████████████████▌
;; 2.2 ███████████████████████████████████████▍
;; 2.4 █████████████████████████████████████████▎
;; 2.6 ██████████████████████████████████████████▊
;; 2.8 ████████████████████████████████████████████▌
;; 3.0 ██████████████████████████████████████████████
;; => NIL
1.4.5Writing queries via SxQL
If you want to use a syntax based on s-expressions for your queries, then the SxQL library is an option:
(ddb:initialize-default-connection)
;; Load SxQL
(ql:quickload :sxql)
(use-package :sxql)
;; Create a table
(ddb:run (yield (create-table :numbers
((i :type 'integer
:primary-key t)))))
;; Define utility function
(defun query-sxql (q) (multiple-value-call #'ddb:query (yield q)))
;; Populate table with values
(loop :for x :below 100
:do (query-sxql (insert-into :numbers (set= :i x))))
(query-sxql (select ((:as (:sum :i) :sum))
(from :numbers)
(where (:even :i))))
;; => (("sum" . #(4950)))
Please refer to the SxQL documentation for more examples.
1.4.6Appenders
Appenders are one of the ways of loading bulk data into DuckDB. They append rows to a single table of a database:
(ddb:initialize-default-connection) ; => #<DUCKDB::CONNECTION {100B1088F3}>
(ddb:run "CREATE TABLE roman_numerals (i INTEGER, value TEXT)") ; => NIL
(ddb:with-appender (appender "roman_numerals")
(loop :for i :from 1 :below 4999
:do (ddb:append-row appender (list i (format nil "~:@R" i))))) ; => NIL
(ddb:get-result (ddb:query "SELECT * FROM roman_numerals WHERE i = 1848" nil)
'value 0) ; => "MDCCCXXXXVIII"
1.4.7Querying Lisp vectors and lists as table columns
Currently only the following types are supported (the values are currently copied into DuckDB data chunks internally). Using a combination of vectors and list for different columns is possible, but each column should have the same length. Tables using Lisp data structures are not bound to a single connection and work across different ones.
1.4.7.1Specialized vectors
Common Lisp type | DuckDB Type |
---|---|
bit | BOOLEAN |
(unsigned-byte 8) | UTINYINT |
(unsigned-byte 16) | USMALLINT |
(unsigned-byte 32) | UINTEGER |
(unsigned-byte 64) | UBIGINT |
(unsigned-byte 128) | UHUGEINT |
(signed-byte 8) | TINYINT |
(signed-byte 16) | SMALLINT |
(signed-byte 32) | INTEGER |
(signed-byte 64) | BIGINT |
(signed-byte 128) | HUGEINT |
single-float | REAL |
double-float | DOUBLE |
1.4.7.2Lists & unspecialized vectors
List columns or unspecialized vectors need to specify the DuckDB column type and can contain the following values:
- Booleans (nil, t, :false, :true, :null)
- Integers in range of the corresponding column type
- Floating point numbers (single-float & double-float)
- Strings
- Date, time or datetime values
- UUIDs
- NIL values
1.4.7.3Examples
(ddb:initialize-default-connection) ; => #<DUCKDB::CONNECTION {10074E8BE3}>
;; Use vectors as columns in a query:
(let ((indexes (make-array '(10) :element-type '(unsigned-byte 8)
:initial-contents '(1 2 3 4 5 6 7 8 9 10)))
(primes (make-array '(10) :element-type '(unsigned-byte 8)
:initial-contents '(2 3 5 7 11 13 17 19 23 29))))
(ddb:with-static-table ('primes `((i . ,indexes)
(p . ,primes)))
(ddb:format-query "SELECT * FROM primes" nil)))
;; +----+----+
;; | i | p |
;; +----+----+
;; | 1 | 2 |
;; | 2 | 3 |
;; | 3 | 5 |
;; | 4 | 7 |
;; | 5 | 11 |
;; | 6 | 13 |
;; | 7 | 17 |
;; | 8 | 19 |
;; | 9 | 23 |
;; | 10 | 29 |
;; +----+----+
;; => NIL
;; DuckDB column types always have to be specified for lists (NIL
;; values are converted to NULL):
(ddb:with-static-table ('integers `((i . (,(loop :for i :below 1000
:if (evenp i) :collect i
:else :collect nil)
:duckdb-integer))))
(ddb:query (ddb:concat "SELECT sum(i) AS sum "
", COUNT(i) AS not_null_count "
"FROM integers")
nil)) ; => (("sum" . #(249500)) ("not_null_count" . #(500)))
(ddb:with-static-table ('lyrics `(("in the year" . (,(list (format nil "~R" 2525))
:duckdb-varchar))))
(ddb:query "SELECT * FROM lyrics" nil))
;; => (("in the year" . #("two thousand five hundred twenty-five")))
;; If another table with the same name exists, you can use the
;; static_table table function directly:
(ddb:run (ddb:concat "CREATE TABLE polysemy (\"That you have but slumbered here, "
"While these visions did appear\" VARCHAR)"))
(ddb:with-static-table
('polysemy `(("If we shadows have offended, Think but this, and all is mended:"
. (() :duckdb-varchar))))
(ddb:query (ddb:concat "SELECT A.*, B.* FROM static_table('polysemy') AS A "
"JOIN polysemy AS B ON true")
nil))
;; => (("If we shadows have offended, Think but this, and all is mended:" . #())
;; ("That you have but slumbered here, While these visions did appear" . #()))
(ddb:with-static-table ('bools `((v . ((nil t :false :true :null)
:duckdb-boolean))))
(ddb:format-query "SELECT v, v IS NULL AS is_null FROM bools" nil))
;; +-----+---------+
;; | v | is_null |
;; +-----+---------+
;; | NIL | NIL |
;; | T | NIL |
;; | NIL | NIL |
;; | T | NIL |
;; | NIL | T |
;; +-----+---------+
;; => NIL
;; Static tables can be managed in the global scope using the
;; BIND-STATIC-TABLE, UNBIND-STATIC-TABLE and CLEAR-STATIC-TABLES
;; functions. Temporarily overriding a table definition via
;; WITH-STATIC-TABLE works as expected:
(ddb:bind-static-table
'alphabet
`((c . (("α" "β" "γ" "δ") :duckdb-varchar)))) ; => NIL
(labels ((get-characters ()
(loop :with results := (ddb:query "SELECT c FROM alphabet" nil)
:for c :across (ddb:get-result results 'c)
:collect c)))
(ddb:with-static-table ('alphabet `((c . (("Ⴀ" "Ⴁ" "Ⴂ" "Ⴃ")
:duckdb-varchar))))
(ddb:with-static-table ('alphabet `((c . (("ူ0" "ူ1" "ူ2" "ူ3" "ူ4")
:duckdb-varchar))))
(format t "Etruscan: ~{~a~^, ~}~%" (get-characters)))
(format t "Asomtavruli: ~{~a~^, ~}~%" (get-characters)))
(format t "Greek: ~{~a~^, ~}~%" (get-characters)))
;; Etruscan: ူ0, ူ1, ူ2, ူ3, ူ4
;; Asomtavruli: Ⴀ, Ⴁ, Ⴂ, Ⴃ
;; Greek: α, β, γ, δ
;; => NIL
(ddb:unbind-static-table 'alphabet) ; => NIL
(ddb:clear-static-tables) ; => NIL
1.5Type & Value conversions
DuckDB Type | Common Lisp Type | Note |
---|---|---|
NULL | null | nil (or :null for param. binding) |
BOOLEAN | boolean | t, nil (or :true & :false for param. binding) |
VARCHAR | string | |
BLOB | (vector (unsigned-byte 8)) | |
REAL | single-float | |
DOUBLE | double-float | |
DECIMAL | ratio | Max width of 38 |
TINYINT | integer | |
UTINYINT | integer | |
SMALLINT | integer | |
USMALLINT | integer | |
INTEGER | integer | |
UINTEGER | integer | |
BIGINT | integer | |
UBIGINT | integer | |
HUGEINT | integer | |
UHUGEINT | integer | |
DATE | local-time:date | |
TIMESTAMP | local-time:timestamp | Microsecond precision |
TIME | local-time-duration:duration | Microsecond precision |
INTERVAL | periods:duration | Microsecond precision |
UUID | frugal-uuid:uuid | |
ENUM types | string | |
LIST types | list | |
STRUCT types | alist | |
UNION types | Maps to one of the member types (or nil) | |
BIT (BITSTRING) | bit-vector |
- https://github.com/dlowe-net/local-time
- https://github.com/enaeher/local-time-duration
- https://github.com/jwiegley/periods
- https://github.com/ak-coram/cl-frugal-uuid
1.5.1NIL as boolean FALSE vs NIL as NULL / custom return value for SQL NULL
(ddb:initialize-default-connection)
;; => #<DUCKDB::CONNECTION {101CAC0A73}>
;; The boolean TRUE and FALSE values are mapped to T and NIL
;; respectively in Lisp, but SQL NULL is also mapped to NIL causing
;; some ambiguity:
(ddb:query "SELECT TRUE AS x, FALSE AS y, NULL AS z" '())
;; => (("x" . #(T)) ("y" . #(NIL)) ("z" . #(NIL)))
;; When necessary it's possible to differentiate between FALSE and
;; NULL by simply using the IS NULL logical operator:
(ddb:query "SELECT FALSE IS NOT NULL AS x, NULL IS NULL AS y" '())
;; => (("x" . #(T)) ("y" . #(T)))
;; When binding parameter values, NIL is bound as FALSE when DuckDB
;; can determine that the parameter type is boolean and as NULL
;; otherwise. This means that simple cases like the following work as
;; expected:
(ddb:run "CREATE TABLE values (v BOOLEAN)"
'("INSERT INTO values (v) VALUES (?)" (nil))) ; => NIL
(ddb:query "SELECT v, v IS NOT NULL AS is_not_null FROM values" '())
;; => (("v" . #(NIL)) ("is_not_null" . #(T)))
;; In some cases DuckDB doesn't determine parameter types based on the
;; query and NIL is bound as NULL even for boolean parameters:
(ddb:query "SELECT ?::boolean || '' IS NULL AS x" '(nil))
;; => (("x" . #(T)))
;; To differentiate between FALSE and NULL unambiguously when binding
;; boolean parameters, the keywords :FALSE and :NULL can be used. In
;; the query below the first parameter type is not determined by
;; DuckDB, so NIL would be bound AS NULL as seen in the similar
;; example directly above. The second parameter type is correctly
;; identified as boolean, so NIL would be bound as FALSE in this case.
(ddb:query "SELECT ?::boolean || '' IS NOT NULL AS x, ?::boolean IS NULL AS y"
'(:false :null)) ; => (("x" . #(T)) ("y" . #(T)))
;; For completeness the :TRUE keyword is also supported. When used as
;; a parameter value, it is equivalent to using T:
(ddb:query "SELECT ? = ? AS x" '(:true t)) ; => (("x" . #(T)))
Alternatively the default return value for SQL NULL can be customized:
(ddb:initialize-default-connection)
;; => #<DUCKDB::CONNECTION {10042C24C3}>
;; NIL is used by default
(ddb:query "SELECT ? AS x" '(:null)) ; => (("x" . #(NIL)))
;; Represent SQL NULL values as :NULL in the result for only one call
(ddb:query "SELECT ? AS x" '(:null)
:sql-null-return-value :null) ; => (("x" . #(:NULL)))
;; Change the default value
(setf ddb:*sql-null-return-value* :null) ; => :NULL
(ddb:query "SELECT ? AS x" '(:null)) ; => (("x" . #(:NULL)))
1.6Development setup
- Install Quicklisp
- Clone this repository and add it as a local Quicklisp project, forexample:
git clone git@github.com:ak-coram/cl-duckdb.git ~/Projects/cl-duckdb
ln -s ~/Projects/cl-duckdb ~/quicklisp/local-projects/cl-duckdb
- Start your favored REPL (e.g. sbcl) and load the library usingQuicklisp:
(ql:quickload :duckdb)
1.6.1Running tests
- Load the tests via Quicklisp:
(ql:quickload :duckdb/test)
;; Using ASDF:
(asdf:test-system :duckdb)
;; Using FiveAM directly:
(fiveam:run! :duckdb)
1.6.2Running benchmarks
- Load the benchmarks via Quicklisp:
(ql:quickload :duckdb/benchmark)
- Use ASDF or run the benchmarks directly:
;; Using ASDF:
(asdf:test-system :duckdb/benchmark)
;; Running directly:
(duckdb/benchmark:run-benchmarks)
1.7Legal
- Released under the MIT License, same as DuckDB.
- Developer Certificate of Origin
- for README photo