Quickdocs

pgloader

2022-11-07

Load data into PostgreSQL

Upstream URL

github.com/dimitri/pgloader

Author

Dimitri Fontaine <dim@tapoueh.org>

License

The PostgreSQL Licence

README

Provided Systems

PGLoader

pgloader is a data loading tool for PostgreSQL, using the COPY command.

Its main advantage over just using COPY or \copy, and over using a Foreign Data Wrapper, is its transaction behaviour, where pgloader will keep a separate file of rejected data, but continue trying to copy good data in your database.

The default PostgreSQL behaviour is transactional, which means that any erroneous line in the input data (file or remote database) will stop the entire bulk load for the table.

pgloader also implements data reformatting, a typical example of that being the transformation of MySQL datestamps 0000-00-00 and 0000-00-00 00:00:00 to PostgreSQL NULL value (because our calendar never had a year zero).

Documentation

Full documentation is available online, including manual pages of all the pgloader sub-commands. Check out https://pgloader.readthedocs.io/.

$ pgloader --help
pgloader [ option ... ] SOURCE TARGET
  --help -h                       boolean  Show usage and exit.
  --version -V                    boolean  Displays pgloader version and exit.
  --quiet -q                      boolean  Be quiet
  --verbose -v                    boolean  Be verbose
  --debug -d                      boolean  Display debug level information.
  --client-min-messages           string   Filter logs seen at the console (default: "warning")
  --log-min-messages              string   Filter logs seen in the logfile (default: "notice")
  --summary -S                    string   Filename where to copy the summary
  --root-dir -D                   string   Output root directory. (default: #P"/tmp/pgloader/")
  --upgrade-config -U             boolean  Output the command(s) corresponding to .conf file for v2.x
  --list-encodings -E             boolean  List pgloader known encodings and exit.
  --logfile -L                    string   Filename where to send the logs.
  --load-lisp-file -l             string   Read user code from files
  --dry-run                       boolean  Only check database connections, don't load anything.
  --on-error-stop                 boolean  Refrain from handling errors properly.
  --no-ssl-cert-verification      boolean  Instruct OpenSSL to bypass verifying certificates.
  --context -C                    string   Command Context Variables
  --with                          string   Load options
  --set                           string   PostgreSQL options
  --field                         string   Source file fields specification
  --cast                          string   Specific cast rules
  --type                          string   Force input source type
  --encoding                      string   Source expected encoding
  --before                        string   SQL script to run before loading the data
  --after                         string   SQL script to run after loading the data
  --self-upgrade                  string   Path to pgloader newer sources
  --regress                       boolean  Drive regression testing

Usage

You can either give a command file to pgloader or run it all from the command line, see the pgloader quick start on https://pgloader.readthedocs.io for more details.

$ ./build/bin/pgloader --help
$ ./build/bin/pgloader <file.load>

For example, for a full migration from SQLite:

$ createdb newdb
$ pgloader ./test/sqlite/sqlite.db postgresql:///newdb

Or for a full migration from MySQL, including schema definition (tables, indexes, foreign keys, comments) and parallel loading of the corrected data:

$ createdb pagila
$ pgloader mysql://user@localhost/sakila postgresql:///pagila

LICENCE

pgloader is available under The PostgreSQL Licence.

INSTALL

Please see full documentation at https://pgloader.readthedocs.io/.

If you're using debian, it's already available:

$ apt-get install pgloader

If you're using docker, you can use the latest version built by the CI at each commit to the master branch:

$ docker pull ghcr.io/dimitri/pgloader:latest
$ docker run --rm -it ghcr.io/dimitri/pgloader:latest pgloader --version