as a dba one of the most important tasks is to restore the database in case of a crash, total lost of a datacenter or even a user error. so every serious database must provide tools and techniques to backup and restore the system. postgresql provides different tools for different requirements:
cold backups
of course one can perform cold backups with requires the database to be down. a simple tar of the data area and the backup is done. can be easily scripted and implemented. may be a solution for development or even test but nor for production databases. a slightly different approach is to use rsync while the database is running, shutdown the database once rsync finished, do another rsync ( which will require less time ) and start up the database. this will reduce downtime, but still can not be the solution for production databases.
snapshots
of course you can use snapshot technologies provided by the storage vendor or the filesystem ( if you have enabled wal archiving ).
pg_dump
one command to backup a postgres database is pg_dump. this command can be compared to exp/expdp utilities of the oracle database. the result is a sql file which contains all the statements and data to rebuild the database up to the time the dump was taken.
dumping the database is straight forward:
pg_dump -U sysdba template1 > /opt/postgres/backups/dump1.dmp
as I did not create anything in the template1 database the dump is rather small:
cat /opt/postgres/backups/dump1.dmp -- -- PostgreSQL database dump -- SET statement_timeout = 0; SET client_encoding = 'UTF8'; SET standard_conforming_strings = on; SET check_function_bodies = false; SET client_min_messages = warning; -- -- Name: template1; Type: COMMENT; Schema: -; Owner: sysdba -- COMMENT ON DATABASE template1 IS 'default template for new databases'; -- -- Name: plpgsql; Type: EXTENSION; Schema: -; Owner: -- CREATE EXTENSION IF NOT EXISTS plpgsql WITH SCHEMA pg_catalog; -- -- Name: EXTENSION plpgsql; Type: COMMENT; Schema: -; Owner: -- COMMENT ON EXTENSION plpgsql IS 'PL/pgSQL procedural language'; -- -- Name: public; Type: ACL; Schema: -; Owner: sysdba -- REVOKE ALL ON SCHEMA public FROM PUBLIC; REVOKE ALL ON SCHEMA public FROM sysdba; GRANT ALL ON SCHEMA public TO sysdba; GRANT ALL ON SCHEMA public TO PUBLIC; -- -- PostgreSQL database dump complete --
… but you can see that the dump file is plain text.
what really surprised my are the various options you can give to the pg_dump utility. compression, for example, is a feature you have to pay for if you want to use it with expdp. it’s free with postgres:
pg_dump -U sysdba -Z 9 template1 > /opt/postgres/backups/dump1.dmp
of course the dump file will be binary in this case. in general, all you’ll need is there:
Options controlling the output content: -a, --data-only dump only the data, not the schema -b, --blobs include large objects in dump -c, --clean clean (drop) database objects before recreating -C, --create include commands to create database in dump -E, --encoding=ENCODING dump the data in encoding ENCODING -n, --schema=SCHEMA dump the named schema(s) only -N, --exclude-schema=SCHEMA do NOT dump the named schema(s) -o, --oids include OIDs in dump -O, --no-owner skip restoration of object ownership in plain-text format -s, --schema-only dump only the schema, no data -S, --superuser=NAME superuser user name to use in plain-text format -t, --table=TABLE dump the named table(s) only -T, --exclude-table=TABLE do NOT dump the named table(s) -x, --no-privileges do not dump privileges (grant/revoke) --binary-upgrade for use by upgrade utilities only --column-inserts dump data as INSERT commands with column names --disable-dollar-quoting disable dollar quoting, use SQL standard quoting --disable-triggers disable triggers during data-only restore --inserts dump data as INSERT commands, rather than COPY --no-security-labels do not dump security label assignments --no-tablespaces do not dump tablespace assignments --no-unlogged-table-data do not dump unlogged table data --quote-all-identifiers quote all identifiers, even if not key words --serializable-deferrable wait until the dump can run without anomalies --use-set-session-authorization use SET SESSION AUTHORIZATION commands instead of ALTER OWNER commands to set ownership
what is missing, from an oracle perspective, is some sort of parallel option and encryption.
to restore the dump file, one uses:
psql -U sysdba db1 < /opt/postgres/backups/dump1.dmp
things to consider:
- this will not create the database itself, so create the database from template0 before starting the import: createdb -T template0 db1
- all users must exist before importing the dump
just to mention it: there are two other formats you can use with pg_dump:
- the tar format: pg_dump -U sysdba -Ft db1 > filename.tar
- the custom format: pg_dump -U sysdba -Fc db1 > filename.dmp
for both of these commands you have to use pg_restore instead of psql:
pg_restore -U sysdba -d dbname filename.dmp
when choosing one of the custom formats this enables you to create a table of contents of the dumpfiles:
pg_restore -l filename.tar > filename.toc cat filename.toc ; ; Archive created at Thu Jun 21 13:34:31 2012 ; dbname: db1 ; TOC Entries: 8 ; Compression: 0 ; Dump Version: 1.12-0 ; Format: TAR ; Integer: 4 bytes ; Offset: 8 bytes ; Dumped from database version: 9.1.4 ; Dumped by pg_dump version: 9.1.4 ; ; ; Selected TOC Entries: ; 2710; 1262 16384 DATABASE - db1 sysdba 5; 2615 2200 SCHEMA - public sysdba 2711; 0 0 COMMENT - SCHEMA public sysdba 2712; 0 0 ACL - public sysdba 161; 3079 12506 EXTENSION - plpgsql 2713; 0 0 COMMENT - EXTENSION plpgsql
if you’d like to do a partial restore you can now edit the file and comment the objects you do not want to get created, pass the adjusted table of contents to pg_restore:
pg_restore -U sysdba -d db1 -L filename.toc filename.dmp
… and you’re done. keep in mind that you may destroy dependencies when commenting objects.
pg_dumpall
another utility to perform dump is: pg_dumpall. in contrast to pg_dump, pg_dumpall exports roles or tablespaces, too. so, if you want to export all databases:
pg_dumpall -U sysdba > /opt/postgres/backups/dump1.dmp
… and to restore the dump:
psql -U sysdba -f /opt/postgres/backups/dump1.dmp db1
what do we have until now ? we can create and restore dumps of a single database ( and some subsets ) what provides some sort of backup. for a production system this will not be enough as data will come into the database and gets modified every moment. shutting down the database and do a filesystem backup is possible, too, but will not solve the issue with backups of production databases. in oracle you can enable archiving to force the database to archive the redo logs. how does postgres handle this ?
wal archiving
postgres writes every change to the database to its logfiles ( as oracle does ) which are located in the pg_xlog directory in the data area. as with oracle, these logs are primarily used to replay the changes when the database comes up after a crash occurred.
the procedure to do online backups with postgres is to combine file system backups with wal ( write ahead archiving ).
to enable wal you obviously need to set some parameters and do some preparation. the first parameter you need to set is: wal_level:
echo "wal_level=archive" >> /opt/postgres/mydb/postgresql.conf
next one needs to enable archiving:
echo "archive_mode=on" >> /opt/postgres/mydb/postgresql.conf
nothing special so far if you know oracle dataguard. the interesting parameter is the third one: archive_command. this parameter specifies the command to use for archiving the logfiles ( or archiveable wal segments, in postgres terms ). this mean, postgres gives you the full flexibility on how to actually do the archiving. for example, you can specify:
archive_command = 'test ! -f /opt/postgres/arch/%f && cp %p /opt/postgres/arch/%f'
or if you want to call a custom script:
archive_command = 'my_allinone_archiving_script'
let’s see if it works. restart the database and check the startup file for any errors:
pgstop pgstart cat /opt/postgres/log/mydb_pgstartup.log
seems to work, at least no errors reported during the startup. but does the archiving work? :
pg sysdba@[local]:5432/postgres# select * from pg_switch_xlog(); pg_switch_xlog ---------------- 0/1B7D5E8 (1 row)
if everything works we should now be able to locate the archived log:
ls -la /opt/postgres/arch/ total 49160 drwxrwxr-x. 2 postgres postgres 4096 Jun 13 14:18 . drwxr-xr-x. 6 postgres postgres 4096 Jun 13 13:29 .. -rw-------. 1 postgres postgres 16777216 Jun 13 14:17 000000010000000000000001 -rw-------. 1 postgres postgres 16777216 Jun 13 14:17 000000010000000000000002 -rw-------. 1 postgres postgres 16777216 Jun 13 14:18 000000010000000000000003
seems to work. what else needs to be done? we need a level 0 backup, right ?
connect to postgres and start the backup by calling the pg_start_backup function and provide a label:
pg SELECT pg_start_backup('my_level_0_backup',true);
this will create a backup label file in the data area:
cat /opt/postgres/mydb/backup_label START WAL LOCATION: 0/5000020 (file 000000010000000000000005) CHECKPOINT LOCATION: 0/5000020 BACKUP METHOD: pg_start_backup START TIME: 2012-06-13 14:23:19 CEST LABEL: my_level_0_backup
now you can backup all the datafiles with a method of your choice, for example:
tar -cvf my_level_0_backup.ta /opt/postgres/mydb/*
reconnect to postgres and stop the backup mode:
pg SELECT pg_stop_backup(); NOTICE: pg_stop_backup complete, all required WAL segments have been archived pg_stop_backup ---------------- 0/60000A0 (1 row)
you’re done when the file reported by the pg_stop_backup arrived at the defined location. from now on it is essential that you place the archived files to a save location in case you need to apply them to the base backup in case of recovery.
in oracle syntax this procedure is very similar to “alter database begin backup”, “alter database end backup”…..
if you’d like to save some work there is another command called: pg_basebackup. this tool will do all the work for you: starting the backup, copying the files, ending the backup. check that you have an entry like the following in your pg_hba.conf-file:
host replication replication all md5
… and that the parameter max_wal_senders is high enough:
echo "max_wal_senders=5" >> /opt/postgres/mydb/postgresql.conf
if ok, you are ready:
pg_basebackup -U sysdba -D /opt/postgres/backups/ -Ft -x -z ls -la /opt/postgres/backups/ total 6444 drwxrwxr-x. 2 postgres postgres 4096 Jun 13 14:45 . drwxr-xr-x. 6 postgres postgres 4096 Jun 13 13:29 .. -rw-rw-r--. 1 postgres postgres 733018 Jun 13 14:45 16384.tar.gz -rw-rw-r--. 1 postgres postgres 121 Jun 13 14:45 16389.tar.gz -rw-rw-r--. 1 postgres postgres 119 Jun 13 14:45 16391.tar.gz -rw-rw-r--. 1 postgres postgres 5845363 Jun 13 14:45 base.tar.gz
this includes all the files in data area directory, including configuration files, tablespaces and third party files. so this backup could be extracted and you could start the server directly.
a backup without a proof that the restore is actually possible is not worth anything. let’s try the restore:
restore using the archive backup(s)
the first thing you need to do, obviously, is to shutdown the postgres server if it is running and to make a backup of all the files in data area ( just in case you’ll need any of the configuration data files later ).
pgstop tar -cvf /opt/postgres/backups/current_state.tar /opt/postgres/mydb/* # if you are short on space, compress the tar archive gzip /opt/postgres/backups/current_state.tar
next, cleanup the current data area by deleting all the files and directories ( remember to cleanup any tablespace directories in case they are located elsewhere ) :
rm -rf /opt/postgres/mydb/*
time to restore. locate your backup and extract the files to the data area:
cp /opt/postgres/backups/16* /opt/postgres/backups/base.tar.gz /opt/postgres/mydb/ cd /opt/postgres/mydb/ gunzip * tar -xvf base.tar tar -xvf 16391.tar tar -xvf 16389.tar tar -xvf 16384.tar rm -f *.tar
delete the archived wal files currently present in pg_xlog/ directory as they might not be current:
rm -f /opt/postgres/mydb/pg_xlog/*
as with the archive_command above you’ll need to define the restore_command parameter ( recovery settings are maintained in a separate file called recovery.conf ):
echo "restore_command='cp /opt/postgres/arch/%f \"%p\"'" >> /opt/postgres/mydb/recovery.conf
ok, let’s start the server. postgres will start in recovery mode. tail your server logfile to see what’s happening:
pgstart
the logfile reports:
LOG: database system was interrupted; last known up at 2012-06-13 14:45:08 CEST LOG: creating missing WAL directory "pg_xlog/archive_status" LOG: starting archive recovery LOG: restored log file "00000001000000000000000A" from archive LOG: redo starts at 0/A000078 LOG: consistent recovery state reached at 0/B000000 LOG: restored log file "00000001000000000000000B" from archive ... LOG: archive recovery complete LOG: database system is ready to accept connections LOG: autovacuum launcher started
seems the revocery worked. can we connect to the database again ? :
pg select * from pg_database;
allright. database restored, work done. if you want to do a point in time recovery you’ll need to add one of the recovery target parameters to your recovery.conf file.
one more thing to mention: every time you recover the database from the archived wals the database creates a new time line. you can think of it as the oracle incarnations. if you check your location where the databases places the archived files you will notice a small history file which describes the new time line:
cat /opt/postgres/arch/00000002.history 1 00000001000000000000000B no recovery target specified
additionally the wal files which get archived from now an, will have an increased number to reflect the time line:
pg select * from pg_switch_xlog(); pg_switch_xlog ---------------- 0/C000098 (1 row)
after the logswitch you can confirm the new timeline by checking the filename of the archived wal:
ls -ltra /opt/postgres/arch/ -rw-------. 1 postgres postgres 16777216 Jun 13 15:13 00000001000000000000000B -rw-------. 1 postgres postgres 56 Jun 13 15:41 00000002.history -rw-------. 1 postgres postgres 16777216 Jun 13 15:53 00000002000000000000000C
the timeline id increased from 1 to 2.
conclusion: postgres provides the tools which help you in establishing your backup and restore processes. once you are familiar with the tools, backing up and restoring the database is easy and robust. of course you’ll need to test the procedures before going to production and regular restore tests are a must.
do you think of dataguard ? when there is archiving there should be a possibility to apply these archives to another database …