why not take a look at postgresql ? (4): backup and restore

July 4, 2012

as a dba, one of the most important tasks is to restore the database in case of a crash, the total loss of a datacenter or even a user error. so every serious database must provide tools and techniques to back up and restore the system. postgresql provides different tools for different requirements:

cold backups

of course one can perform cold backups, which require the database to be down. a simple tar of the data area and the backup is done. this can be easily scripted and implemented and may be a solution for development or even test, but not for production databases. a slightly different approach is to use rsync while the database is running, shut down the database once rsync has finished, do another rsync ( which will require less time ) and start up the database again. this reduces the downtime, but still is not a solution for production databases.

snapshots

of course you can use snapshot technologies provided by the storage vendor or the filesystem ( if you have enabled wal archiving ).

pg_dump

one command to back up a postgres database is pg_dump. this command can be compared to the exp/expdp utilities of the oracle database. the result is a sql file which contains all the statements and data needed to rebuild the database up to the time the dump was taken.

dumping the database is straightforward:

pg_dump -U sysdba template1 > /opt/postgres/backups/dump1.dmp

as I did not create anything in the template1 database the dump is rather small:

cat /opt/postgres/backups/dump1.dmp 
--
-- PostgreSQL database dump
--
SET statement_timeout = 0;
SET client_encoding = 'UTF8';
SET standard_conforming_strings = on;
SET check_function_bodies = false;
SET client_min_messages = warning;
--
-- Name: template1; Type: COMMENT; Schema: -; Owner: sysdba
--
COMMENT ON DATABASE template1 IS 'default template for new databases';
--
-- Name: plpgsql; Type: EXTENSION; Schema: -; Owner: 
--
CREATE EXTENSION IF NOT EXISTS plpgsql WITH SCHEMA pg_catalog;
--
-- Name: EXTENSION plpgsql; Type: COMMENT; Schema: -; Owner: 
--
COMMENT ON EXTENSION plpgsql IS 'PL/pgSQL procedural language';
--
-- Name: public; Type: ACL; Schema: -; Owner: sysdba
--
REVOKE ALL ON SCHEMA public FROM PUBLIC;
REVOKE ALL ON SCHEMA public FROM sysdba;
GRANT ALL ON SCHEMA public TO sysdba;
GRANT ALL ON SCHEMA public TO PUBLIC;
--
-- PostgreSQL database dump complete
--

… but you can see that the dump file is plain text.

what really surprised me are the various options you can pass to the pg_dump utility. compression, for example, is a feature you have to pay for if you want to use it with expdp. it’s free with postgres:

pg_dump -U sysdba -Z 9 template1 > /opt/postgres/backups/dump1.dmp

of course the dump file will be binary in this case. in general, all you’ll need is there:

Options controlling the output content:
  -a, --data-only             dump only the data, not the schema
  -b, --blobs                 include large objects in dump
  -c, --clean                 clean (drop) database objects before recreating
  -C, --create                include commands to create database in dump
  -E, --encoding=ENCODING     dump the data in encoding ENCODING
  -n, --schema=SCHEMA         dump the named schema(s) only
  -N, --exclude-schema=SCHEMA do NOT dump the named schema(s)
  -o, --oids                  include OIDs in dump
  -O, --no-owner              skip restoration of object ownership in
                              plain-text format
  -s, --schema-only           dump only the schema, no data
  -S, --superuser=NAME        superuser user name to use in plain-text format
  -t, --table=TABLE           dump the named table(s) only
  -T, --exclude-table=TABLE   do NOT dump the named table(s)
  -x, --no-privileges         do not dump privileges (grant/revoke)
  --binary-upgrade            for use by upgrade utilities only
  --column-inserts            dump data as INSERT commands with column names
  --disable-dollar-quoting    disable dollar quoting, use SQL standard quoting
  --disable-triggers          disable triggers during data-only restore
  --inserts                   dump data as INSERT commands, rather than COPY
  --no-security-labels        do not dump security label assignments
  --no-tablespaces            do not dump tablespace assignments
  --no-unlogged-table-data    do not dump unlogged table data
  --quote-all-identifiers     quote all identifiers, even if not key words
  --serializable-deferrable   wait until the dump can run without anomalies
  --use-set-session-authorization
                              use SET SESSION AUTHORIZATION commands instead of
                              ALTER OWNER commands to set ownership

what is missing, from an oracle perspective, is some sort of parallel option as well as encryption.

to restore the dump file, one uses:

psql -U sysdba db1 < /opt/postgres/backups/dump1.dmp

things to consider:

  • this will not create the database itself, so create the database from template0 before starting the import: createdb -T template0 db1
  • all users must exist before importing the dump

just to mention it: there are two other formats you can use with pg_dump:

  • the tar format: pg_dump -U sysdba -Ft db1 > filename.tar
  • the custom format: pg_dump -U sysdba -Fc db1 > filename.dmp

for both of these commands you have to use pg_restore instead of psql:

pg_restore -U sysdba -d dbname filename.dmp

choosing one of these formats additionally enables you to create a table of contents of the dump file:

pg_restore -l filename.tar > filename.toc
cat filename.toc
;
; Archive created at Thu Jun 21 13:34:31 2012
;     dbname: db1
;     TOC Entries: 8
;     Compression: 0
;     Dump Version: 1.12-0
;     Format: TAR
;     Integer: 4 bytes
;     Offset: 8 bytes
;     Dumped from database version: 9.1.4
;     Dumped by pg_dump version: 9.1.4
;
;
; Selected TOC Entries:
;
2710; 1262 16384 DATABASE - db1 sysdba
5; 2615 2200 SCHEMA - public sysdba
2711; 0 0 COMMENT - SCHEMA public sysdba
2712; 0 0 ACL - public sysdba
161; 3079 12506 EXTENSION - plpgsql 
2713; 0 0 COMMENT - EXTENSION plpgsql 

if you’d like to do a partial restore you can now edit this file and comment out the objects you do not want to get created, then pass the adjusted table of contents to pg_restore:

pg_restore -U sysdba -d db1 -L filename.toc filename.dmp

… and you’re done. keep in mind that you may break dependencies when commenting out objects.
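commenting out entries does not have to be done by hand; since pg_restore treats lines starting with ';' as comments, a small helper can do it ( a sketch; comment_toc_entries is a hypothetical name ):

```shell
# hypothetical helper: comment out every toc entry matching a pattern,
# so pg_restore will skip the corresponding objects.
# pg_restore ignores lines that start with ';'
comment_toc_entries() {
  pattern="$1"; toc_file="$2"
  sed -i "/$pattern/s/^/;/" "$toc_file"
}
```

for example, comment_toc_entries 'EXTENSION plpgsql' filename.toc would exclude the plpgsql extension entries from the restore.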

pg_dumpall

another utility to perform dumps is pg_dumpall. in contrast to pg_dump, pg_dumpall exports roles and tablespaces, too. so, if you want to export all databases:

pg_dumpall -U sysdba > /opt/postgres/backups/dump1.dmp

… and to restore the dump:

psql -U sysdba -f /opt/postgres/backups/dump1.dmp db1

what do we have until now ? we can create and restore dumps of a single database ( and some subsets of it ), which provides some sort of backup. for a production system this will not be enough, as data comes into the database and gets modified every moment. shutting down the database and doing a filesystem backup is possible, too, but does not solve the issue of backing up production databases. in oracle you can enable archiving to force the database to archive the redo logs. how does postgres handle this ?

wal archiving

postgres writes every change to the database to its logfiles ( as oracle does ), which are located in the pg_xlog directory in the data area. as with oracle, these logs are primarily used to replay the changes when the database comes up after a crash.

the procedure for doing online backups with postgres is to combine filesystem backups with wal ( write ahead log ) archiving.

to enable wal archiving you obviously need to set some parameters and do some preparation. the first parameter you need to set is wal_level:

echo "wal_level=archive" >> /opt/postgres/mydb/postgresql.conf

next one needs to enable archiving:

echo "archive_mode=on" >> /opt/postgres/mydb/postgresql.conf

nothing special so far if you know oracle dataguard. the interesting parameter is the third one: archive_command. this parameter specifies the command to use for archiving the logfiles ( or archivable wal segments, in postgres terms ). this means postgres gives you full flexibility in how to actually do the archiving. for example, you can specify:

archive_command = 'test ! -f /opt/postgres/arch/%f && cp %p /opt/postgres/arch/%f'

or if you want to call a custom script:

archive_command = 'my_allinone_archiving_script'
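such a script could look like the following sketch ( archive_one_wal is a hypothetical helper; postgres would call the script with %p, the path to the wal segment, and %f, its file name, as arguments ):

```shell
# minimal sketch of what a custom archive_command script could do.
# the copy logic is wrapped in a function so it stays reusable and testable
archive_one_wal() {
  src="$1"; name="$2"; archive_dir="$3"
  # never overwrite an already archived segment
  test ! -f "$archive_dir/$name" || return 1
  cp "$src" "$archive_dir/$name"
  # a real script might push a second copy offsite here ( scp, rsync, ... )
}
```

the command must return 0 only when the segment was archived successfully; on a non-zero exit status postgres will keep the segment and retry.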

let’s see if it works. restart the database and check the startup file for any errors:

pgstop
pgstart
cat /opt/postgres/log/mydb_pgstartup.log

seems to work, at least no errors were reported during the startup. but does the archiving work ?

pg
sysdba@[local]:5432/postgres# select * from pg_switch_xlog();
 pg_switch_xlog 
----------------
 0/1B7D5E8
(1 row)

if everything works we should now be able to locate the archived log:

ls -la /opt/postgres/arch/
total 49160
drwxrwxr-x. 2 postgres postgres     4096 Jun 13 14:18 .
drwxr-xr-x. 6 postgres postgres     4096 Jun 13 13:29 ..
-rw-------. 1 postgres postgres 16777216 Jun 13 14:17 000000010000000000000001
-rw-------. 1 postgres postgres 16777216 Jun 13 14:17 000000010000000000000002
-rw-------. 1 postgres postgres 16777216 Jun 13 14:18 000000010000000000000003

seems to work. what else needs to be done? we need a level 0 backup, right ?

connect to postgres and start the backup by calling the pg_start_backup function and provide a label:

pg
SELECT pg_start_backup('my_level_0_backup',true);

this will create a backup label file in the data area:

cat /opt/postgres/mydb/backup_label 
START WAL LOCATION: 0/5000020 (file 000000010000000000000005)
CHECKPOINT LOCATION: 0/5000020
BACKUP METHOD: pg_start_backup
START TIME: 2012-06-13 14:23:19 CEST
LABEL: my_level_0_backup

now you can backup all the datafiles with a method of your choice, for example:

tar -cvf my_level_0_backup.tar /opt/postgres/mydb/*

reconnect to postgres and stop the backup mode:

pg
SELECT pg_stop_backup();
NOTICE:  pg_stop_backup complete, all required WAL segments have been archived
 pg_stop_backup 
----------------
 0/60000A0
(1 row)

you’re done once the file reported by pg_stop_backup has arrived at the defined location. from now on it is essential that you copy the archived files to a safe location, in case you need to apply them to the base backup during a recovery.
in oracle syntax this procedure is very similar to “alter database begin backup” and “alter database end backup”.

if you’d like to save some work there is another command called pg_basebackup. this tool will do all the work for you: starting the backup, copying the files and ending the backup. check that you have an entry like the following in your pg_hba.conf file:

host    replication     replication             all                     md5 

… and that the parameter max_wal_senders is high enough:

echo "max_wal_senders=5" >> /opt/postgres/mydb/postgresql.conf

if ok, you are ready:

pg_basebackup -U sysdba -D /opt/postgres/backups/ -Ft -x -z
ls -la /opt/postgres/backups/
total 6444
drwxrwxr-x. 2 postgres postgres    4096 Jun 13 14:45 .
drwxr-xr-x. 6 postgres postgres    4096 Jun 13 13:29 ..
-rw-rw-r--. 1 postgres postgres  733018 Jun 13 14:45 16384.tar.gz
-rw-rw-r--. 1 postgres postgres     121 Jun 13 14:45 16389.tar.gz
-rw-rw-r--. 1 postgres postgres     119 Jun 13 14:45 16391.tar.gz
-rw-rw-r--. 1 postgres postgres 5845363 Jun 13 14:45 base.tar.gz

this includes all the files in the data area, including configuration files, tablespaces and third party files. so this backup could be extracted and you could start the server directly from it.

a backup without proof that the restore is actually possible is not worth anything. let’s try the restore:

restore using the archive backup(s)

the first thing you need to do, obviously, is to shut down the postgres server if it is running and to make a backup of all the files in the data area ( just in case you’ll need any of the configuration files later ).

pgstop
tar -cvf /opt/postgres/backups/current_state.tar /opt/postgres/mydb/*
# if you are short on space, compress the tar archive
gzip /opt/postgres/backups/current_state.tar

next, clean up the current data area by deleting all its files and directories ( remember to clean up any tablespace directories in case they are located elsewhere ):

rm -rf /opt/postgres/mydb/*

time to restore. locate your backup and extract the files to the data area:

cp /opt/postgres/backups/16* /opt/postgres/backups/base.tar.gz /opt/postgres/mydb/
cd /opt/postgres/mydb/
gunzip *
tar -xvf base.tar
tar -xvf 16391.tar
tar -xvf 16389.tar
tar -xvf 16384.tar
rm -f *.tar

delete the wal files currently present in the pg_xlog/ directory as they might not be current:

rm -f /opt/postgres/mydb/pg_xlog/*

as with the archive_command above, you’ll need to define the restore_command parameter ( recovery settings are maintained in a separate file called recovery.conf ):

echo "restore_command='cp /opt/postgres/arch/%f \"%p\"'" >> /opt/postgres/mydb/recovery.conf

ok, let’s start the server. postgres will start in recovery mode. tail your server logfile to see what’s happening:

pgstart

the logfile reports:

LOG:  database system was interrupted; last known up at 2012-06-13 14:45:08 CEST
LOG:  creating missing WAL directory "pg_xlog/archive_status"
LOG:  starting archive recovery
LOG:  restored log file "00000001000000000000000A" from archive
LOG:  redo starts at 0/A000078
LOG:  consistent recovery state reached at 0/B000000
LOG:  restored log file "00000001000000000000000B" from archive
...
LOG:  archive recovery complete
LOG:  database system is ready to accept connections
LOG:  autovacuum launcher started

seems the recovery worked. can we connect to the database again ?

pg
select * from pg_database;

alright. database restored, work done. if you want to do a point in time recovery you’ll need to add one of the recovery target parameters to your recovery.conf file.
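for example, a recovery.conf for a point in time recovery could look like this ( the timestamp is an assumption for illustration only ):

```
restore_command = 'cp /opt/postgres/arch/%f "%p"'
# stop applying wal once this point in time is reached
recovery_target_time = '2012-06-13 15:00:00 CEST'
```

postgres will then replay the archived wal only up to the given timestamp instead of to the end of the wal stream.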

one more thing to mention: every time you recover the database from the archived wals, the database creates a new timeline. you can think of it as the equivalent of oracle’s incarnations. if you check the location where the database places the archived files you will notice a small history file which describes the new timeline:

cat /opt/postgres/arch/00000002.history
1	00000001000000000000000B	no recovery target specified

additionally, the wal files which get archived from now on will carry an increased number to reflect the new timeline:

pg
select * from pg_switch_xlog();
 pg_switch_xlog 
----------------
 0/C000098
(1 row)

after the logswitch you can confirm the new timeline by checking the filename of the archived wal:

ls -ltra /opt/postgres/arch/
-rw-------. 1 postgres postgres 16777216 Jun 13 15:13 00000001000000000000000B
-rw-------. 1 postgres postgres       56 Jun 13 15:41 00000002.history
-rw-------. 1 postgres postgres 16777216 Jun 13 15:53 00000002000000000000000C

the timeline id increased from 1 to 2.
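the timeline is encoded in the first 8 hex digits of the 24 character wal file name, so it can be extracted with a one-liner ( wal_timeline is a hypothetical helper ):

```shell
# wal segment names are 24 hex digits:
# 8 for the timeline id, 8 for the log id, 8 for the segment number
wal_timeline() {
  # take the first 8 characters and print them as a decimal number
  printf '%d\n' "0x$(printf '%s' "$1" | cut -c1-8)"
}
```

wal_timeline 00000002000000000000000C prints 2, matching the timeline switch shown above.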

conclusion: postgres provides the tools which help you establish your backup and restore processes. once you are familiar with the tools, backing up and restoring the database is easy and robust. of course you’ll need to test the procedures before going to production, and regular restore tests are a must.

thinking of dataguard ? where there is archiving, there should be a possibility to apply these archives to another database …
