from time to time it is interesting to see how other databases are implemented and as postgresql seems to be one of the most widely used open source databases I decided to have a look at it. the result will be some posts which describe the very basics one needs to know to start working with this database system. please keep in mind that I am not an experienced postgres admin nor developer so some statements may not be totally correct or even misleading. just drop me an email or post a comment if you find any wrong statements or think something is missing to complete the picture.
if you are running a rpm based linux distribution the easiest way to install postgres is to include the postgres yum repository ( if you are on a debian based distribution there are apt repositories, too, just check the download section on the postgresql website )
to avoid any conflicts with current repositories it may be a good idea to exclude the postgres packages provided by your distribution. for this you need to add “exclude=postgresql*” to the sections of your repositories. here is an example:
cat CentOS-Base.repo # CentOS-Base.repo # # The mirror system uses the connecting IP address of the client and the # update status of each mirror to pick mirrors that are updated to and # geographically close to the client. You should use this for CentOS updates # unless you are manually picking other mirrors. # # If the mirrorlist= does not work for you, as a fall back you can try the # remarked out baseurl= line instead. # # [base] name=CentOS-$releasever - Base mirrorlist=http://mirrorlist.centos.org/?release=$releasever&arch=$basearch&repo=os #baseurl=http://mirror.centos.org/centos/$releasever/os/$basearch/ gpgcheck=1 gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-6 exclude=postgresql* #released updates [updates] name=CentOS-$releasever - Updates mirrorlist=http://mirrorlist.centos.org/?release=$releasever&arch=$basearch&repo=updates #baseurl=http://mirror.centos.org/centos/$releasever/updates/$basearch/ gpgcheck=1 gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-6 exclude=postgresql* #additional packages that may be useful [extras] name=CentOS-$releasever - Extras mirrorlist=http://mirrorlist.centos.org/?release=$releasever&arch=$basearch&repo=extras #baseurl=http://mirror.centos.org/centos/$releasever/extras/$basearch/ gpgcheck=1 gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-6 exclude=postgresql* #additional packages that extend functionality of existing packages [centosplus] name=CentOS-$releasever - Plus mirrorlist=http://mirrorlist.centos.org/?release=$releasever&arch=$basearch&repo=centosplus #baseurl=http://mirror.centos.org/centos/$releasever/centosplus/$basearch/ gpgcheck=1 enabled=0 gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-6 exclude=postgresql* #contrib - packages by Centos Users [contrib] name=CentOS-$releasever - Contrib mirrorlist=http://mirrorlist.centos.org/?release=$releasever&arch=$basearch&repo=contrib #baseurl=http://mirror.centos.org/centos/$releasever/contrib/$basearch/ gpgcheck=1 enabled=0 gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-6 exclude=postgresql*
now you can safely add the postgresql repository to your yum:
wget http://yum.postgresql.org/9.1/redhat/rhel-6-x86_64/pgdg-centos91-9.1-4.noarch.rpm rpm -ihv pgdg-centos91-9.1-4.noarch.rpm
this will download the repository rpm and, once installed, create the yum repo file:
ls -la /etc/yum.repos.d/ total 24 drwxr-xr-x. 2 root root 4096 Jun 12 07:28 . drwxr-xr-x. 61 root root 4096 Jun 12 07:28 .. -rw-r--r--. 1 root root 2026 Jun 12 07:22 CentOS-Base.repo -rw-r--r--. 1 root root 657 Jun 12 07:22 CentOS-Debuginfo.repo -rw-r--r--. 1 root root 646 Jun 12 07:22 CentOS-Media.repo -rw-r--r--. 1 root root 436 Aug 22 2011 pgdg-91-centos.repo
before installing the software you need to decide which packages you want to install. check the Installing PostgreSQL on Red Hat Enterprise Linux / Fedora Core document for a list of the packages provided.
because I want to install the server, this is the procedure to use:
yum clean all yum install postgresql91-libs, postgresql91 and postgresql91-server
as stated in the pdf linked above, these are the locations of the files installed:
what | where |
---|---|
Executables | /usr/bin |
Libraries | /usr/lib |
Documentation | /usr/share/doc/postgresqlx.y.z, /usr/share/doc/postgresqlx.y.z/contrib |
Contrib | /usr/share/pgsql/contrib |
Data | /var/lib/pgsql/data |
Backup area | /var/lib/pgsql/backup |
Templates | /usr/share/pgsql |
Procedural Languages | /usr/lib/pgsql |
Development Headers | /usr/include/pgsql |
Other shared data | /usr/share/pgsql |
Regression tests | /usr/lib/pgsql/test/regress |
the very first thing one needs to do is to initialize the database cluster, which in postgresql terms is the database storage area on disk which contains the actual data. according to the documentation this is also called: data directory or data area. the definition the wiki referenced below uses is: “A database cluster is a collection of databases that is stored at a common file system location (the “data area”). It is possible to have multiple database clusters, so long as they use different data areas and different communication ports.”
once initialized the cluster will contain three databases:
- postgres: the default database for utilities and users
- template0: same as template1 but should not be used for creating any objects or users
- template1: which serves as a template for other databases created ( everything installed in the template DB will be copied to newly created databases )
as with every database there needs to be a location on disk where all the files will be stored:
su - mkdir -p /opt/postgres/mydb mkdir -p /opt/postgres/log chown postgres:postgres /opt/postgres chown postgres:postgres /opt/postgres/mydb chown postgres:postgres /opt/postgres/log
the installed package provides a system V init script. because I created custom locations for the data and the logfiles I adjusted the init script:
vi /etc/init.d/postgresql-9.1 PGDATA=/opt/postgres/mydb PGLOG=/opt/postgres/log/mydb_pgstartup.log
to initialize the cluster on the newly created location the initdb command is used:
/usr/pgsql-9.1/bin/initdb -D /opt/postgres/mydb -U sysdba -W The files belonging to this database system will be owned by user "postgres". This user must also own the server process. The database cluster will be initialized with locale en_US.UTF-8. The default database encoding has accordingly been set to UTF8. The default text search configuration will be set to "english". fixing permissions on existing directory /opt/postgres/mydb ... ok creating subdirectories ... ok selecting default max_connections ... 100 selecting default shared_buffers ... 32MB creating configuration files ... ok creating template1 database in /opt/postgres/mydb/base/1 ... ok initializing pg_authid ... ok Enter new superuser password: Enter it again: setting password ... ok initializing dependencies ... ok creating system views ... ok loading system objects' descriptions ... ok creating collations ... ok creating conversions ... ok creating dictionaries ... ok setting privileges on built-in objects ... ok creating information schema ... ok loading PL/pgSQL server-side language ... ok vacuuming database template1 ... ok copying template1 to template0 ... ok copying template1 to postgres ... ok WARNING: enabling "trust" authentication for local connections You can change this by editing pg_hba.conf or using the -A option the next time you run initdb. Success. You can now start the database server using: /usr/pgsql-9.1/bin/postgres -D /opt/postgres/mydb or /usr/pgsql-9.1/bin/pg_ctl -D /opt/postgres/mydb -l logfile start
in my opinion the “-U” and “-W” are very important. by using inidb’s default behaviour every local os-user may become the database superuser. so specifying a username and password for the superuser is a good idea.
for the scope of this and the following posts I will start with a minimal set of parameters, so overwriting the default configuration file ( this is the equivalent of oracle’s pfile ):
echo "max_connections = 100 shared_buffers = 32MB log_destination = 'stderr' logging_collector = on log_directory = '/opt/postgres/log/' log_filename = 'postgresql-%a.log' log_file_mode = 0600 log_truncate_on_rotation = on log_rotation_age = 1d log_rotation_size = 0 datestyle = 'iso, mdy' lc_messages = 'en_US.UTF-8' lc_monetary = 'en_US.UTF-8' lc_numeric = 'en_US.UTF-8' lc_time = 'en_US.UTF-8' default_text_search_config = 'pg_catalog.english'" > /opt/postgres/mydb/postgresql.conf
startup the service:
# check the current status of the service service postgresql-9.1 status # start the service service postgresql-9.1 start
quickly use chkconfig to see if postgres will come up again once the servers is rebooted:
chkconfig --list postgresql-9.1 postgresql-9.1 0:off 1:off 2:off 3:off 4:off 5:off 6:off
no. so let’s enable this:
chkconfig postgresql-9.1 on chkconfig --list postgresql-9.1 postgresql-9.1 0:off 1:off 2:on 3:on 4:on 5:on 6:off
installation done. do a reboot to confirm everything works as expected..
reboot service postgresql-9.1 status (pid 937) is running...
if you prefer to manually startup the service you can either do:
postgres -D /opt/postgres/mydb >logfile 2>&1 &
or
pg_ctl start -D /opt/postgres/mydb -l serverlog
and always remember to _not_ start the service as the root user.
for shutting down the server manually use:
/usr/pgsql-9.1/bin/pg_ctl stop -D /opt/postgres/mydb/ -m smart
as with the oracle database there are some shutdown options ( the “-m” switch ):
- smart: quit after all clients have disconnected
- fast: quit directly, with proper shutdown
- immediate: quit without complete shutdown; will lead to recovery on restart
you can directly send the signals to the server process, too, if you prefer:
- SIGTERM = smart
- SIGINT = fast
- SIGQUIT = immediate
for exmaple:
kill -INT `head -1 /opt/postgres/mydb/postmaster.pid`
as admins tend to be lazy let’s simplify life and define the environment and some aliases:
echo "PGDATA=/opt/postgres/mydb PATH=$PATH:/usr/pgsql-9.1/bin PGLOGDIR=/opt/postgres/log PGPORT=5432 PGLOG=$PGLOGDIR/mydb_pgstartup.log export PGDATA PATH PGLOG PGPORT alias pgstart='pg_ctl start -D $PGDATA -l $PGLOG' alias pgstop='pg_ctl stop -D $PGDATA -m fast' alias pgreload='pg_ctl reload -D $PGDATA' alias pgrestart='pg_ctl restart -D $PGDATA -m fast' alias pgstatus='pg_ctl status' alias viparam='vi $PGDATA/postgresql.conf' alias tlog='tail -1000f $PGLOGDIR/postgresql-Tue.log'" >> ~/.bashrc . ~/.bashrc
if you miss the kernel parameters you may check the documentation for the recommendations ( the default parameters should be sufficient for a test installation ).
might be a good idea to prevent the OOM symptom mentioned in the documentation by setting the parameter in question :) :
sysctl -w vm.overcommit_memory=2
oracle uses various kinds of processes for different tasks. how does postgresql handle this ? there is a wiki on the postgres website especially for oracle dbas which describes the processes involved:
- Master process – launches the other processes, background and session processes.
- Writer process – background process that coordinates database writes, log writes and checkpoints.
- Stats collector process – background process collecting information about server activity.
- User session processes.
additionally you will see processes like this:
- Wal writer process – background process for writing write ahead logs ( redo )
- autovacuum launcher process – launcher for processes that scan tables to release memory/space and prevents loss of data. vacuum processes are also used to clean-up any unused undo data which is present in the table’s files. there are no separate undo/rollback segments as in oracle
if you take a look at the data area ( /opt/postgres/mydb in my case ) you will see various files and directories. the wiki mentioned above provides a clean description what all the files and directories are for:
- postgresql.conf – Parameter or main server configuration file.
- pg_hba.conf – Client authentication configuration file.
- pg_ident.conf – Map from OS account to PostgreSQL account file.
The cluster subdirectories:
- base – Subdirectory containing per-database subdirectories
- global – Subdirectory containing cluster-wide tables
- pg_auth – Authorization file containing user and role definitions.
- pg_control – Control file.
- pg_database – Information of databases within the cluster.
- pg_clog – Subdirectory containing transaction commit status data
- pg_multixact – Subdirectory containing multitransaction status data (used for shared row locks)
- pg_subtrans – Subdirectory containing subtransaction status data
- pg_tblspc – Subdirectory containing symbolic links to tablespaces
- pg_twophase – Subdirectory containing state files for prepared transactions
- pg_xlog – Subdirectory containing WAL (Write Ahead Log) files
to finish the this post, here is a quick comparison between oracle and postgres for the things I noticed right now:
oracle | postgres |
---|---|
default transaction isolation level: read committed | default transaction isolation level: read committed |
pfile/spfile | postgresql.conf |
binary parameter file: yes | binary parameter file: no, but reload function of the server |
multiplexed control files | one pg_control directory containing one control-file |
create database … | initdb |
location of the archived logs: dynamic | pg_xlog |
size of the redologs: dynamic | must be specified once when building postgres and can not be changed afterwards |
system/sysaux, data dictionary | base and global directories |
listener/sqlnet configuration | postgresql.conf, pg_hba.conf |
datafiles | symlinks in pg_tblspc |
sqlplus | psql |
one to one relation of instance and database ( except for rac ) | one cluster can serve multiple databases |
select * from v$instrance; | SELECT version(); |
show user; | select current_user; |
select 1 +3 from dual; | select 1 + 3; |
shutdown modes: normal, immediate, transactional, abort | shutdown modes: smart, immediate, fast |
$ORACLE_SID | $PGDATABASE |
alter session… | SET…TO… |