why not take a look at postgresql ? (1): installation and basics

June 21, 2012 — Leave a comment

from time to time it is interesting to see how other databases are implemented and as postgresql seems to be one of the most widely used open source databases I decided to have a look at it. the result will be some posts which describe the very basics one needs to know to start working with this database system. please keep in mind that I am not an experienced postgres admin nor developer so some statements may not be totally correct or even misleading. just drop me an email or post a comment if you find any wrong statements or think something is missing to complete the picture.

if you are running a rpm based linux distribution the easiest way to install postgres is to include the postgres yum repository ( if you are on a debian based distribution there are apt repositories, too, just check the download section on the postgresql website )

to avoid any conflicts with current repositories it may be a good idea to exclude the postgres packages provided by your distribution. for this you need to add “exclude=postgresql*” to the sections of your repositories. here is an example:

cat CentOS-Base.repo
# CentOS-Base.repo
# The mirror system uses the connecting IP address of the client and the
# update status of each mirror to pick mirrors that are updated to and
# geographically close to the client. You should use this for CentOS updates
# unless you are manually picking other mirrors.
# If the mirrorlist= does not work for you, as a fall back you can try the
# remarked out baseurl= line instead.
name=CentOS-$releasever - Base
#released updates
name=CentOS-$releasever - Updates
#additional packages that may be useful
name=CentOS-$releasever - Extras
#additional packages that extend functionality of existing packages
name=CentOS-$releasever - Plus
#contrib - packages by Centos Users
name=CentOS-$releasever - Contrib

now you can safely add the postgresql repository to your yum:

wget http://yum.postgresql.org/9.1/redhat/rhel-6-x86_64/pgdg-centos91-9.1-4.noarch.rpm
rpm -ihv pgdg-centos91-9.1-4.noarch.rpm

this will download the repository rpm and, once installed, create the yum repo file:

ls -la /etc/yum.repos.d/
total 24
drwxr-xr-x. 2 root root 4096 Jun 12 07:28 .
drwxr-xr-x. 61 root root 4096 Jun 12 07:28 ..
-rw-r--r--. 1 root root 2026 Jun 12 07:22 CentOS-Base.repo
-rw-r--r--. 1 root root 657 Jun 12 07:22 CentOS-Debuginfo.repo
-rw-r--r--. 1 root root 646 Jun 12 07:22 CentOS-Media.repo
-rw-r--r--. 1 root root 436 Aug 22 2011 pgdg-91-centos.repo

before installing the software you need to decide which packages you want to install. check the Installing PostgreSQL on Red Hat Enterprise Linux / Fedora Core document for a list of the packages provided.

because I want to install the server, this is the procedure to use:

yum clean all
yum install postgresql91-libs, postgresql91 and postgresql91-server

as stated in the pdf linked above, these are the locations of the files installed:

what where
Executables /usr/bin
Libraries /usr/lib
Documentation /usr/share/doc/postgresql­x.y.z, /usr/share/doc/postgresql­x.y.z/contrib
Contrib /usr/share/pgsql/contrib
Data /var/lib/pgsql/data
Backup area /var/lib/pgsql/backup
Templates /usr/share/pgsql
Procedural Languages /usr/lib/pgsql
Development Headers /usr/include/pgsql
Other shared data /usr/share/pgsql
Regression tests /usr/lib/pgsql/test/regress

the very first thing one needs to do is to initialize the database cluster, which in postgresql terms is the database storage area on disk which contains the actual data. according to the documentation this is also called: data directory or data area. the definition the wiki referenced below uses is: “A database cluster is a collection of databases that is stored at a common file system location (the “data area”). It is possible to have multiple database clusters, so long as they use different data areas and different communication ports.”
once initialized the cluster will contain three databases:

  • postgres: the default database for utilities and users
  • template0: same as template1 but should not be used for creating any objects or users
  • template1: which serves as a template for other databases created ( everything installed in the template DB will be copied to newly created databases )

as with every database there needs to be a location on disk where all the files will be stored:

su -
mkdir -p /opt/postgres/mydb
mkdir -p /opt/postgres/log
chown postgres:postgres /opt/postgres
chown postgres:postgres /opt/postgres/mydb
chown postgres:postgres /opt/postgres/log

the installed package provides a system V init script. because I created custom locations for the data and the logfiles I adjusted the init script:

vi /etc/init.d/postgresql-9.1

to initialize the cluster on the newly created location the initdb command is used:

/usr/pgsql-9.1/bin/initdb -D /opt/postgres/mydb -U sysdba -W
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.
The database cluster will be initialized with locale en_US.UTF-8.
The default database encoding has accordingly been set to UTF8.
The default text search configuration will be set to "english".
fixing permissions on existing directory /opt/postgres/mydb ... ok
creating subdirectories ... ok
selecting default max_connections ... 100
selecting default shared_buffers ... 32MB
creating configuration files ... ok
creating template1 database in /opt/postgres/mydb/base/1 ... ok
initializing pg_authid ... ok
Enter new superuser password:
Enter it again:
setting password ... ok
initializing dependencies ... ok
creating system views ... ok
loading system objects' descriptions ... ok
creating collations ... ok
creating conversions ... ok
creating dictionaries ... ok
setting privileges on built-in objects ... ok
creating information schema ... ok
loading PL/pgSQL server-side language ... ok
vacuuming database template1 ... ok
copying template1 to template0 ... ok
copying template1 to postgres ... ok
WARNING: enabling "trust" authentication for local connections
You can change this by editing pg_hba.conf or using the -A option the
next time you run initdb.
Success. You can now start the database server using:
/usr/pgsql-9.1/bin/postgres -D /opt/postgres/mydb
/usr/pgsql-9.1/bin/pg_ctl -D /opt/postgres/mydb -l logfile start

in my opinion the “-U” and “-W” are very important. by using inidb’s default behaviour every local os-user may become the database superuser. so specifying a username and password for the superuser is a good idea.

for the scope of this and the following posts I will start with a minimal set of parameters, so overwriting the default configuration file ( this is the equivalent of oracle’s pfile ):

echo "max_connections = 100
shared_buffers = 32MB
log_destination = 'stderr'
logging_collector = on
log_directory = '/opt/postgres/log/'
log_filename = 'postgresql-%a.log'
log_file_mode = 0600
log_truncate_on_rotation = on
log_rotation_age = 1d
log_rotation_size = 0
datestyle = 'iso, mdy'
lc_messages = 'en_US.UTF-8'
lc_monetary = 'en_US.UTF-8'
lc_numeric = 'en_US.UTF-8'
lc_time = 'en_US.UTF-8'
default_text_search_config = 'pg_catalog.english'" > /opt/postgres/mydb/postgresql.conf

startup the service:

# check the current status of the service
service postgresql-9.1 status
# start the service
service postgresql-9.1 start

quickly use chkconfig to see if postgres will come up again once the servers is rebooted:

chkconfig --list postgresql-9.1
postgresql-9.1 0:off 1:off 2:off 3:off 4:off 5:off 6:off

no. so let’s enable this:

chkconfig postgresql-9.1 on
chkconfig --list postgresql-9.1
postgresql-9.1 0:off 1:off 2:on 3:on 4:on 5:on 6:off

installation done. do a reboot to confirm everything works as expected..

service postgresql-9.1 status
(pid 937) is running...

if you prefer to manually startup the service you can either do:

postgres -D /opt/postgres/mydb >logfile 2>&1 &


pg_ctl start -D /opt/postgres/mydb -l serverlog

and always remember to _not_ start the service as the root user.

for shutting down the server manually use:

/usr/pgsql-9.1/bin/pg_ctl stop -D /opt/postgres/mydb/ -m smart

as with the oracle database there are some shutdown options ( the “-m” switch ):

  • smart: quit after all clients have disconnected
  • fast: quit directly, with proper shutdown
  • immediate: quit without complete shutdown; will lead to recovery on restart

you can directly send the signals to the server process, too, if you prefer:

  • SIGTERM = smart
  • SIGINT = fast
  • SIGQUIT = immediate

for exmaple:

kill -INT `head -1 /opt/postgres/mydb/postmaster.pid`

as admins tend to be lazy let’s simplify life and define the environment and some aliases:

echo "PGDATA=/opt/postgres/mydb
alias pgstart='pg_ctl start -D $PGDATA -l $PGLOG'
alias pgstop='pg_ctl stop -D $PGDATA -m fast'
alias pgreload='pg_ctl reload -D $PGDATA'
alias pgrestart='pg_ctl restart -D $PGDATA -m fast'
alias pgstatus='pg_ctl status'
alias viparam='vi $PGDATA/postgresql.conf'
alias tlog='tail -1000f $PGLOGDIR/postgresql-Tue.log'" >> ~/.bashrc
. ~/.bashrc

if you miss the kernel parameters you may check the documentation for the recommendations ( the default parameters should be sufficient for a test installation ).

might be a good idea to prevent the OOM symptom mentioned in the documentation by setting the parameter in question :) :

sysctl -w vm.overcommit_memory=2

oracle uses various kinds of processes for different tasks. how does postgresql handle this ? there is a wiki on the postgres website especially for oracle dbas which describes the processes involved:

  • Master process – launches the other processes, background and session processes.
  • Writer process – background process that coordinates database writes, log writes and checkpoints.
  • Stats collector process – background process collecting information about server activity.
  • User session processes.

additionally you will see processes like this:

  • Wal writer process – background process for writing write ahead logs ( redo )
  • autovacuum launcher process – launcher for processes that scan tables to release memory/space and prevents loss of data. vacuum processes are also used to clean-up any unused undo data which is present in the table’s files. there are no separate undo/rollback segments as in oracle

if you take a look at the data area ( /opt/postgres/mydb in my case ) you will see various files and directories. the wiki mentioned above provides a clean description what all the files and directories are for:

  • postgresql.conf – Parameter or main server configuration file.
  • pg_hba.conf – Client authentication configuration file.
  • pg_ident.conf – Map from OS account to PostgreSQL account file.

The cluster subdirectories:

  • base – Subdirectory containing per-database subdirectories
  • global – Subdirectory containing cluster-wide tables
    • pg_auth – Authorization file containing user and role definitions.
    • pg_control – Control file.
    • pg_database – Information of databases within the cluster.
  • pg_clog – Subdirectory containing transaction commit status data
  • pg_multixact – Subdirectory containing multitransaction status data (used for shared row locks)
  • pg_subtrans – Subdirectory containing subtransaction status data
  • pg_tblspc – Subdirectory containing symbolic links to tablespaces
  • pg_twophase – Subdirectory containing state files for prepared transactions
  • pg_xlog – Subdirectory containing WAL (Write Ahead Log) files

to finish the this post, here is a quick comparison between oracle and postgres for the things I noticed right now:

oracle postgres
default transaction isolation level: read committed default transaction isolation level: read committed
pfile/spfile postgresql.conf
binary parameter file: yes binary parameter file: no, but reload function of the server
multiplexed control files one pg_control directory containing one control-file
create database … initdb
location of the archived logs: dynamic pg_xlog
size of the redologs: dynamic must be specified once when building postgres and can not be changed afterwards
system/sysaux, data dictionary base and global directories
listener/sqlnet configuration postgresql.conf, pg_hba.conf
datafiles symlinks in pg_tblspc
sqlplus psql
one to one relation of instance and database ( except for rac ) one cluster can serve multiple databases
select * from v$instrance; SELECT version();
show user; select current_user;
select 1 +3 from dual; select 1 + 3;
shutdown modes: normal, immediate, transactional, abort shutdown modes: smart, immediate, fast
alter session… SET…TO…

No Comments

Be the first to start the conversation!

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s