Archives For November 30, 1999

just noticed another difference between postgres and oracle:

oracle:

SQL> create table t1 ( a number(10,2 ) );
Table created.
SQL> create view v1 as select a from t1;
View created.
SQL> alter table t1 modify ( a number(11,2));
Table altered.

postgresql:

[postgres] > create table t1 ( a numeric(10,2) );
CREATE TABLE
[postgres] > create view v1 as select a from t1;
CREATE VIEW
[postgres] > alter table t1 alter column a type numeric(11,2);
ERROR:  cannot alter type of a column used by a view or rule
DETAIL:  rule _RETURN on view v1 depends on column "a"

the same is true if you want to drop a table which has a view defined on it:
oracle:

SQL> drop table t1;
Table dropped.

postgres:

[postgres] > drop table t1;
ERROR:  cannot drop table t1 because other objects depend on it
DETAIL:  view v1 depends on table t1
HINT:  Use DROP ... CASCADE to drop the dependent objects too.
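
one way around the alter table error above, as a minimal sketch: drop the view, change the column and recreate the view inside one transaction, so other sessions never see the view missing. this assumes the view definition is as simple as in the example and that nothing else depends on v1:

```sql
begin;
-- the view blocks the type change, so it has to go first
drop view v1;
alter table t1 alter column a type numeric(11,2);
-- recreate the view with the same definition as before
create view v1 as select a from t1;
commit;
```

for real views it is probably a good idea to save the definition first, e.g. with pg_get_viewdef ( 'v1'::regclass ).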

when working with oracle there is exactly one numeric data type one may use for storing numbers ( and that’s number ). in postgresql there are ten. when we migrated an oracle database to postgresql, a source table in oracle took much less space than the same table in postgresql. thanks to some people on the pgsql-performance mailing list we found one of the reasons: the wrong numeric data type had been chosen for storing the integers in postgresql.

a simple test-case:

drop table datatype1;
drop table datatype2;
create table datatype1
( a numeric ( 8,0 )
, b numeric ( 8,0 )
);
insert into datatype1 ( a, b )
       values ( generate_series ( 1, 10000000 )
              , generate_series ( 1, 10000000 )
              );
create index idatatype1_1 on datatype1 ( a );
create index idatatype1_2 on datatype1 ( b );
create table datatype2
( a int
, b int
);
insert into datatype2 ( a, b )
       values ( generate_series ( 1, 10000000 )
              , generate_series ( 1, 10000000 )
             );
create index idatatype2_1 on datatype2 ( a );
create index idatatype2_2 on datatype2 ( b );
analyze verbose datatype1;
analyze verbose datatype2;
select pg_size_pretty ( pg_relation_size ( 'datatype1' ) );
 pg_size_pretty 
----------------
 422 MB
(1 row)
select pg_size_pretty ( pg_relation_size ( 'datatype2' ) );
 pg_size_pretty 
----------------
 346 MB
(1 row)

for these two little tables the difference is about 76mb. depending on the statements this is 76mb more that needs to be scanned, and this can have a great impact on performance. surprisingly, at least for me, this is not true for the indexes:

select pg_size_pretty ( pg_relation_size ( 'idatatype1_1' ) );
 pg_size_pretty 
----------------
 214 MB
(1 row)
select pg_size_pretty ( pg_relation_size ( 'idatatype2_1' ) );
 pg_size_pretty 
----------------
 214 MB
(1 row)
select pg_size_pretty ( pg_relation_size ( 'idatatype2_1' ) );
 pg_size_pretty 
----------------
 214 MB
(1 row)
select pg_size_pretty ( pg_relation_size ( 'idatatype2_2' ) );
 pg_size_pretty 
----------------
 214 MB
(1 row)

so, it’s worth keeping an eye on which data types to use for numbers …
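
a quick way to compare the storage of single values without building ten million rows is pg_column_size. a small sketch ( the sample value is just an assumption, any integer that fits into eight digits will do ):

```sql
-- bytes needed to store the same value as numeric(8,0) and as int
select pg_column_size ( 12345678::numeric(8,0) ) as numeric_bytes
     , pg_column_size ( 12345678::int )          as int_bytes;
```

an int always takes four bytes, while the size of a numeric depends on the number of digits stored.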

we are currently in the process of moving an oracle data warehouse to postgresql and had some fun with how oracle handles dates in contrast to postgresql. in oracle, if you subtract one date from another you’ll get a number of days:

select to_date('05.01.2012','dd.mm.yyyy') - to_date ( '01.01.2012','dd.mm.yyyy') diff from dual;
      DIFF
----------
         4

if you do the same in postgresql you’ll get an interval:

select to_date('05.01.2012','dd.mm.yyyy') - to_date ( '01.01.2012','dd.mm.yyyy') diff;
  diff
--------
 4 days
(1 row)

the issue was that a view used this in a where clause and did some calculations based on the result of the expression. obviously this failed when we created the view in postgresql. the trick was to use the to_char function on the interval:

select to_char ( to_date('05.01.2012','dd.mm.yyyy') - to_date ( '01.01.2012','dd.mm.yyyy'), 'DD' ) diff;                                      
diff
------
 04

… we thought :) but the result can still not be used to do some calculations:

select to_char ( to_date('05.01.2012','dd.mm.yyyy') - to_date ( '01.01.2012','dd.mm.yyyy'), 'DD' ) + 8;
ERROR:  operator does not exist: text + integer
LINE 1: ...m.yyyy') - to_date ( '01.01.2012','dd.mm.yyyy'), 'DD' ) + 8;
                                                                   ^
HINT:  No operator matches the given name and argument type(s). You might need to add explicit type casts.

another conversion was necessary to make this work:

select cast ( to_char ( to_date('05.01.2012','dd.mm.yyyy') - to_date ( '01.01.2012','dd.mm.yyyy'), 'DD' ) as int ) + 8;
 ?column?
----------
       12
(1 row)
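
a sketch of two alternatives that avoid the detour through text. note that in stock postgresql subtracting two values of type date already returns an integer; an interval is what you get when the operands are timestamps ( as the output above suggests ). the literals below are just the sample dates from this post:

```sql
-- date - date is an integer number of days, so arithmetic works directly
select date '2012-01-05' - date '2012-01-01' + 8;

-- timestamp - timestamp is an interval: extract the days and cast to int
select extract ( day from ( timestamp '2012-01-05 00:00:00'
                          - timestamp '2012-01-01 00:00:00' ) )::int + 8;
```

both statements return 12.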

lessons learned ? oracle is doing implicit type conversions silently in the background. postgresql does not …

another example:

oracle:

SQL> create table t ( a number );
Table created.
SQL> insert into t values ( 20120101 );
1 row created.
SQL>  select to_date(a,'yyyymmdd') from t;
TO_DATE(A,'YYYYMMDD'
--------------------
01-JAN-2012 00:00:00

postgresql:

create table t ( a numeric );
CREATE TABLE
insert into t values ( 20120101 );
INSERT 0 1
select to_date(a,'yyyymmdd') from t;
ERROR:  function to_date(numeric, unknown) does not exist
LINE 1: select to_date(a,'yyyymmdd') from t;
               ^
HINT:  No function matches the given name and argument types. You might need to add explicit type casts.
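
following the hint, an explicit cast to text makes the statement work. a minimal sketch using the table from above:

```sql
-- to_date expects text as its first argument, so cast the numeric column
select to_date ( a::text, 'yyyymmdd' ) from t;

-- the same with standard cast syntax
select to_date ( cast ( a as text ), 'yyyymmdd' ) from t;
```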

while learning more about postgresql I came across the default case in postgresql ( it is lower case ). so, when querying the dictionary/catalog you’ll have to provide the lower case names to get any results:

postgres=# create table test1 ( a numeric );
CREATE TABLE
postgres=# select relname from pg_class where relname = 'TEST1';
 relname 
---------
(0 rows)
postgres=# select relname from pg_class where relname = 'test1';
 relname 
---------
 test1
(1 row)

in oracle you’ll need to use upper case by default:

SQL> create table test1 ( a number );
Table created.
SQL> select table_name from dba_tables where table_name = 'TEST1';
TABLE_NAME
------------------------------
TEST1
SQL> select table_name from dba_tables where table_name = 'test1';
no rows selected
SQL> 

if you want postgresql to respect the case when creating objects you’ll need to put double quotes around the names:

postgres=# create table "TEST2" ( a numeric );
CREATE TABLE
postgres=# select relname from pg_class where relname = 'TEST2';
 relname 
---------
 TEST2
(1 row)
postgres=# select relname from pg_class where relname = 'test2';
 relname 
---------
(0 rows)
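
the same folding applies when you query the tables themselves. a short sketch using the two tables created above:

```sql
-- unquoted names are folded to lower case, so these two hit the same table
select * from test1;
select * from TEST1;

-- a table created with double quotes must always be referenced with quotes
select * from "TEST2";

-- this one is folded to test2 and fails, as no table with that name exists
select * from TEST2;
```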

same in oracle:

SQL> create table "test2" ( a number );
Table created.
SQL> select table_name from dba_tables where table_name = 'test2';
TABLE_NAME
------------------------------
test2
SQL> select table_name from dba_tables where table_name = 'TEST2';
no rows selected

knowing this, is it possible to create identical tables which just differ in the case of their name ? :

postgresql:

postgres=# create table test3 ( a numeric );
CREATE TABLE
postgres=# create table "Test3" ( a numeric );
CREATE TABLE
postgres=# create table "TesT3" ( a numeric );
CREATE TABLE
postgres=# select relname from pg_class where upper(relname) like 'TEST3%';
 relname 
---------
 Test3
 TesT3
 test3
(3 rows)

not an issue with postgresql. what about oracle ?

SQL> create table test3 ( a number );
Table created.
SQL> create table "Test3" ( a number );
Table created.
SQL> create table "TesT3" ( a number );
Table created.
SQL> select table_name from dba_tables where upper(table_name) like 'TEST3%';
TABLE_NAME
------------------------------
TesT3
Test3
TEST3

same behaviour. If someone had asked me if this is possible in oracle before, I would have said: no, definitely not. lessons learned ? :)

going further: what about constraint names ?
in postgresql:

postgres=# alter table test3 add constraint c1 check ( a is not null );
ALTER TABLE
postgres=# alter table test3 add constraint "C1" check ( a > 5 );
ALTER TABLE
postgres=# select conname,consrc from pg_constraint where upper(conname) = 'C1';
 conname |       consrc       
---------+--------------------
 c1      | (a IS NOT NULL)
 C1      | (a > (5)::numeric)
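
a note for newer releases: the consrc column was removed from pg_constraint in postgresql 12. a sketch of the same query using pg_get_constraintdef, which also works in the older versions:

```sql
-- pg_get_constraintdef rebuilds the constraint definition from the catalog
select conname
     , pg_get_constraintdef ( oid ) as definition
  from pg_constraint
 where upper(conname) = 'C1';
```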

ok, this is consistent. what about oracle ? :

SQL> alter table test3 add constraint c1 check ( a is not null );
Table altered.
SQL> alter table test3 add constraint "C1" check ( a > 5 );
alter table test3 add constraint "C1" check ( a > 5 )
                                *
ERROR at line 1:
ORA-02264: name already used by an existing constraint

what about indexes ? postgresql:

postgres=# create index i1 on test3(a);
CREATE INDEX
postgres=# create index "i1" on test3(a);
ERROR:  relation "i1" already exists
postgres=# create index "I1" on test3(a);
CREATE INDEX
postgres=# select indexname,indexdef from pg_indexes where upper(indexname) = 'I1';
 indexname |                  indexdef                  
-----------+--------------------------------------------
 i1        | CREATE INDEX i1 ON test3 USING btree (a)
 I1        | CREATE INDEX "I1" ON test3 USING btree (a)
(2 rows)

oracle:

SQL> create index i1 on test3 ( a );
Index created.
SQL> create index "i1" on test3 ( a );
create index "i1" on test3 ( a )
                            *
ERROR at line 1:
ORA-01408: such column list already indexed

as oracle checks if an index is defined on the same column(s) this is not possible. slightly modified test:

SQL> alter table test3 add ( b number );
Table altered.
SQL> create index "I1" on test3 ( b );
create index "I1" on test3 ( b )
            *
ERROR at line 1:
ORA-00955: name is already used by an existing object

still not possible.

I did not check all the objects but it seems that oracle is not as consistent as postgresql in this case.

this is work in progress, but shall show the similarities and differences between postgresql and oracle in regards to implementing schemas and code. I will try to add more and more things in the future and update this post and the samples accordingly. documentation in the scripts is not very good at the moment but it should be enough to start.

for now the samples include:

  • tables: standard columns and arrays
  • constraints: primary keys, foreign keys, check constraints
  • triggers
  • sequences
  • indexes
  • views
  • loading blobs / clobs
  • plsql packages -> pgsql
  • materialized views
  • partitioning
  • anonymous plsql / plpgsql blocks

after all these postgresql posts I thought it’s time to look at some really cool features postgresql offers but oracle lacks. of course oracle has plenty of features other databases don’t provide, but this is true the other way round, too.

psql – sqlplus

the more I use the psql utility ( postgresql’s equivalent to sqlplus ) the more I love it. this tiny little tool has so many wonderful features that it is hard to give a complete overview. so, here are my favorites:

one of the best features psql offers is the set of shortcuts one can use to query the catalog ( data dictionary ), control the output, display help for the various commands and move data in and out of the database.

first example: to list the available views in oracle you have to query the data dictionary ( either dict or v$fixed_view ). in psql, it’s as easy as this:

postgres=# \dvS
                       List of relations
   Schema   |              Name               | Type |  Owner   
------------+---------------------------------+------+----------
 pg_catalog | pg_available_extension_versions | view | postgres
 pg_catalog | pg_available_extensions         | view | postgres
 pg_catalog | pg_cursors                      | view | postgres
...

if you want even more information ( size and description in this example ) a “+” can always be appended:

postgres=# \dvS+
                                   List of relations
   Schema   |              Name               | Type |  Owner   |  Size   | Description 
------------+---------------------------------+------+----------+---------+-------------
 pg_catalog | pg_available_extension_versions | view | postgres | 0 bytes | 
 pg_catalog | pg_available_extensions         | view | postgres | 0 bytes | 
 pg_catalog | pg_cursors                      | view | postgres | 0 bytes | 

you can even use wildcards if you know parts of an object name but are not sure about the exact name:

postgres=# \dvS *index* 
                   List of relations
   Schema   |          Name          | Type |  Owner   
------------+------------------------+------+----------
 pg_catalog | pg_indexes             | view | postgres
 pg_catalog | pg_stat_all_indexes    | view | postgres
 pg_catalog | pg_stat_sys_indexes    | view | postgres
 pg_catalog | pg_stat_user_indexes   | view | postgres
 pg_catalog | pg_statio_all_indexes  | view | postgres
 pg_catalog | pg_statio_sys_indexes  | view | postgres
 pg_catalog | pg_statio_user_indexes | view | postgres
(7 rows)

the same is true for tables (\dt), functions (\df), tablespaces (\db) and all the other objects available. no need to create scripts for querying frequently used information.

another big plus is the integrated help. let’s assume you are not sure about how to exactly create an index. perhaps you do not need that so often that you remember the syntax. no need to search the documentation:

postgres-# \h CREATE INDEX        
Command:     CREATE INDEX
Description: define a new index
Syntax:
CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ name ] ON table [ USING method ]
    ( { column | ( expression ) } [ COLLATE collation ] [ opclass ] [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [, ...] )
    [ WITH ( storage_parameter = value [, ... ] ) ]
    [ TABLESPACE tablespace ]
    [ WHERE predicate ]

that’s really cool.

editing functions directly ? not a problem with psql. let’s create a simple function ( this one is from the documentation ):

CREATE FUNCTION add(integer, integer) RETURNS integer
    AS 'select $1 + $2;'
    LANGUAGE SQL
    IMMUTABLE
    RETURNS NULL ON NULL INPUT;

if you now want to directly edit this function, just do:

postgres=# \ef add(integer, integer)

… change it, save it, execute it and you’re done.

what about getting data out of postgresql ? maybe there’s a requirement to load data to a data warehouse which is a database from another vendor. flat files almost always provide a robust way for transporting data. and this is pretty easy and very well integrated with psql. the command in question is the “copy” command.
exporting a table to a file is not a big deal:

postgres=# copy myschema.customers to '/tmp/customers.log';                 
COPY 20000
postgres=# \! head /tmp/customers.log
1	VKUUXF	ITHOMQJNYX	4608499546 ABC Way	\N	QSDPAGD	SD	24101	US	1	ITHOMQJNYX@abc.com	4608499546	11979279217775911	2012/03	user1	password	55	100000	M

that’s it ( in the simplest way ). there are some more switches ( csv, headers, delimiter, etc. ) to fine tune your export, just use the integrated help to see what’s around:

\h copy
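
as an example of these switches, here is a sketch of a csv export with a header line and a custom delimiter. the file name is just an assumption; the file is written on the database server by the server process:

```sql
-- export as csv, first line contains the column names, fields separated by ;
copy myschema.customers
  to '/tmp/customers.csv'
  with ( format csv, header true, delimiter ';' );
```

if the file should end up on the client instead of the server, psql offers the \copy meta-command for that.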

loading data back into postgresql ? same command, the other way around:

postgres=# create table myschema.customers2 ( like myschema.customers );  
CREATE TABLE
postgres=# copy myschema.customers2 from  '/tmp/customers.log';
COPY 20000
postgres=# select * from myschema.customers2 limit 1;
 customerid | firstname |  lastname  |      address1       | address2 |  city   | state |  zip  | country | region |        email        |  
 phone    | creditcardtype |    creditcard    | creditcardexpiration | username | password | age | income | gender 
------------+-----------+------------+---------------------+----------+---------+-------+-------+---------+--------+---------------------+--
----------+----------------+------------------+----------------------+----------+----------+-----+--------+--------
          1 | VKUUXF    | ITHOMQJNYX | 4608499546 ABC Way |          | QSDPAGD | SD    | 24101 | US      |      1 | ITHOMQJNYX@abc.com | 4
608499546 |              1 | 1979279217775911 | 2012/03              | user1    | password |  55 | 100000 | M
(1 row)

easy, isn’t it?

and by the way: tired of writing “select * from some_table” all the time ? use the “table” command to query a table:

postgres=# table myschema.customers;
 customerid | firstname |  lastname  |      address1       | address2 |  city   | state |  zip  |   country    | region |        email      
  |   phone    | creditcardtype |    creditcard    | creditcardexpiration | username  | password | age | income | gender 
------------+-----------+------------+---------------------+----------+---------+-------+-------+--------------+--------+-------------------
--+------------+----------------+------------------+----------------------+-----------+----------+-----+--------+--------

if you are used to bash or another shell which provides similar functionality you surely use the command history ( arrow up and down ). it’s integrated with psql, too, out of the box ( yes, I know you may use rlwrap with sqlplus, but you still have to do some extra work to get this to work ). and as the various shells have their startup control file, there is one for psql, too, which is usually located in the home directory of the os user and is called “.psqlrc”. like the login.sql and glogin.sql files in oracle you can define your setup here. but you can do even more. psql provides the ability to define variables, e.g. :

\set waits 'SELECT pg_stat_activity.procpid, pg_stat_activity.current_query, pg_stat_activity.waiting, now() - pg_stat_activity.query_start  as "totaltime", pg_stat_activity.backend_start FROM pg_stat_activity WHERE pg_stat_activity.current_query !~ \'%IDLE%\'::text AND pg_stat_activity.waiting = true;;'

…defines a variable which contains a sql statement for displaying current waits in the database. once defined you can easily reference it:

postgres=# :waits
 procpid | current_query | waiting | totaltime | backend_start 
---------+---------------+---------+-----------+---------------
(0 rows)

put this in your “.psqlrc” file and you’ll have your variable available all the time. really cool.
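
a note for newer releases: the columns used above were renamed later on ( procpid became pid and current_query became query in 9.2, and the waiting flag was replaced by wait_event_type and wait_event in 9.6 ). a sketch of a similar variable for 9.6 and later, assuming you are interested in sessions waiting for a lock:

```sql
-- psql variable holding a statement that lists sessions waiting for a lock
-- ( single quotes inside the value are doubled )
\set waits 'select pid, query, wait_event_type, wait_event, now() - query_start as totaltime, backend_start from pg_stat_activity where wait_event_type = ''Lock'';'
```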

indexing

in postgresql there is the concept of a partial index. that means you can create an index on a subset of a table’s data. this is not possible in oracle. let’s do an example:

assume we have a table which contains an ever increasing number, an entry for each hour of the year and a true/false flag for each row ( postgresql allows columns to be defined as boolean, cool ):

create table t1 ( a integer, b timestamp with time zone, c boolean ); 

before creating a partial index let’s populate the table with some test-data. this also introduces the generate_series function which is a very easy and effective way to generate some data:

insert into t1 ( a, b, c )
       values ( generate_series ( 1, 8761 )
              , generate_series ( timestamptz ( to_date('01.01.2012','DD.MM.YYYY') ) 
                                , timestamptz ( to_date('31.12.2012','DD.MM.YYYY') )
                                , interval '1h' 
                                )
              , 'T'
              );
update t1 set c = 'F' where mod(a,111) = 0;

now, assume there is a report which runs at the end of every month and which is only interested in data which has the false flag set on column c ( maybe to get all customers who did not pay their bill :) ). you could create a normal index on column c, but you could also create a partial index for this:

create index i1 on t1 ( c ) where not c;

this will:
a) greatly reduce the size of the index
b) only index the rows which fulfill the expression
c) provide exactly the data the report asks for

let’s see what explain tells about the statement the report uses:

indx=# analyze verbose t1;
INFO:  analyzing "public.t1"
INFO:  "t1": scanned 57 of 57 pages, containing 8761 live rows and 0 dead rows; 8761 rows in sample, 8761 estimated total rows
ANALYZE
indx=# explain analyze select * from t1 where not c;
                                                QUERY PLAN                                                 
-----------------------------------------------------------------------------------------------------------
 Index Scan using i1 on t1  (cost=0.00..12.85 rows=78 width=13) (actual time=0.016..0.056 rows=78 loops=1)
   Index Cond: (c = false)
 Total runtime: 0.098 ms
(3 rows)

exactly what I wanted. so, if you know the statements in your database and you know your data ( and you probably should :) ) partial indexes may provide a great opportunity.
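
to see how much smaller the partial index is, you could create a normal index on the same column and compare the sizes. a sketch ( the name i1_full is just an assumption for this test ):

```sql
-- a normal index on the same column for comparison
create index i1_full on t1 ( c );

-- size of the partial index versus the full index
select pg_size_pretty ( pg_relation_size ( 'i1' ) )      as partial_index
     , pg_size_pretty ( pg_relation_size ( 'i1_full' ) ) as full_index;

-- clean up the comparison index
drop index i1_full;
```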

granting and revoking

for sure you had the situation where you needed to grant select on all tables in a schema to another user. in oracle one would create a role, grant select for every single table in the source schema to that role and then grant the role to the target user ( if you do not want to grant to public, which is a bad idea anyway ).

in postgresql this is much easier. let’s set up a simple test case:

postgres=# create role usr1 login password 'usr1';
CREATE ROLE
postgres=# create role usr2 login password 'usr2';
CREATE ROLE
postgres=# create database beer owner=usr1;
CREATE DATABASE

in oracle a schema is almost the same thing as a user. in postgresql you’ll explicitly have to create a schema. otherwise the objects will get created in the public schema:

postgres=# \c beer usr1
Password for user usr1: 
You are now connected to database "beer" as user "usr1".
beer=> create schema myschema;
CREATE SCHEMA
beer=> \dn
   List of schemas
   Name   |  Owner   
----------+----------
 myschema | usr1
 public   | postgres
(2 rows)

as the schema is available now, tables can be created in the new schema:

beer=> create table myschema.t1 ( a int );
CREATE TABLE
beer=> create table myschema.t2 ( a int );
CREATE TABLE
beer=> create table myschema.t3 ( a int );
CREATE TABLE

granting select on all the tables in the schema is as easy as:

beer=> grant usage on schema myschema to usr2;
GRANT
beer=> grant select on all tables in schema myschema to usr2;
GRANT

without the “usage” grant the user will not be able to do anything in the schema. so be sure to grant it before granting any other privileges.
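
keep in mind that “grant ... on all tables in schema” only covers the tables which exist at the time of the grant. a sketch of how to cover future tables, too, assuming usr1 is the role which will create them:

```sql
-- executed as usr1: tables usr1 creates in myschema from now on
-- will be readable by usr2 without another grant
alter default privileges in schema myschema
  grant select on tables to usr2;
```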

to verify it:

beer=> \c beer usr2
Password for user usr2: 
You are now connected to database "beer" as user "usr2".
beer=> select * from myschema.t1;
 a 
---
(0 rows)
beer=> select * from myschema.t2;
 a 
---
(0 rows)
beer=> select * from myschema.t3;
 a 
---
(0 rows)

you can even grant select on specific columns ( which is not possible in oracle, either ):

beer=>\c postgres postgres
postgres=# create user usr3 login password 'usr3';
CREATE ROLE
postgres=# \c beer usr1
Password for user usr1: 
You are now connected to database "beer" as user "usr1".
beer=> grant usage on schema myschema to usr3;
beer=> grant select (a) on table myschema.t1 to usr3;
GRANT
beer=> \c beer usr3
Password for user usr3: 
You are now connected to database "beer" as user "usr3".
beer=> select a from myschema.t1;
 a 
---
(0 rows)

this can be very handy if you want to hide some columns or give select to just a few and do not want to create views on top of the table.

creating objects

creating objects ? you might think there is nothing special here. but wait :)
in oracle, if you’d like to create a table which has exactly the same columns as another table you’d do something like this:

create table new_table as select * from source_table where 1 = 2;

in postgresql you can do the same:

d1=# create table t1 ( a int, b int );
CREATE TABLE
d1=#  create table t2 as select * from t1 where 1=2;
SELECT 0
d1=# \d t2
      Table "public.t2"
 Column |  Type   | Modifiers 
--------+---------+-----------
 a      | integer | 
 b      | integer | 

but postgresql goes some steps further. for creating a similar table there is the “like” keyword. in its simplest form you would do:

d1=# create table t3 ( like t1 );
CREATE TABLE
d1=# \d t3
      Table "public.t3"
 Column |  Type   | Modifiers 
--------+---------+-----------
 a      | integer | 
 b      | integer | 

nothing special so far. just an easier to understand and shorter syntax ( in my opinion ). but wait, check this out:

d1=# alter table t1 add constraint uk1 unique (a);
NOTICE:  ALTER TABLE / ADD UNIQUE will create implicit index "uk1" for table "t1"
ALTER TABLE
d1=# \d t1
      Table "public.t1"
 Column |  Type   | Modifiers 
--------+---------+-----------
 a      | integer | 
 b      | integer | 
Indexes:
    "uk1" UNIQUE CONSTRAINT, btree (a)
d1=# create table t4 ( like t1 INCLUDING CONSTRAINTS INCLUDING INDEXES INCLUDING STORAGE INCLUDING COMMENTS );
NOTICE:  CREATE TABLE / UNIQUE will create implicit index "t4_a_key" for table "t4"
CREATE TABLE
d1=# \d t4
      Table "public.t4"
 Column |  Type   | Modifiers 
--------+---------+-----------
 a      | integer | 
 b      | integer | 
Indexes:
    "t4_a_key" UNIQUE CONSTRAINT, btree (a)

that’s cool. get an exact copy of the table’s definition with one command. you think there can’t be more goodies ? what about this? :

d1=# create table t5 ( a int, b int );
CREATE TABLE
d1=# create table t6 ( c date, d date );
CREATE TABLE
d1=# create table t7 ( a int, b int, c date, d date ) inherits ( t5, t6 );
NOTICE:  merging column "a" with inherited definition
NOTICE:  merging column "b" with inherited definition
NOTICE:  merging column "c" with inherited definition
NOTICE:  merging column "d" with inherited definition
CREATE TABLE
d1=# \d t7
      Table "public.t7"
 Column |  Type   | Modifiers 
--------+---------+-----------
 a      | integer | 
 b      | integer | 
 c      | date    | 
 d      | date    | 
Inherits: t5,
          t6

what the hell is this ? some sort of object oriented mechanism ? let’s add some data and check the results:

d1=# insert into t7 values ( 1,1,current_date,current_date);
INSERT 0 1
d1=# select * from t5;
 a | b 
---+---
 1 | 1
(1 row)
d1=# select * from t6;
     c      |     d      
------------+------------
 2012-08-22 | 2012-08-22
(1 row)
d1=# select * from t7;
 a | b |     c      |     d      
---+---+------------+------------
 1 | 1 | 2012-08-22 | 2012-08-22
(1 row)

as you can see the tables are now linked to each other. a row inserted into t7 is stored in t7, but it is also visible when querying the parent tables t5 and t6. this does not work the other way around:

d1=# insert into t5 values (2,2);
INSERT 0 1
d1=# insert into t6 values ( current_date,current_date);
INSERT 0 1
d1=# select * from t7;
 a | b |     c      |     d      
---+---+------------+------------
 1 | 1 | 2012-08-22 | 2012-08-22
(1 row)
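
if you want to know where a row really lives, the “only” keyword and the tableoid system column help. a short sketch using the tables from above:

```sql
-- only the rows stored in t5 itself, without the rows of child tables
select * from only t5;

-- all rows visible through t5, together with the table that stores them
select tableoid::regclass as stored_in, a, b from t5;
```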

what happens if you change the column definitions ?

d1=# alter table t5 alter column a type bigint;
ALTER TABLE
d1=# \d t5
      Table "public.t5"
 Column |  Type   | Modifiers 
--------+---------+-----------
 a      | bigint  | 
 b      | integer | 
Number of child tables: 1 (Use \d+ to list them.)
d1=# \d t7
      Table "public.t7"
 Column |  Type   | Modifiers 
--------+---------+-----------
 a      | bigint  | 
 b      | integer | 
 c      | date    | 
 d      | date    | 
Inherits: t5,
          t6

automatically gets propagated. really nice … I don’t have a use case for this currently, but maybe this can be useful to split a wide table into smaller tables depending on the columns. I would like to hear if someone has used this feature and for what purpose. feel free to post comments.

time for the last one ( for now :) ). I am pretty sure plenty of databases use sequences to generate primary keys. in postgresql you can set the default for a table’s column to get populated from a specific sequence ( not possible in oracle up to 11g ):

d1=# create sequence s1;
CREATE SEQUENCE
d1=# create table t8 ( a int primary key default nextval('s1'), b int );
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "t8_pkey" for table "t8"
CREATE TABLE
d1=# insert into t8 (b) values (1);
INSERT 0 1
d1=# select * from t8;
 a | b 
---+---
 1 | 1
(1 row)
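
there is also a shorthand for this: the serial pseudo type creates the sequence and the default in one step. a sketch ( the table name t9 is just an assumption ):

```sql
-- serial creates a sequence named t9_a_seq and uses it as the column default
create table t9 ( a serial primary key, b int );
insert into t9 (b) values (1);
select * from t9;

-- shows which sequence feeds the column
select pg_get_serial_sequence ( 't9', 'a' );
```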

for sure there are lots and lots of more cool features, but for now this shall be enough …

UPDATE: Check this post for a use case for table inheritance in the sample scripts.

just a short notice, that the postgresql people released a security update for the current version ( 9.1 ) and some of the older versions. see the announcement for more information.

postgresql, as well as oracle, heavily depends on accurate statistics for being able to provide the best execution plan for a given query ( both use a cost based optimizer ).

in postgresql the components involved in creating and executing the best execution plan are:

  • the parser
  • the rewriter
  • the planner/optimizer
  • the executor

a feature of the rewriter which is special from an oracle perspective is that you can create custom rules.
a simple example for a table containing three rows:

sysdba@[local]:1540/dbs200# create table t1 ( a integer, b char(20) );
CREATE TABLE
sysdba@[local]:1540/dbs200*# insert into t1 values ( 1, 'text1' );
INSERT 0 1
sysdba@[local]:1540/dbs200*# insert into t1 values ( 2, 'text2' );
INSERT 0 1
sysdba@[local]:1540/dbs200*# insert into t1 values ( 3, 'text3' );
INSERT 0 1
sysdba@[local]:1540/dbs200*# COMMIT;
COMMIT

let’s say the second row is so important that you do not want to allow changes to it. to achieve this you could create a trigger or you may create a rule:

sysdba@[local]:1540/dbs200# create rule myrule as on update to t1 where old.a = 2 do instead nothing;
CREATE RULE
sysdba@[local]:1540/dbs200# update t1 set b = 'blabla' where a=2;
UPDATE 0
sysdba@[local]:1540/dbs200*# select * from t1 where a=2;
 a |          b           
---+----------------------
 2 | text2               
(1 row)

such rules can be created for insert, update and delete statements and add additional conditions to statements on the tables in question. check the documentation for a complete description of this.
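
for comparison, here is a sketch of the trigger variant mentioned above. the function and trigger names are just assumptions; a before row trigger which returns null silently skips the operation for that row:

```sql
-- trigger function: ignore updates on the row with a = 2
create function t1_protect_row() returns trigger as $$
begin
  if old.a = 2 then
    return null;   -- skip the update for this row
  end if;
  return new;      -- all other rows are updated as requested
end;
$$ language plpgsql;

create trigger t1_protect
  before update on t1
  for each row execute procedure t1_protect_row();
```

if both the rule and the trigger exist, the rule is applied first as it works at the rewrite stage, so you would normally choose one of the two.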

the optimizer in general does the same job as the optimizer in oracle does, except that you can not give hints to the optimizer in postgresql.

of course you can create indexes in postgresql for optimizing access to the data of interest. postgres provides four types of indexes:

  • B-tree: same as in oracle
  • hash: uses hash tables and is only available for “=” operations
  • Generalized Search Tree ( GiST ): a special type often used for geometric or full text search purposes
  • Generalized Inverted Index ( GIN ): an index used for lists and arrays
    • a more detailed view on indexes in another post.

      as mentioned above the planner/optimizer is responsible for creating the best possible execution plan. a plan internally is a tree which consists of several sub plans. as with oracle there are different choices the planner/optimizer has available:

      • sequential scan: read the whole table and check each row for a match, same as in oracle
      • indexscan: read the index for matches and go to the table for reading the row. it is important to know that postgres always has to visit the table to check if a row is visible to the current transaction ( no index only access possible in 9.1; index-only scans arrived with 9.2 ). this is because postgres saves the “undo” data in the table itself but not in the index
      • bitmap index scan: scans the index for matches, saves the results in a bitmap in memory, sorts the bitmap in the order of the table ( that is, by block number ) and then reads the table. this avoids jumping between the index and the table all the time.
      • nested loop join: scan the outer table and, for each row, go to the inner table, same as in oracle
      • hash join: creates a hash table from one of the joined tables and then uses the hash values for searching the other table(s), same as in oracle
      • merge join: first sorts the joined tables ( depending on the join ) and then reads the tables in parallel

      in oracle you can use the explain plan command, the dbms_xplan package, autotrace or optimizer hints for displaying execution plans. in postgresql it’s the explain command. using the same table from above to display the plan for a simple select you would do:

      sysdba@[local]:1540/dbs200*# explain select * from t1;
                            QUERY PLAN                      
      ------------------------------------------------------
       Seq Scan on t1  (cost=0.00..16.90 rows=690 width=88)
      (1 row)
      sysdba@[local]:1540/dbs200*# 
      

      as I know the table only contains three rows, the 690 rows reported above are far from being correct. same issue as in oracle: if the statistics are not good, the plans will not be good either.
      the cost reporting is special in postgres:
      “cost=0.00..16.90” means: 0 for the start costs ( as this is a sequential scan the results can be sent immediately as rows are retrieved ) and 16.9 for the end costs ( that is the whole cost for executing the plan ).
      the width column reports the average size of one row in bytes, so in total there would be 690 rows * 88 bytes = 60720 bytes ( depending on the current statistics ).

      let’s check the statistics of the table:

      sysdba@[local]:1540/dbs200*# select relpages,reltuples from pg_class where relname = 't1';
       relpages | reltuples 
      ----------+-----------
              0 |         0
      (1 row)

      obviously wrong, that’s why the statistics reported by explain above are wrong, too. these numbers should change if statistics are generated:

      sysdba@[local]:1540/dbs200*# analyze verbose t1;
      INFO:  analyzing "public.t1"
      INFO:  "t1": scanned 1 of 1 pages, containing 3 live rows and 0 dead rows; 3 rows in sample, 3 estimated total rows
      ANALYZE
      sysdba@[local]:1540/dbs200*# select relpages,reltuples from pg_class where relname = 't1';
       relpages | reltuples 
      ----------+-----------
              1 |         3
      (1 row)
      

      much better. what does explain report now ?:

      sysdba@[local]:1540/dbs200*#  explain select * from t1;
                          QUERY PLAN                     
      ---------------------------------------------------
       Seq Scan on t1  (cost=0.00..1.03 rows=3 width=25)
      (1 row)
      

      much better, too. wonder how the costs get calculated ? as in oracle the costs depend on the cost of a disk read and on the cost for the cpu to process the rows. in postgresql there are two parameters which specify this:

      sysdba@[local]:1540/dbs200*# show seq_page_cost;
       seq_page_cost 
      ---------------
       1
      (1 row)
      sysdba@[local]:1540/dbs200*# show cpu_tuple_cost;
       cpu_tuple_cost 
      ----------------
       0.01
      (1 row)
      

      the numbers reported here are the default values. of course you can tweak these by specifying the parameters in the server parameter file.

      so the costs are: ( 1 * 1 ) + ( 3 * 0.01 ) = 1.03 ( one page read + three times 0.01 for processing the three rows ).

      as plans might lie because they are based on assumptions and statistics you can use “explain analyze” ( which really executes the statement ) for comparing the calculated costs against the real costs ( in oracle you can do this by passing the gather_plan_statistics hint and calling the dbms_xplan.display_cursor function with the ‘ALLSTATS LAST’ format ):

      sysdba@[local]:1540/dbs200*# explain analyze select * from t1;
                                               QUERY PLAN                                          
      ---------------------------------------------------------------------------------------------
       Seq Scan on t1  (cost=0.00..1.03 rows=3 width=88) (actual time=0.011..0.026 rows=3 loops=1)
       Total runtime: 0.150 ms
      (2 rows)
      

      the value reported for loops is only interesting for joins as this reports how often a particular step was executed.

      what about indexes ? let’s generate some more data for the t1 table and create an index:

      sysdba@[local]:1540/dbs200# insert into t1 select * from t1 where a = 1;
      INSERT 0 1
      sysdba@[local]:1540/dbs200*# insert into t1 select * from t1 where a = 1;
      INSERT 0 2
      sysdba@[local]:1540/dbs200*# insert into t1 select * from t1 where a = 1;
      INSERT 0 4
      sysdba@[local]:1540/dbs200*# insert into t1 select * from t1 where a = 1;
      INSERT 0 8
      sysdba@[local]:1540/dbs200*# insert into t1 select * from t1 where a = 1;
      INSERT 0 16
      sysdba@[local]:1540/dbs200*# insert into t1 select * from t1 where a = 1;
      INSERT 0 32
      sysdba@[local]:1540/dbs200*# insert into t1 select * from t1 where a = 1;
      INSERT 0 64
      sysdba@[local]:1540/dbs200*# insert into t1 select * from t1 where a = 1;
      INSERT 0 128
      sysdba@[local]:1540/dbs200*# insert into t1 select * from t1 where a = 1;
      INSERT 0 256
      sysdba@[local]:1540/dbs200*# insert into t1 select * from t1 where a = 1;
      INSERT 0 512
      sysdba@[local]:1540/dbs200*# insert into t1 select * from t1 where a = 1;
      INSERT 0 1024
      sysdba@[local]:1540/dbs200*# COMMIT;
      COMMIT
      sysdba@[local]:1540/dbs200# CREATE INDEX I1 ON T1(A);
      CREATE INDEX
      sysdba@[local]:1540/dbs200*# COMMIT;
      COMMIT
      sysdba@[local]:1540/dbs200# \d t1
               Table "public.t1"
       Column |     Type      | Modifiers 
      --------+---------------+-----------
       a      | integer       | 
       b      | character(20) | 
      Indexes:
          "i1" btree (a)
      Rules:
          myrule AS
          ON UPDATE TO t1
         WHERE old.a = 2 DO INSTEAD NOTHING
      

      let’s see what happens if we query for a=1 now:

      sysdba@[local]:1540/dbs200*# explain analyze select * from t1 where a = 1;
                                                   QUERY PLAN                                             
      ----------------------------------------------------------------------------------------------------
       Seq Scan on t1  (cost=0.00..40.62 rows=2048 width=25) (actual time=0.030..6.873 rows=2048 loops=1)
         Filter: (a = 1)
       Total runtime: 12.110 ms
      (3 rows)
      

      as expected the index is not used ( too many rows will match the criteria a=1 ). let’s check the table statistics and see if the costs are as expected:

      sysdba@[local]:1540/dbs200*# select relpages,reltuples from pg_class where relname = 't1';
       relpages | reltuples 
      ----------+-----------
             15 |      2050
      

      so now there are 15 pages with 2050 rows: ( 1 * 15 ) + ( 2050 * 0.01 ) = 35.5
      not really what is expected, the cost is reported higher by explain. that’s because another parameter comes into the game when there is a where clause:

      sysdba@[local]:1540/dbs200*# show cpu_operator_cost;
       cpu_operator_cost 
      -------------------
       0.0025
      (1 row)
      

      again, the whole table will be scanned, each row will be processed and additionally each row will be checked against the condition ( this is the cpu_operator_cost ), so:
      ( 1 * 15 ) + ( 2050 * 0.01 ) + ( 2050 * 0.0025 ) = 40.625, which is the cost reported above ( displayed as 40.62 ).
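
the arithmetic above can be wrapped into a small shell function. this is only a sketch of the simplified formula used in this post, not the complete cost model of the planner: seq_scan_cost is a made-up helper name and the three cost factors are hard coded to the default values shown by the “show” commands.

```shell
# sketch of the sequential scan cost arithmetic from the text
# ( seq_scan_cost is a made-up helper, not something postgres ships )
seq_scan_cost() {
    # $1 = relpages, $2 = reltuples, $3 = number of conditions in the where clause
    LC_ALL=C awk -v pages="$1" -v tuples="$2" -v quals="${3:-0}" 'BEGIN {
        seq_page_cost     = 1       # default of seq_page_cost
        cpu_tuple_cost    = 0.01    # default of cpu_tuple_cost
        cpu_operator_cost = 0.0025  # default of cpu_operator_cost
        cost = pages * seq_page_cost + tuples * cpu_tuple_cost + tuples * quals * cpu_operator_cost
        printf "%.4f\n", cost
    }'
}

seq_scan_cost 1 3         # select * from t1 with three rows in one page
seq_scan_cost 15 2050 1   # select * from t1 where a = 1
```

the two calls print 1.0300 and 40.6250, the same numbers as calculated by hand.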

      what happens if the index is used ?:

      sysdba@[local]:1540/dbs200# explain analyze verbose select * from t1 where a = 3;
                                                        QUERY PLAN                                                   
      ---------------------------------------------------------------------------------------------------------------
       Index Scan using i1 on public.t1  (cost=0.00..8.27 rows=1 width=25) (actual time=0.020..0.023 rows=1 loops=1)
         Output: a, b
         Index Cond: (t1.a = 3)
       Total runtime: 0.060 ms
      (4 rows)
      

      these are two random reads ( one for the index, one for the table ) which is 8, which comes from the random_page_cost parameter:

      sysdba@[local]:1540/dbs200*# show random_page_cost;
       random_page_cost 
      ------------------
       4
      (1 row)
      

      so in total it is 8 plus the overhead for the cpu.
      when it comes to joins the procedure is the same as for every other cost based rdbms: make sure the statements perform well ( and produce the right plan ) for every single table joined in the statement. if this is fine, the join will be fine, too.

      for playing with different settings postgresql provides some parameters which can be set dynamically ( enable_seqscan, enable_indexscan, enable_nestloop and others ). but be careful: as with the hints you can give to the oracle optimizer these parameters should not be used to permanently fix your plans. if the plans are wrong, check your statistics, and even more important: know your data. know your data. know your data.

as databases tend to hold sensitive information this information should be protected as much as possible. oracle provides various tools for securing and auditing the database: database firewall, audit vault, enterprise security, database vault to name a few of them ( and for most of them you’ll need a separate license ) and of course all the privileges you can assign and revoke inside the database.

Roles, Groups and Passwords

as oracle does, postgres bases its internal security mechanisms on users and roles. users created in postgres are valid globally, that is: not specific to a single database. this means the set of users is the same for all databases. privileges can also be assigned to groups, which can be granted to users. if you are used to the oracle terms be aware that:

  • users in oracle are called roles in postgres
  • roles in oracle are called groups in postgres
  • sometimes the word role is used for both, users and groups, in postgres ( that is login roles, which are users, and nologin roles, which are groups )

to create a new login role in the database the “create role” command is used:

CREATE ROLE "user1" LOGIN;
CREATE ROLE "user2" LOGIN PASSWORD 'user2';
-- create a superuser
CREATE ROLE "user3" LOGIN PASSWORD 'user3' SUPERUSER;
-- create a user and grant the privilege to create roles
CREATE ROLE "user4" LOGIN PASSWORD 'user4' CREATEROLE;
-- create a user allowed to create databases
CREATE ROLE "user5" LOGIN PASSWORD 'user5' CREATEDB;
-- create a user allowed to create databases whose password is only valid until a given date
CREATE ROLE "user6" LOGIN PASSWORD 'user6' CREATEDB VALID UNTIL '2012-10-01';
-- create a user and limit her amount of connections
CREATE ROLE "user7" LOGIN PASSWORD 'user7' CONNECTION LIMIT 2;

be careful if you create users like above and provide the password as normal string. depending on your server configuration the passwords will be visible in the server’s logfile and the psql history:

LOG:  statement: CREATE ROLE "user1" LOGIN;
LOG:  statement: CREATE ROLE "user2" LOGIN PASSWORD 'user2';
LOG:  statement: CREATE ROLE "user3" LOGIN PASSWORD 'user2' SUPERUSER;
LOG:  statement: CREATE ROLE "role";
LOG:  statement: CREATE ROLE "user4" LOGIN PASSWORD 'user4' CREATEROLE;

as postgres internally stores the passwords as md5 hashes you can prevent this by providing the already hashed password when creating users:

CREATE USER someuser LOGIN PASSWORD 'md572947234907hfasf3';
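
the md5 form of a password is the string “md5” followed by the md5 hash of the clear text password concatenated with the role name, so it can be generated outside of the database. a minimal sketch, assuming md5sum is available on the machine; pg_md5_password, the role user8 and its password are made up for this example:

```shell
# sketch: build the md5 form of a password outside of the database
# ( pg_md5_password is a made-up helper name, md5sum is assumed to exist )
pg_md5_password() {
    # $1 = role name, $2 = clear text password
    hash=$(printf '%s%s' "$2" "$1" | md5sum | cut -d' ' -f1)
    printf 'md5%s\n' "$hash"
}

# print a CREATE ROLE statement which does not contain the clear text password
echo "CREATE ROLE \"user8\" LOGIN PASSWORD '$(pg_md5_password user8 secret)';"
```

keep in mind that the clear text password now shows up in the shell history instead, unless it is read from a prompt or a protected file.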

to get the hashed password out of the database query the pg_authid catalog ( superusers only ):

SELECT rolname, rolpassword FROM pg_authid;
 rolname  |             rolpassword             
----------+-------------------------------------
 sysdba   | md5448a3ec0e7a2689f0866afca52f91e13
 user1    | 
 user2    | md572881e285cdb0f9370dcdf1db0d9a869
 user3    | md53b24544e8f4b2a20f4bcca02a35df8fb
 user4    | md547e1c205dd73d4c06405bd08d255e320
 user5    | md51dc34834df4da4804236eb250118fb41
 user6    | md5bdf2912fce3ee3f6657bacc65527c7bd
 user7    | md5c5068c076d70d192c7f205a9bba4c469
 role1    | 

to create a group ( or role in oracle terms ) just skip the login attribute:

CREATE ROLE "role1";

granting groups to users:

GRANT ROLE1 TO USER1;

or

GRANT ROLE1 TO USER1 WITH ADMIN OPTION;

you can either use the psql shortcut to list the roles in the database server:

\du
                             List of roles
 Role name |                   Attributes                   | Member of 
-----------+------------------------------------------------+-----------
 role1     | Cannot login                                   | {}
 sysdba    | Superuser, Create role, Create DB, Replication | {}
 user1     |                                                | {role1}
 user2     |                                                | {}
 user3     | Superuser, Replication                         | {}
 user4     | Create role                                    | {}
 user5     | Create DB                                      | {}
 user6     | Create DB                                      | {}
 user7     | 2 connections                                  | {}

… or you may use the pg_roles view:

SELECT rolname,rolsuper,rolcreatedb,rolconnlimit,rolvaliduntil FROM pg_roles;
 rolname  | rolsuper | rolcreatedb | rolconnlimit |     rolvaliduntil      
----------+----------+-------------+--------------+------------------------
 sysdba   | t        | t           |           -1 | 
 user1    | f        | f           |           -1 | 
 user2    | f        | f           |           -1 | 
 user3    | t        | f           |           -1 | 
 user4    | f        | f           |           -1 | 
 user5    | f        | t           |           -1 | 
 user6    | f        | t           |           -1 | 2012-10-01 00:00:00+02
 user7    | f        | f           |            2 | 
 role1    | f        | f           |           -1 | 

to delete a role, just drop it:

DROP ROLE ROLE1;
-- or to suppress error messages in case the role does not exist
DROP ROLE IF EXISTS ROLE1;

to delete everything owned by a specific role:

DROP OWNED BY USER1;

you can even re-assign all objects from one role to another:

REASSIGN OWNED BY USER1 TO USER2;

granting / revoking privileges on objects is similar to oracle with a few exceptions. if you want to grant execute on a function you’ll have to specify the parameter types, too:

GRANT EXECUTE ON FUNCTION FUNCTION1 ( int, int ) TO USER1;

you can grant a privilege on all objects of one kind in a schema ( tables, sequences or functions ) :

GRANT SELECT ON ALL TABLES IN SCHEMA A TO USER2;

you can grant privileges on a whole database:

GRANT ALL PRIVILEGES ON DATABASE DBS200 TO USER2;

you can change the owner of objects:

ALTER TABLE TEST1 OWNER TO USER2;

if you want to create objects in a separate schema ( public is the default ) you’ll have to create it first:

CREATE SCHEMA SCHEMA1;
CREATE TABLE SCHEMA1.TABLE1 ( A INTEGER );

set the search path to avoid having to specify the schema in your commands:

SHOW search_path;
SET search_path TO schema1,public;

to display privileges either use the psql shortcut:

\z
                                    Access privileges
 Schema  |        Name        | Type  |   Access privileges   | Column access privileges 
---------+--------------------+-------+-----------------------+--------------------------
 public  | pg_stat_statements | view  | sysdba=arwdDxt/sysdba+| 
         |                    |       | =r/sysdba             | 
 schema1 | table1             | table |                       | 
(2 rows)

or query the information schema for a specific object ( unquoted names are stored in lower case, so search for the lower case name ):

SELECT * FROM information_schema.table_privileges WHERE table_name = 'table1';

Client Connections

in postgres there is one file which controls if and how clients connect to the database server. the file is called “pg_hba.conf” and is located in the data area of the database server. initdb automatically creates this file when the cluster is initialized.

in my case the file looks like this:

# TYPE  DATABASE        USER            ADDRESS                 METHOD
local   all             all                                     md5
host    all             all             ::1/128                 md5
local   replication     sysdba                                  md5
host    replication     sysdba          127.0.0.1/32            md5

the first column is the type, which can be one of:

  • local: this is for unix domain sockets
  • host: this is for tcp/ip
  • hostssl: this is for ssl over tcp/ip
  • hostnossl: this is for tcp/ip connections which do not use ssl

the second and third columns specify the database name and the user the entry is valid for. by specifying addresses you can enable individual hosts or networks to connect to the database server. the last column specifies the authentication method, which can be one of:

  • trust: this effectively disables authentication and should not be used
  • reject: rejects all connections which are valid for the entry
  • md5: password authentication using md5
  • password: password authentication ( clear text )
  • ident: use the os to authenticate the user

in addition to the methods above postgres provides support for pam, kerberos, gssapi, sspi for windows, radius and ldap. all for free, in contrast to oracle.

in general one should at least use md5 to provide minimum security. trust and ident should not be used in production environments.
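
a quick way to find entries which use one of the weak methods is to scan the file with awk. this is only a sketch: check_hba is a made-up helper name, the sample entries below are invented, and entries which specify the netmask in a separate column are not handled.

```shell
# sketch: list pg_hba.conf entries which use trust, ident or password
# ( check_hba is a made-up helper name; it reads the file from stdin )
check_hba() {
    awk '
        # skip empty lines and comments
        NF == 0 || $1 ~ /^#/ { next }
        {
            # "local" entries have no address column, so the method is
            # the 4th field there and the 5th field for all other types
            method = ($1 == "local") ? $4 : $5
            if (method == "trust" || method == "ident" || method == "password")
                print NR ": " $0
        }
    '
}

# invented sample entries: line 2 and line 4 will be reported
check_hba <<'EOF'
# TYPE  DATABASE        USER            ADDRESS                 METHOD
local   all             all                                     trust
host    all             all             127.0.0.1/32            md5
host    all             all             192.168.1.0/24          password
EOF
```

for a real check feed the file of your cluster to the function, for example with check_hba < /opt/postgres/mydb/pg_hba.conf if the data area from this post is used.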

check the documentation for a more detailed description ….

Auditing

for auditing in postgresql you’ll need to create triggers and/or functions. there is no out of the box module which you can use. but you can use several server parameters to log information to the server’s logfile.
A quick check on pgfoundry listed a project called “audittrail” which is still in beta status and the last update was in 2007.
in the end, you’ll have to spend more work on auditing in postgresql than in oracle. this may be a drawback for enterprise installations …

as oracle does, postgresql is controlled by plenty of parameters. not all of them need to be adjusted but some of them are important to understand. so I will set up a new postgresql database with a more detailed view on the parameters one should consider when going live ( I will not go into detail on how to lay out the filesystems, the focus is on the parameters ).

initially I will use the same setup as in the first post but adjust the most important parameters.

initial setup:

pg_ctl stop -D /opt/postgres/mydb -m fast
rm -rf /opt/postgres/mydb
initdb -D /opt/postgres/mydb -U sysdba -W
rm -f /opt/postgres/mydb/postgresql.conf

I deleted the sample configuration as I want to specify the most important parameters to fit my needs.

log messages are essential for the dba so one of the first things to do is to define where and how much the database server should log. there are several parameters which control this in postgresql:

 parameter                   | description
-----------------------------+-------------
 log_destination             | tells the server where to write logs to, can be one of: stderr, syslog, eventlog, csvlog
 logging_collector           | if on, the server will start its own logging process for catching logs from stderr and writing them to a log file
 log_directory               | the directory where the log files should go to
 log_filename                | the filename to use for the server log ( several place holders may be used to specify the format )
 log_rotation_age            | specifies the amount of time before rotating the log file
 log_rotation_size           | specifies the size the log file can reach before rotating the log file
 log_truncate_on_rotation    | if on, rotated log files will be overwritten
 client_min_messages         | controls how many and what messages are returned to the client (DEBUG5-DEBUG1,LOG,NOTICE,WARNING,ERROR,FATAL,PANIC)
 log_min_messages            | controls how many and what messages are written to the log (DEBUG5-DEBUG1,LOG,NOTICE,WARNING,ERROR,FATAL,PANIC)
 log_autovacuum_min_duration | the time an autovacuum operation may consume until it is reported in the logfile
 log_error_verbosity         | controls how detailed the output to the log file will be ( terse, default, verbose )
 log_min_error_statement     | additionally reports the statement that produced an error (DEBUG5-DEBUG1,LOG,NOTICE,WARNING,ERROR,FATAL,PANIC)
 log_min_duration_statement  | additionally reports statements which took longer than specified
 log_checkpoints             | if on, logs checkpoints to the server’s log file
 log_connections             | logs each new database connection to the log file
 log_disconnections          | logs each disconnection to the log file
 log_duration                | logs the duration of every sql statement
 log_hostname                | converts ip addresses to hostnames in the log file
 log_line_prefix             | specifies the prefix for each line reported to the log ( various place holders available )
 log_lock_waits              | if on, every process waiting longer than deadlock_timeout for a lock will be reported
 log_statement               | specifies if and which sql statements will be written to the log file ( none, ddl, mod, all )
 log_temp_files              | specifies if a log entry will be written each time a temporary file gets deleted
 log_timezone                | specifies the timezone for the log entries

as you can see, the dba is given much more control over logging than in oracle. it clearly depends on the database and application what should be logged. to start, this set should be appropriate:

export PARAMFILE=/opt/postgres/mydb/postgresql.conf
echo "###### logging settings" >> $PARAMFILE
echo "logging_collector=on" >> $PARAMFILE
echo "log_truncate_on_rotation=on" >> $PARAMFILE
echo "log_filename='postgresql-%a.log'" >> $PARAMFILE
echo "log_rotation_age='8d'" >> $PARAMFILE
echo "log_line_prefix='%m - %l - %p - %u@%d '" >> $PARAMFILE
echo "log_directory='/var/log/'" >> $PARAMFILE
echo "log_min_messages='WARNING'" >> $PARAMFILE
echo "log_autovacuum_min_duration=360s" >> $PARAMFILE
echo "log_error_verbosity=default" >> $PARAMFILE
echo "log_min_error_statement=ERROR" >> $PARAMFILE
echo "log_min_duration_statement=5min" >> $PARAMFILE
echo "log_checkpoints=on" >> $PARAMFILE
echo "log_statement=ddl" >> $PARAMFILE
echo "client_min_messages='WARNING'" >> $PARAMFILE

once the log settings are specified it is time to think about the memory requirements. compared to the oracle settings there are not too many parameters to specify here:

 parameter            | description
----------------------+-------------
 shared_buffers       | controls the amount of shared memory available to the whole database cluster. the initial size on my box is 32M which is rather small.
 temp_buffers         | controls the amount of buffers used for temporary tables _per_ session.
 work_mem             | the amount of memory used for sort and hash operations per operation
 maintenance_work_mem | the amount of memory used for maintenance operations such as VACUUM, CREATE INDEX, and ALTER TABLE ADD FOREIGN KEY

although these settings strongly depend on the database and application requirements and the server’s hardware this could be a good start:

echo "###### memory settings" >> $PARAMFILE
echo "shared_buffers=256MB" >> $PARAMFILE
echo "temp_buffers=16MB" >> $PARAMFILE
echo "work_mem=4MB" >> $PARAMFILE
echo "maintenance_work_mem=16MB" >> $PARAMFILE

the next point to think about is the wal ( write ahead log ). as the wal files are essential for consistency and a production system never should go without archived logs these settings are critical. postgresql offers various parameters for controlling this ( only the most important here ):

 parameter                    | description
------------------------------+-------------
 fsync                        | should always be on ( default ) as this guarantees that committed transactions are written to disk
 wal_buffers                  | size of the wal buffers inside the databases’ shared memory ( comparable to the log_buffer in oracle )
 synchronous_commit           | if off, asynchronous writes to the wal files are enabled ( loss of transactions may occur, but no data inconsistency )
 wal_writer_delay             | the interval at which the wal writer process writes blocks to the wal files ( 200ms by default )
 checkpoint_segments          | the amount of wal segments ( typically 16MB each ) written between two checkpoints: comparable to oracle’s amount of redo logs
 checkpoint_timeout           | controls the maximum time between two checkpoints ( 5 minutes by default )
 checkpoint_warning           | controls how frequently checkpoints may occur before a warning is written to the log
 checkpoint_completion_target | controls how fast checkpoints should complete ( 0.0 => fastest, 1.0 => slowest, which means the whole period between two checkpoints )
 full_page_writes             | should be on so that whole pages are written to the wal on the first change after a checkpoint.
 wal_level                    | controls how much information is written to the wal files: minimal ( crash recovery ), archive ( wal based recovery ), hot_standby ( read only standby )
 archive_mode                 | archiving of the wal files: on/off
 archive_command              | any command used to archive the wal files
 archive_timeout              | forces a switch to a new wal file ( which then gets archived ) after the specified time
 hot_standby                  | enables read only standby ( active dataguard in oracle terms )
 max_wal_senders              | controls the amount of standby databases this master can serve
 wal_sender_delay             | controls how often data gets replicated ( default is 200ms )

a reasonable configuration to start with ( standby databases are not in scope here ) could be:

echo "###### wal settings" >> $PARAMFILE
echo "fsync=on" >> $PARAMFILE
echo "wal_buffers=16MB" >> $PARAMFILE
echo "synchronous_commit=on" >> $PARAMFILE
echo "wal_writer_delay=200ms" >> $PARAMFILE
echo "checkpoint_segments=16" >> $PARAMFILE
echo "checkpoint_timeout=300s" >> $PARAMFILE
echo "checkpoint_warning=30s" >> $PARAMFILE
echo "checkpoint_completion_target=0.9" >> $PARAMFILE
echo "full_page_writes=on" >> $PARAMFILE
echo "wal_level=archive" >> $PARAMFILE
echo "archive_mode=on" >> $PARAMFILE
echo "archive_command='test ! -f /opt/postgres/arch/%f && cp %p /opt/postgres/arch/%f'" >> $PARAMFILE
echo "archive_timeout=10min" >> $PARAMFILE
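
in the archive_command above postgres replaces %p with the path of the wal file to archive and %f with the file name only. the command must return zero only if the file was archived successfully, otherwise postgres keeps the wal file and tries again later. the following sketch shows the same logic ( never overwrite an existing file, copy otherwise ) as a shell function. archive_wal is a made-up helper name and the demonstration uses temporary directories and a dummy file instead of a real cluster:

```shell
# sketch of what the archive_command does for a single wal file
# ( archive_wal is a made-up helper name )
archive_wal() {
    # $1 = path of the wal file ( %p ), $2 = file name only ( %f ), $3 = archive directory
    src=$1
    dest=$3/$2
    # never overwrite an already archived file: report a failure instead
    if [ -f "$dest" ]; then
        echo "$dest already exists" >&2
        return 1
    fi
    cp "$src" "$dest"
}

# demonstration with temporary directories and a dummy wal file
tmp=$(mktemp -d)
mkdir "$tmp/wal" "$tmp/arch"
echo "dummy" > "$tmp/wal/000000010000000000000001"
archive_wal "$tmp/wal/000000010000000000000001" 000000010000000000000001 "$tmp/arch" && echo "archived"
archive_wal "$tmp/wal/000000010000000000000001" 000000010000000000000001 "$tmp/arch" || echo "refused to overwrite"
rm -rf "$tmp"
```

the second call fails on purpose: an archive which silently overwrites files could destroy the wal files needed for a recovery.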

as the vacuum and analyze processes are so important there are parameters to control them ( the most important here ):

 parameter              | description
------------------------+-------------
 autovacuum             | enables the autovacuum process launcher
 autovacuum_max_workers | controls how many autovacuum processes will be started
 autovacuum_naptime     | controls the minimum delay between vacuum processes ( defaults to 1 minute )

adding them to the server’s parameter file:

echo "###### autovacuum settings" >> $PARAMFILE
echo "autovacuum=on" >> $PARAMFILE
echo "autovacuum_max_workers=3" >> $PARAMFILE
echo "autovacuum_naptime=5min" >> $PARAMFILE

one more parameter to specify is for loading the pg_stat_statements module from the contrib directory:

echo "###### pg_stat_statements" >> $PARAMFILE
echo "shared_preload_libraries='pg_stat_statements'" >> $PARAMFILE

keep in mind that this is only a set to start with, especially if you do not know how the application will behave. there are plenty more parameters which give you much more control over various aspects of the database. check the documentation for the complete reference.