
if you read the title you might think there is not much to say about it. but if you come from the oracle world there is actually a huge difference in how postgresql handles updates compared to oracle.

let’s start with a simple table:

create table t1 ( a int );
insert into t1 (a) values (1);
insert into t1 (a) values (2);
insert into t1 (a) values (3);

to point out the differences I’ll use an extension called pageinspect. to make use of this extension you’ll have to create it:

create extension pageinspect;

this will create a couple of functions which assist in inspecting database pages:

\dx+ *inspect*
    Objects in extension "pageinspect"
            Object Description            
------------------------------------------
 function bt_metap(text)
 function bt_page_items(text,integer)
 function bt_page_stats(text,integer)
 function fsm_page_contents(bytea)
 function get_raw_page(text,integer)
 function get_raw_page(text,text,integer)
 function heap_page_items(bytea)
 function page_header(bytea)

coming back to the test table, which contains three rows ( or tuples in postgresql wording ) right now: using the heap_page_items and get_raw_page functions of the pageinspect extension we can display some internals of the table’s page:

select lp, lp_len, t_xmin
     , t_xmax, lp_off, t_ctid 
  from heap_page_items(get_raw_page('t1', 0));
 lp | lp_len | t_xmin | t_xmax | lp_off | t_ctid 
----+--------+--------+--------+--------+--------
  1 |     28 |   9271 |      0 |   8160 | (0,1)
  2 |     28 |   9272 |      0 |   8128 | (0,2)
  3 |     28 |   9273 |      0 |   8096 | (0,3)

without going too much into the details right now, this tells us that page zero of table t1 contains three rows. this is consistent with the inserts from above. what happens if we update one of the rows ?

update t1 set a=4 where a=1;

from an oracle perspective the answer is clear: the row in question is read, updated in place and written back once committed ( although it is the whole block or page that is read and written rather than the individual row ). oracle uses undo data to reconstruct the old row version for other sessions that request the row before the commit.

let’s see what postgresql did:

SELECT lp, lp_len, t_xmin
     , t_xmax, lp_off, t_ctid 
  from heap_page_items(get_raw_page('t1', 0));
 lp | lp_len | t_xmin | t_xmax | lp_off | t_ctid 
----+--------+--------+--------+--------+--------
  1 |     28 |   9271 |   9274 |   8160 | (0,4)
  2 |     28 |   9272 |      0 |   8128 | (0,2)
  3 |     28 |   9273 |      0 |   8096 | (0,3)
  4 |     28 |   9274 |      0 |   8064 | (0,4)

oops, four rows/tuples now. what happened ? the first thing you might have noticed is the t_xmax that got set for the first row. t_xmin is the id of the transaction that inserted the row/tuple. t_xmax is the id of the transaction that gets set once a tuple/row is updated or deleted. this means that transactions starting after t_xmax will not see this row/tuple anymore. this is how postgresql implements multiversion concurrency control ( mvcc ). in other words: transactions below 9274 will see the first row, transactions greater than 9274 will not see it anymore.
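a quick way to cross-check these transaction ids without pageinspect is to select the hidden system columns every table carries ( xmin, xmax, ctid ); a minimal sketch, the actual ids will of course differ on your system:

```sql
-- xmin, xmax and ctid are system columns present on every table;
-- they show the same visibility information as heap_page_items
select xmin, xmax, ctid, a from t1;
```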

what actually happened is a delete of the first row and an insert of a fourth row, and that’s the difference between oracle and postgresql: in postgresql an update is effectively a delete followed by an insert. you might wonder what happens to the deleted row as it is still present in the page. that’s where vacuum kicks in:

vacuum verbose t1;
INFO:  vacuuming "public.t1"
INFO:  "t1": found 1 removable, 3 nonremovable row versions in 1 out of 1 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
VACUUM
(sysdba@[local]:1540)[ppas92] > SELECT lp, lp_len, t_xmin
                                        , t_xmax, lp_off, t_ctid 
                                     from heap_page_items(get_raw_page('t1', 0));
 lp | lp_len | t_xmin | t_xmax | lp_off | t_ctid 
----+--------+--------+--------+--------+--------
  1 |      0 |        |        |      4 | 
  2 |     28 |   9272 |      0 |   8160 | (0,2)
  3 |     28 |   9273 |      0 |   8128 | (0,3)
  4 |     28 |   9274 |      0 |   8096 | (0,4)

vacuum takes care of the expired rows and frees their item pointers. if you did not disable the autovacuum process the database takes care of this on its own and you do not need to worry about it.
one point to remember is that the space occupied by dead rows is only given back to the operating system if you do a vacuum full:

vacuum full verbose t1;
INFO:  vacuuming "public.t1"
INFO:  "t1": found 0 removable, 3 nonremovable row versions in 1 pages
DETAIL:  0 dead row versions cannot be removed yet.
CPU 0.00s/0.00u sec elapsed 0.02 sec.
VACUUM
(sysdba@[local]:1540)[ppas92] > SELECT lp, lp_len, t_xmin
                                     , t_xmax, lp_off, t_ctid 
                                  from heap_page_items(get_raw_page('t1', 0));
 lp | lp_len | t_xmin | t_xmax | lp_off | t_ctid 
----+--------+--------+--------+--------+--------
  1 |     28 |   9290 |      0 |   8160 | (0,1)
  2 |     28 |   9291 |      0 |   8128 | (0,2)
  3 |     28 |   9292 |      0 |   8096 | (0,3)

otherwise the space will just get reused when appropriate.
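a quick sketch to verify this yourself using pg_relation_size ( the sizes reported will differ on your system ):

```sql
-- plain vacuum makes dead space reusable but normally keeps the file size;
-- vacuum full rewrites the table and gives the space back to the os
select pg_size_pretty ( pg_relation_size ( 't1' ) );
vacuum t1;
select pg_size_pretty ( pg_relation_size ( 't1' ) );
vacuum full t1;
select pg_size_pretty ( pg_relation_size ( 't1' ) );
```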

in postgresql there is one command, “create role”, to create both a role and a user. in oracle there is “create user” and “create role”. but in the background it is almost the same:

SQL> create role my_role;

Role created.

SQL> create user u identified by u;

User created.

SQL> select name, type# from user$ where name in ('MY_ROLE','U');

NAME				    TYPE#
------------------------------ ----------
MY_ROLE 				0
U					1

it is the type that matters.
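on the postgresql side the same distinction is visible in pg_roles: a user is simply a role with the login attribute, so “create user” is just shorthand for “create role … login”. a minimal sketch ( my_role and my_user are throwaway names ):

```sql
create role my_role;                          -- cannot log in
create role my_user login password 'secret';  -- same effect as: create user my_user ...
select rolname, rolcanlogin
  from pg_roles
 where rolname in ('my_role', 'my_user');
```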

as with oracle, postgresql relies on various statistics to produce the execution plan for a query. the catalog table which stores the statistical information is pg_statistic.

for the purpose of this post let’s play with a simple table and see what gets stored in the pg_statistic table.

drop table if exists t1;
create table t1 ( a int );

I will always use the same query to produce the output from the pg_statistic table:

select pa.attname
     , ps.stawidth
     , ps.stadistinct
     , ps.stanullfrac
     , ps.stavalues1
     , ps.stavalues2
     , ps.stavalues3
     , ps.stavalues4
     , ps.stavalues5
  from pg_class pc
     , pg_statistic ps
     , pg_attribute pa
 where pc.relname = 't1'
   and pc.relowner = ( select nspowner 
                         from pg_namespace
                        where nspname = 'public' )
   and pc.oid = ps.starelid
   and pa.attnum = ps.staattnum
   and pa.attrelid = pc.oid;

if you run the query right now it will not return any rows, as pg_statistic only gets populated once you run analyze and at least one row has been inserted into the table ( but not before ).

--exec statistics-query from above:
 attname | stawidth | stadistinct | stanullfrac | stavalues1 | stavalues2 | stavalues3 | stavalues4 | stavalues5 
---------+----------+-------------+-------------+------------+------------+------------+------------+------------
(0 rows)

So let’s create a row and see what’s happening:

insert into t1 (a) values (1);
analyze t1;
--exec statistics-query from above:
 attname | stawidth | stadistinct | stanullfrac | stavalues1 | stavalues2 | stavalues3 | stavalues4 | stavalues5 
---------+----------+-------------+-------------+------------+------------+------------+------------+------------
 a       |        4 |          -1 |           0 |      |      |      |      | 
(1 row)

what does this tell us ?

  • the column “a” is reported to have a width of 4 which corresponds to the definition of the integer data-type.
  • distinct values is reported as “-1”, which seems a little bit surprising for the moment as it should be 1, shouldn’t it ? in fact a negative stadistinct is a ratio: -1 means the number of distinct values equals the number of rows, i.e. every row has a unique value
  • the number of null values is zero, of course

let’s create another row and have a look at the results:

insert into t1 (a) values (2);
analyze t1;
--exec statistics-query from above:
 attname | stawidth | stadistinct | stanullfrac | stavalues1 | stavalues2 | stavalues3 | stavalues4 | stavalues5 
---------+----------+-------------+-------------+------------+------------+------------+------------+------------
 a       |        4 |          -1 |           0 | {1,2}      |      |      |      | 

the only column that changed is the stavalues1 column. as soon as there is more than one row in the table this column gets populated. distinct values is still reported as “-1” as each row in the table has a different value. this will not change as long as you insert unique values and not too many of them ( more on this later ):

insert into t1 (a) values (generate_series(3,10));
analyze t1;
--exec statistics-query from above:
 attname | stawidth | stadistinct | stanullfrac |       stavalues1       | stavalues2 | stavalues3 | stavalues4 | stavalues5 
---------+----------+-------------+-------------+------------------------+------------+------------+------------+------------
 a       |        4 |          -1 |           0 | {1,2,3,4,5,6,7,8,9,10} |      |      |      | 
(1 row)

once you insert the same values again the stadistinct column gets updated to reflect the change in the data distribution:

insert into t1 (a) values (generate_series(1,10));
analyze t1;
--exec statistics-query from above:
 attname | stawidth | stadistinct | stanullfrac |       stavalues1       | stavalues2 | stavalues3 | stavalues4 | stavalues5 
---------+----------+-------------+-------------+------------------------+------------+------------+------------+------------
 a       |        4 |        -0.5 |           0 | {1,2,3,4,5,6,7,8,9,10} |      |      |      | 

now there is a value of “-0.5” which essentially tells that there are about 2 entries for each distinct value. we can do the same again and the stadistinct value decreases again to match data ( around three occurrences for each distinct value ):

insert into t1 (a) values (generate_series(1,10));
analyze t1;
--exec statistics-query from above:
 attname | stawidth | stadistinct | stanullfrac |       stavalues1       | stavalues2 | stavalues3 | stavalues4 | stavalues5 
---------+----------+-------------+-------------+------------------------+------------+------------+------------+------------
 a       |        4 |   -0.333333 |           0 | {1,2,3,4,5,6,7,8,9,10} |      |      |      | 
(1 row)

as it looks right now, postgresql stores all the values of a column a second time in the pg_statistic catalog table. as this would not scale, let’s see if this changes once we insert more data:

insert into t1 (a) values (generate_series(11,10000));
analyze t1;
--exec statistics-query from above:
 attname | stawidth | stadistinct | stanullfrac |       stavalues1       |        stavalues2    | stavalues3 | stavalues4 | stavalues5 
---------+----------+-------------+-------------+------------------------+----------------------------------------------------------------------
------------------------------------------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------+---
---------+------------+------------
 a       |        4 |   -0.998004 |           0 | {1,2,3,4,5,6,7,8,9,10} | {11,110,210,310,410,510,610,710,810,910,1009,1109,1209,1309,1409,1509
,1609,1709,1809,1908,2008,2108,2208,2308,2408,2508,2608,2708,2807,2907,3007,3107,3207,3307,3407,3507,3607,3706,3806,3906,4006,4106,4206,4306,440
6,4506,4605,4705,4805,4905,5005,5105,5205,5305,5405,5504,5604,5704,5804,5904,6004,6104,6204,6304,6403,6503,6603,6703,6803,6903,7003,7103,7203,73
02,7402,7502,7602,7702,7802,7902,8002,8102,8201,8301,8401,8501,8601,8701,8801,8901,9001,9100,9200,9300,9400,9500,9600,9700,9800,9900,10000} |      |      | 
(1 row)

now the next stavalues column gets populated with histogram bucket boundaries ( one hundred buckets by default, hence the steps of roughly a hundred ). the first stavalues column remains unchanged in this example.

in oracle terms the stavaluesN columns are similar to a histogram: a statistical representation of the data distribution which is then used to generate execution plans. more to come …
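instead of joining pg_class, pg_attribute and pg_statistic by hand, the view pg_stats presents the same information with readable column names; a sketch against the t1 table from above:

```sql
-- pg_stats maps stadistinct to n_distinct, the most-common-values list to
-- most_common_vals and the histogram column to histogram_bounds
select attname, n_distinct, most_common_vals, histogram_bounds
  from pg_stats
 where schemaname = 'public'
   and tablename  = 't1';
```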

just noticed another difference between postgres and oracle:

oracle:

SQL> create table t1 ( a number(10,2 ) );
Table created.
SQL> create view v1 as select a from t1;
View created.
SQL> alter table t1 modify ( a number(11,2));
Table altered.

postgresql:

[postgres] > create table t1 ( a numeric(10,2) );
CREATE TABLE
[postgres] > create view v1 as select a from t1;
CREATE VIEW
[postgres] > alter table t1 alter column a type numeric(11,2);
ERROR:  cannot alter type of a column used by a view or rule
DETAIL:  rule _RETURN on view v1 depends on column "a"

the same is true if you want to drop a table which has a view defined on it:
oracle:

SQL> drop table t1;
Table dropped.

postgres:

[postgres] > drop table t1;
ERROR:  cannot drop table t1 because other objects depend on it
DETAIL:  view v1 depends on table t1
HINT:  Use DROP ... CASCADE to drop the dependent objects too.
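if you really want oracle-like behaviour, postgresql will drop the dependent objects too, but only when you say so explicitly; to just change the column type you can also drop the view first, alter the table and recreate the view:

```sql
-- cascades to the dependent view v1 ( a notice tells you what was dropped )
drop table t1 cascade;
```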

when working with oracle there is essentially one numeric data type for storing numbers ( and that’s number ). in postgresql there are ten. when we migrated an oracle database to postgresql we noticed that a source table in oracle took much less space than the same table in postgresql. thanks to some people on the pgsql-performance mailing list we learned that one of the reasons was that the wrong numeric data type had been chosen for storing the integers in postgresql.

a simple test-case:

drop table datatype1;
drop table datatype2;
create table datatype1
( a numeric ( 8,0 )
, b numeric ( 8,0 )
);
insert into datatype1 ( a, b )
       values ( generate_series ( 1, 10000000 )
              , generate_series ( 1, 10000000 )
              );
create index idatatype1_1 on datatype1 ( a );
create index idatatype1_2 on datatype1 ( b );
create table datatype2
( a int
, b int
);
insert into datatype2 ( a, b )
       values ( generate_series ( 1, 10000000 )
              , generate_series ( 1, 10000000 )
             );
create index idatatype2_1 on datatype2 ( a );
create index idatatype2_2 on datatype2 ( b );
analyze verbose datatype1;
analyze verbose datatype2;
select pg_size_pretty ( pg_relation_size ( 'datatype1' ) );
 pg_size_pretty 
----------------
 422 MB
(1 row)
select pg_size_pretty ( pg_relation_size ( 'datatype2' ) );
 pg_size_pretty 
----------------
 346 MB
(1 row)

for these two little tables the difference is about 76mb. depending on the statements this is 76mb more that needs to be scanned, which can have a big impact on performance. surprisingly, at least for me, this is not true for the indexes:

select pg_size_pretty ( pg_relation_size ( 'idatatype1_1' ) );
 pg_size_pretty 
----------------
 214 MB
(1 row)
select pg_size_pretty ( pg_relation_size ( 'idatatype2_1' ) );
 pg_size_pretty 
----------------
 214 MB
(1 row)
select pg_size_pretty ( pg_relation_size ( 'idatatype2_2' ) );
 pg_size_pretty 
----------------
 214 MB
(1 row)

so, it’s worth keeping an eye on which data types you use for numbers …
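the per-value overhead can also be seen directly with pg_column_size; a minimal sketch:

```sql
-- integer is a fixed 4 bytes, numeric is variable length with a header,
-- so the same value costs more bytes when stored as numeric(8,0)
select pg_column_size ( 12345678::numeric(8,0) ) as numeric_bytes
     , pg_column_size ( 12345678::int )          as int_bytes;
```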

we are currently in the process of moving an oracle data warehouse to postgresql and had some fun with how oracle handles dates in contrast to postgresql. in oracle, if you subtract one date from another you’ll get a number ( the number of days between them ):

select to_date('05.01.2012','dd.mm.yyyy') - to_date ( '01.01.2012','dd.mm.yyyy') diff from dual;
      DIFF
----------
         4

if you do the same in postgresql you’ll get an interval:

select to_date('05.01.2012','dd.mm.yyyy') - to_date ( '01.01.2012','dd.mm.yyyy') diff;
  diff
--------
 4 days
(1 row)

the issue was that a view used this expression in a where clause and did some calculations based on its return value. obviously this failed when we created the view in postgresql. the trick was to use the to_char function on the interval:

select to_char ( to_date('05.01.2012','dd.mm.yyyy') - to_date ( '01.01.2012','dd.mm.yyyy'), 'DD' ) diff;                                      
diff
------
 04

… or so we thought :) the result still cannot be used for calculations:

select to_char ( to_date('05.01.2012','dd.mm.yyyy') - to_date ( '01.01.2012','dd.mm.yyyy'), 'DD' ) + 8;
ERROR:  operator does not exist: text + integer
LINE 1: ...m.yyyy') - to_date ( '01.01.2012','dd.mm.yyyy'), 'DD' ) + 8;
                                                                   ^
HINT:  No operator matches the given name and argument type(s). You might need to add explicit type casts.

another conversion was necessary to make this work:

select cast ( to_char ( to_date('05.01.2012','dd.mm.yyyy') - to_date ( '01.01.2012','dd.mm.yyyy'), 'DD' ) as int ) + 8;
 ?column?
----------
       12
(1 row)

lessons learned ? oracle is doing implicit type conversions silently in the background. postgresql does not …
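a side note: in community postgresql to_date returns the date type and subtracting two date values already yields an integer, so the to_char/cast detour is not needed there ( the interval shown above comes from timestamp subtraction ). with explicit casts the whole expression stays numeric:

```sql
-- date - date yields an integer in community postgresql,
-- so the result can be used in calculations directly
select ( to_date('05.01.2012','dd.mm.yyyy')::date
       - to_date('01.01.2012','dd.mm.yyyy')::date ) + 8 as diff;
```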

another example:

oracle:

SQL> create table t ( a number );
Table created.
SQL> insert into t values ( 20120101 );
1 row created.
SQL>  select to_date(a,'yyyymmdd') from t;
TO_DATE(A,'YYYYMMDD'
--------------------
01-JAN-2012 00:00:00

postgresql:

create table t ( a number );
CREATE TABLE
insert into t values ( 20120101 );
INSERT 0 1
select to_date(a,'yyyymmdd') from t;
ERROR:  function to_date(numeric, unknown) does not exist
LINE 1: select to_date(a,'yyyymmdd') from t;
               ^
HINT:  No function matches the given name and argument types. You might need to add explicit type casts.
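as the hint says, an explicit cast fixes it; a minimal sketch:

```sql
-- to_date(text, text) exists, so casting the numeric column to text works
select to_date ( a::text, 'yyyymmdd' ) from t;
```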

while learning more about postgresql I came across how it handles identifier case by default ( everything is folded to lower case ). so, when querying the dictionary/catalog you’ll have to provide lower case names to get any results:

postgres=# create table test1 ( a numeric );
CREATE TABLE
postgres=# select relname from pg_class where relname = 'TEST1';
 relname 
---------
(0 rows)
postgres=# select relname from pg_class where relname = 'test1';
 relname 
---------
 test1
(1 row)

in oracle you’ll need to use upper case by default:

SQL> create table test1 ( a number );
Table created.
SQL> select table_name from dba_tables where table_name = 'TEST1';
TABLE_NAME
------------------------------
TEST1
SQL> select table_name from dba_tables where table_name = 'test1';
no rows selected
SQL> 

if you want postgresql to respect the case when creating objects you’ll need to put double quotes around the names:

postgres=# create table "TEST2" ( a numeric );
CREATE TABLE
postgres=# select relname from pg_class where relname = 'TEST2';
 relname 
---------
 TEST2
(1 row)
postgres=# select relname from pg_class where relname = 'test22';
 relname 
---------
(0 rows)
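when generating ddl or queries from the catalog, the quote_ident function takes care of this: it adds double quotes only when the identifier needs them ( mixed case, reserved words, special characters ):

```sql
-- lower case identifiers come back unquoted, mixed case ones quoted
select quote_ident('test2') as plain
     , quote_ident('TEST2') as mixed;
```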

same in oracle:

SQL> create table "test2" ( a number );
Table created.
SQL> select table_name from dba_tables where table_name = 'test2';
TABLE_NAME
------------------------------
test2
SQL> select table_name from dba_tables where table_name = 'TEST2';
no rows selected

knowing this, is it possible to create identical tables which just differ in the case of their name ? :

postgresql:

postgres=# create table test3 ( a numeric );
CREATE TABLE
postgres=# create table "Test3" ( a numeric );
CREATE TABLE
postgres=# create table "TesT3" ( a numeric );
CREATE TABLE
postgres=# select relname from pg_class where upper(relname) like 'TEST3%';
 relname 
---------
 Test3
 TesT3
 test3
(3 rows)

not an issue with postgresql. what about oracle ?

SQL> create table test3 ( a number );
Table created.
SQL> create table "Test3" ( a number );
Table created.
SQL> create table "TesT3" ( a number );
Table created.
SQL> select table_name from dba_tables where upper(table_name) like 'TEST3%';
TABLE_NAME
------------------------------
TesT3
Test3
TEST3

same behaviour. If someone had asked me if this is possible in oracle before, I would have said: no, definitely not. lessons learned ? :)

going further: what about constraint names ?
in postgresql:

postgres=# alter table test3 add constraint c1 check ( a is not null );
ALTER TABLE
postgres=# alter table test3 add constraint "C1" check ( a > 5 );
ALTER TABLE
postgres=# select conname,consrc from pg_constraint where upper(conname) = 'C1';
 conname |       consrc       
---------+--------------------
 c1      | (a IS NOT NULL)
 C1      | (a > (5)::numeric)

ok, this is consistent. what about oracle ? :

SQL> alter table test3 add constraint c1 check ( a is not null );
Table altered.
SQL> alter table test3 add constraint "C1" check ( a > 5 );
alter table test3 add constraint "C1" check ( a > 5 )
                                *
ERROR at line 1:
ORA-02264: name already used by an existing constraint

what about indexes ? postgresql:

postgres=# create index i1 on test3(a);
CREATE INDEX
postgres=# create index "i1" on test3(a);
ERROR:  relation "i1" already exists
postgres=# create index "I1" on test3(a);
CREATE INDEX
postgres=# select indexname,indexdef from pg_indexes where upper(indexname) = 'I1';
 indexname |                  indexdef                  
-----------+--------------------------------------------
 i1        | CREATE INDEX i1 ON test3 USING btree (a)
 I1        | CREATE INDEX "I1" ON test3 USING btree (a)
(2 rows)

oracle:

SQL> create index i1 on test3 ( a );
Index created.
SQL> create index "i1" on test3 ( a );
create index "i1" on test3 ( a )
                            *
ERROR at line 1:
ORA-01408: such column list already indexed

as oracle checks if an index is defined on the same column(s) this is not possible. slightly modified test:

SQL> alter table test3 add ( b number );
Table altered.
SQL> create index "I1" on test3 ( b );
create index "I1" on test3 ( b )
            *
ERROR at line 1:
ORA-00955: name is already used by an existing object

still not possible.

I did not check all the objects but it seems that oracle is not as consistent as postgresql in this case.

this is work in progress, but shall show the similarities and differences between postgresql and oracle with regard to implementing schemas and code. I will try to add more things in the future and update this post and the samples accordingly. the documentation in the scripts is not very thorough at the moment, but it should be enough to get started.

for now the samples include:

  • tables: standard columns and arrays
  • constraints: primary keys, foreign keys, check constraints
  • triggers
  • sequences
  • indexes
  • views
  • loading blobs / clobs
  • plsql packages -> plpgsql
  • materialized views
  • partitioning
  • anonymous plsql / plpgsql blocks

after all these postgresql posts I thought it’s time to look at some really cool features postgresql offers but oracle lacks. of course oracle has plenty of features other databases don’t provide, but this is true the other way round, too.

psql – sqlplus

the more I use the psql utility ( postgresql’s equivalent to sqlplus ) the more I love it. this tiny little tool has so many wonderful features that it is hard to give a complete overview. so, here are my favorites:

one of the best features psql offers are the various shortcuts one can use to query the catalog ( data dictionary ), control the output, display help for the various commands and move data in and out of the database.

first example: to list the available views in oracle you have to query the data dictionary ( dict, or v$fixed_view_definition for the fixed views ). in psql, it’s as easy as this:

postgres=# \dvS
                       List of relations
   Schema   |              Name               | Type |  Owner   
------------+---------------------------------+------+----------
 pg_catalog | pg_available_extension_versions | view | postgres
 pg_catalog | pg_available_extensions         | view | postgres
 pg_catalog | pg_cursors                      | view | postgres
...

if you want even more information ( size and description in this example ) a “+” can always be appended:

postgres=# \dvS+
                                   List of relations
   Schema   |              Name               | Type |  Owner   |  Size   | Description 
------------+---------------------------------+------+----------+---------+-------------
 pg_catalog | pg_available_extension_versions | view | postgres | 0 bytes | 
 pg_catalog | pg_available_extensions         | view | postgres | 0 bytes | 
 pg_catalog | pg_cursors                      | view | postgres | 0 bytes | 

you can even use wildcards if you know parts of an object name but are not sure about the exact name:

postgres=# \dvS *index* 
                   List of relations
   Schema   |          Name          | Type |  Owner   
------------+------------------------+------+----------
 pg_catalog | pg_indexes             | view | postgres
 pg_catalog | pg_stat_all_indexes    | view | postgres
 pg_catalog | pg_stat_sys_indexes    | view | postgres
 pg_catalog | pg_stat_user_indexes   | view | postgres
 pg_catalog | pg_statio_all_indexes  | view | postgres
 pg_catalog | pg_statio_sys_indexes  | view | postgres
 pg_catalog | pg_statio_user_indexes | view | postgres
(7 rows)

the same is true for tables (\dt), functions (\df), tablespaces (\db) and all the other objects available. no need to create scripts for querying frequently used information.

another big plus is the integrated help. let’s assume you are not sure how exactly to create an index. perhaps you do not use the command often enough to remember the syntax. no need to search the documentation:

postgres-# \h CREATE INDEX        
Command:     CREATE INDEX
Description: define a new index
Syntax:
CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ name ] ON table [ USING method ]
    ( { column | ( expression ) } [ COLLATE collation ] [ opclass ] [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [, ...] )
    [ WITH ( storage_parameter = value [, ... ] ) ]
    [ TABLESPACE tablespace ]
    [ WHERE predicate ]

that’s really cool.

editing functions directly ? not a problem with psql. let’s create a simple function ( this one is from the documentation ):

CREATE FUNCTION add(integer, integer) RETURNS integer
    AS 'select $1 + $2;'
    LANGUAGE SQL
    IMMUTABLE
    RETURNS NULL ON NULL INPUT;

if you now want to edit this function directly, just do:

postgres=# \ef add(integer, integer)

… change it, save it, execute it and you’re done.

what about getting data out of postgresql ? maybe there’s a requirement to load data to a data warehouse which is a database from another vendor. flat files almost always provide a robust way for transporting data. and this is pretty easy and very well integrated with psql. the command in question is the “copy” command.
exporting a table to a file is not a big deal:

postgres=# copy myschema.customers to '/tmp/customers.log';                 
COPY 20000
postgres=# \! head /tmp/customers.log
1	VKUUXF	ITHOMQJNYX	4608499546 ABC Way	\N	QSDPAGD	SD	24101	US	1	ITHOMQJNYX@abc.com	4608499546	11979279217775911	2012/03	user1	password	55	100000	M

that’s it ( in the simplest way ). there are some more switches ( csv, headers, delimiter, etc. ) to fine tune your export, just use the integrated help to see what’s around:

\h copy
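for example, a csv export with a header row ( the file path and delimiter are just for illustration ):

```sql
-- the options syntax shown here is available from postgresql 9.0 on
copy myschema.customers to '/tmp/customers.csv'
  with ( format csv, header true, delimiter ';' );
```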

loading data back into postgresql ? same command, the other way around:

postgres=# create table myschema.customers2 ( like myschema.customers );  
CREATE TABLE
postgres=# copy myschema.customers2 from  '/tmp/customers.log';
COPY 20000
postgres=# select * from myschema.customers2 limit 1;
 customerid | firstname |  lastname  |      address1       | address2 |  city   | state |  zip  | country | region |        email        |  
 phone    | creditcardtype |    creditcard    | creditcardexpiration | username | password | age | income | gender 
------------+-----------+------------+---------------------+----------+---------+-------+-------+---------+--------+---------------------+--
----------+----------------+------------------+----------------------+----------+----------+-----+--------+--------
          1 | VKUUXF    | ITHOMQJNYX | 4608499546 ABC Way |          | QSDPAGD | SD    | 24101 | US      |      1 | ITHOMQJNYX@abc.com | 4
608499546 |              1 | 1979279217775911 | 2012/03              | user1    | password |  55 | 100000 | M
(1 row)

easy, isn’t it?

and by the way: tired of writing “select * from some_table” all the time ? use the “table” command to query a table:

postgres=# table myschema.customers;
 customerid | firstname |  lastname  |      address1       | address2 |  city   | state |  zip  |   country    | region |        email      
  |   phone    | creditcardtype |    creditcard    | creditcardexpiration | username  | password | age | income | gender 
------------+-----------+------------+---------------------+----------+---------+-------+-------+--------------+--------+-------------------
--+------------+----------------+------------------+----------------------+-----------+----------+-----+--------+--------

if you are used to bash or some other shell which provides similar functionality, you surely use the command history ( arrow up and down ). it’s integrated with psql, too, out of the box ( yes, I know you can use rlwrap with sqlplus, but you still have to do some extra work to get that going ). and as the various shells have their startup control files, there is one for psql, too: it is usually located in the home directory of the os user and is called “.psqlrc”. like the login.sql and glogin.sql files in oracle you can define your setup here. but you can do even more: psql provides the ability to define variables, e.g. :

\set waits 'SELECT pg_stat_activity.procpid, pg_stat_activity.current_query, pg_stat_activity.waiting, now() - pg_stat_activity.query_start  as "totaltime", pg_stat_activity.backend_start FROM pg_stat_activity WHERE pg_stat_activity.current_query !~ \'%IDLE%\'::text AND pg_stat_activity.waiting = true;;'

…defines a variable which contains a sql statement for displaying current waits in the database. once defined you can easily reference it:

postgres=# :waits
 procpid | current_query | waiting | totaltime | backend_start 
---------+---------------+---------+-----------+---------------
(0 rows)

put this in your “.psqlrc” file and you’ll have your variable available all the time. really cool.

indexing

in postgresql there is the concept of a partial index. that means you can create an index on a subset of a table’s data. this is not possible in oracle. let’s do an example:

assume we have a table which contains an ever increasing number, an entry for each hour of the year and a true/false flag for each row ( postgresql allows columns to be defined as boolean, cool ):

create table t1 ( a integer, b timestamp with time zone, c boolean ); 

before creating a partial index let’s populate the table with some test-data. this also introduces the generate_series function which is a very easy and effective way to generate some data:

insert into t1 ( a, b, c )
       values ( generate_series ( 1, 8761 )
              , generate_series ( timestamptz ( to_date('01.01.2012','DD.MM.YYYY') ) 
                                , timestamptz ( to_date('31.12.2012','DD.MM.YYYY') )
                                , interval '1h' 
                                )
              , 'T'
              );
update t1 set c = 'F' where mod(a,111) = 0;

now, assume there is a report which runs at the end of every month and is only interested in rows which have the flag in column c set to false ( maybe to get all customers who did not pay their invoice :) ). you could create a normal index on column c, but you could also create a partial index:

create index i1 on t1 ( c ) where not c;

this will:
a) greatly reduce the size of the index
b) only index the data which fulfills the expression
c) provide exactly the data the report asks for

let’s see what explain tells us about the statement the report uses:

indx=# analyze verbose t1;
INFO:  analyzing "public.t1"
INFO:  "t1": scanned 57 of 57 pages, containing 8761 live rows and 0 dead rows; 8761 rows in sample, 8761 estimated total rows
ANALYZE
indx=# explain analyze select * from t1 where not c;
                                                QUERY PLAN                                                 
-----------------------------------------------------------------------------------------------------------
 Index Scan using i1 on t1  (cost=0.00..12.85 rows=78 width=13) (actual time=0.016..0.056 rows=78 loops=1)
   Index Cond: (c = false)
 Total runtime: 0.098 ms
(3 rows)

exactly what I wanted. so, if you know the statements in your database and you know your data ( and you probably should :) ) partial indexes can be a great fit.
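to put some numbers behind claim a) you could compare the partial index against a full index on the same column ( i_full is just a throw-away name I use here ):

```sql
create index i_full on t1 ( c );
select pg_size_pretty ( pg_relation_size ( 'i_full' ) ) as full_index
     , pg_size_pretty ( pg_relation_size ( 'i1' ) )     as partial_index;
drop index i_full;
```

the partial index only contains the 78 rows where c is false, so it stays a fraction of the size of the full index.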

granting and revoking

for sure you have had the situation where you needed to grant select on all tables in a schema to another user. in oracle one would create a role, grant select for every single table in the source schema to that role and then grant the role to the target user ( if you do not want to grant to public, which is a bad idea anyway ).

in postgresql this is much easier. let’s setup a simple test case:

postgres=# create role usr1 login password 'usr1';
CREATE ROLE
postgres=# create role usr2 login password 'usr2';
CREATE ROLE
postgres=# create database beer owner=usr1;
CREATE DATABASE

in oracle a schema is almost the same thing as a user. in postgresql you’ll explicitly have to create a schema. otherwise the objects will get created in the public schema:

postgres=# \c beer usr1
Password for user usr1: 
You are now connected to database "beer" as user "usr1".
beer=> create schema myschema;
CREATE SCHEMA
beer=> \dn
   List of schemas
   Name   |  Owner   
----------+----------
 myschema | usr1
 public   | postgres
(2 rows)

as the schema is available now, tables can be created in the new schema:

beer=> create table myschema.t1 ( a int );
CREATE TABLE
beer=> create table myschema.t2 ( a int );
CREATE TABLE
beer=> create table myschema.t3 ( a int );
CREATE TABLE

granting select on all the tables in the schema is as easy as:

beer=> grant usage on schema myschema to usr2;
GRANT
beer=> grant select on all tables in schema myschema to usr2;
GRANT

without the “usage” grant the user will not be able to do anything in the schema. so be sure to grant it before granting any other privileges.
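one more thing to keep in mind: “grant select on all tables” only covers the tables that exist at the time of the grant. for tables created later you can set up default privileges ( available since 9.0 ), and revoking works with the same “all tables” syntax:

```sql
-- future tables created in myschema will be selectable by usr2
alter default privileges in schema myschema grant select on tables to usr2;
-- taking the grant back again for the existing tables
revoke select on all tables in schema myschema from usr2;
```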

to verify it:

beer=> \c beer usr2
Password for user usr2: 
You are now connected to database "beer" as user "usr2".
beer=> select * from myschema.t1;
 a 
---
(0 rows)
beer=> select * from myschema.t2;
 a 
---
(0 rows)
beer=> select * from myschema.t3;
 a 
---
(0 rows)

you can even grant select on specific columns ( which is not possible in oracle, either ):

beer=> \c postgres postgres
postgres=# create user usr3 login password 'usr3';
CREATE ROLE
postgres=# \c beer usr1
Password for user usr1: 
You are now connected to database "beer" as user "usr1".
beer=> grant usage on schema myschema to usr3;
GRANT
beer=> grant select (a) on table myschema.t1 to usr3;
GRANT
beer=> \c beer usr3
Password for user usr3: 
You are now connected to database "beer" as user "usr3".
beer=> select a from myschema.t1;
 a 
---
(0 rows)

this can be very handy if you want to hide some columns or give select on just a few columns and do not want to create views on top of the table.

creating objects

creating objects ? you might think there is nothing special here. but wait :)
in oracle, if you’d like to create a table which has exactly the same columns as another table you’d do something like this:

create table new_table as select * from source_table where 1 = 2;

in postgresql you can do the same:

d1=# create table t1 ( a int, b int );
CREATE TABLE
d1=#  create table t2 as select * from t1 where 1=2;
SELECT 0
d1=# \d t2
      Table "public.t2"
 Column |  Type   | Modifiers 
--------+---------+-----------
 a      | integer | 
 b      | integer | 

but postgresql goes some steps further. for creating a similar table there is the “like” keyword. in the simplest case you would do:

d1=# create table t3 ( like t1 );
CREATE TABLE
d1=# \d t3
      Table "public.t3"
 Column |  Type   | Modifiers 
--------+---------+-----------
 a      | integer | 
 b      | integer | 

nothing special so far. just a shorter and easier to understand syntax ( in my opinion ). but wait, check this out:

d1=# alter table t1 add constraint uk1 unique (a);
NOTICE:  ALTER TABLE / ADD UNIQUE will create implicit index "uk1" for table "t1"
ALTER TABLE
d1=# \d t1
      Table "public.t1"
 Column |  Type   | Modifiers 
--------+---------+-----------
 a      | integer | 
 b      | integer | 
Indexes:
    "uk1" UNIQUE CONSTRAINT, btree (a)
d1=# create table t4 ( like t1 INCLUDING CONSTRAINTS INCLUDING INDEXES INCLUDING STORAGE INCLUDING COMMENTS );
NOTICE:  CREATE TABLE / UNIQUE will create implicit index "t4_a_key" for table "t4"
CREATE TABLE
d1=# \d t4
      Table "public.t4"
 Column |  Type   | Modifiers 
--------+---------+-----------
 a      | integer | 
 b      | integer | 
Indexes:
    "t4_a_key" UNIQUE CONSTRAINT, btree (a)

that’s cool. get an exact copy of the table’s definition with one command. you think there can’t be more goodies ? what about this:

d1=# create table t5 ( a int, b int );
CREATE TABLE
d1=# create table t6 ( c date, d date );
CREATE TABLE
d1=# create table t7 ( a int, b int, c date, d date ) inherits ( t5, t6 );
NOTICE:  merging column "a" with inherited definition
NOTICE:  merging column "b" with inherited definition
NOTICE:  merging column "c" with inherited definition
NOTICE:  merging column "d" with inherited definition
CREATE TABLE
d1=# \d t7
      Table "public.t7"
 Column |  Type   | Modifiers 
--------+---------+-----------
 a      | integer | 
 b      | integer | 
 c      | date    | 
 d      | date    | 
Inherits: t5,
          t6

what the hell is this ? some sort of object oriented mechanism ? let’s add some data and check the results:

d1=# insert into t7 values ( 1,1,current_date,current_date);
INSERT 0 1
d1=# select * from t5;
 a | b 
---+---
 1 | 1
(1 row)
d1=# select * from t6;
     c      |     d      
------------+------------
 2012-08-22 | 2012-08-22
(1 row)
d1=# select * from t7;
 a | b |     c      |     d      
---+---+------------+------------
 1 | 1 | 2012-08-22 | 2012-08-22
(1 row)

as you can see the tables are now linked: rows inserted into the child table t7 automatically show up when you query the parent tables t5 and t6. but this does not work the other way around:

d1=# insert into t5 values (2,2);
INSERT 0 1
d1=# insert into t6 values ( current_date,current_date);
INSERT 0 1
d1=# select * from t7;
 a | b |     c      |     d      
---+---+------------+------------
 1 | 1 | 2012-08-22 | 2012-08-22
(1 row)

what happens if you change the column definitions ?

d1=# alter table t5 alter column a type bigint;
ALTER TABLE
d1=# \d t5
      Table "public.t5"
 Column |  Type   | Modifiers 
--------+---------+-----------
 a      | bigint  | 
 b      | integer | 
Number of child tables: 1 (Use \d+ to list them.)
d1=# \d t7
      Table "public.t7"
 Column |  Type   | Modifiers 
--------+---------+-----------
 a      | bigint  | 
 b      | integer | 
 c      | date    | 
 d      | date    | 
Inherits: t5,
          t6

the change automatically gets propagated to the child table. really nice … I don’t have a use case for this myself currently, but maybe it can be used to split a big table into smaller ones. I would like to hear if someone has used this feature and for what purpose. feel free to post comments.
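one use case that is often mentioned for inheritance is range partitioning: child tables carry check constraints and the planner can then skip children that cannot contain matching rows ( the table and column names below are made up for this sketch ):

```sql
create table measurements ( ts timestamptz, val numeric );
-- one child per month, each constrained to its range
create table measurements_2012_08
  ( check ( ts >= '2012-08-01' and ts < '2012-09-01' ) )
  inherits ( measurements );
-- with constraint_exclusion the planner prunes children
-- whose check constraint rules them out
set constraint_exclusion = partition;
explain select * from measurements where ts >= '2012-09-15';
```

queries against the parent table transparently include all children, but thanks to the check constraints only the relevant ones get scanned.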

time for the last one ( for now :) ). I am pretty sure plenty of databases use sequences to generate primary keys. in postgresql you can set the default for a table’s column to get populated from a specific sequence ( not possible in oracle ):

d1=# create sequence s1;
CREATE SEQUENCE
d1=# create table t8 ( a int primary key default nextval('s1'), b int );
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "t8_pkey" for table "t8"
CREATE TABLE
d1=# insert into t8 (b) values (1);
INSERT 0 1
d1=# select * from t8;
 a | b 
---+---
 1 | 1
(1 row)
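by the way, the usual shortcut for exactly this pattern is the serial pseudo-type, which creates the sequence and wires up the default for you:

```sql
create table t9 ( a serial primary key, b int );
-- behaves like t8 above: inserts without a value for a draw from the sequence
insert into t9 (b) values (1);
```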

for sure there are lots and lots of more cool features, but for now this shall be enough …

UPDATE: Check this post for a use case for table inheritance in the sample scripts.

just a short notice that the postgresql people have released a security update for the current version ( 9.1 ) and some of the older versions. see the announcement for more information.