Archives For August 2012

ever wanted to pass a value of type interval to a procedure ? not a big deal, you might think:

create or replace procedure p1 ( a interval day to second ) is
begin
  null;
end;
/
exec p1 ( a => interval '10 00:00:00' day to second );

works as expected. but what if you want to pass an interval whose day component needs a leading precision of more than two digits ?

exec p1 ( a => interval '101 00:00:00' day to second );
BEGIN p1 ( a => interval '101 00:00:00' day to second ); END;
*
ERROR at line 1:
ORA-01873: the leading precision of the interval is too small
ORA-06512: at line 1

should work, shouldn’t it ?
let’s try to specify the precision in the procedure’s declaration:

create or replace procedure p1 ( a interval day(3) to second ) is
begin
  null;
end;
/
Warning: Procedure created with compilation errors.
SQL> show errors;
Errors for PROCEDURE P1:
LINE/COL ERROR
-------- -----------------------------------------------------------------
1/30	 PLS-00103: Encountered the symbol "(" when expecting one of the
	 following:
	 to

ok, it is not possible to specify this in the declaration. can we pass the precision to the procedure along with the parameter ?

exec p1 ( a => interval '101 00:00:00' day(3) to second );
BEGIN p1 ( a => interval '101 00:00:00' day(3) to second ); END;
                                           *
ERROR at line 1:
ORA-06550: line 1, column 44:
PLS-00103: Encountered the symbol "(" when expecting one of the following:
) , * & = - +  at in is mod remainder not rem
  or != or ~= >= <=  and or like like2
like4 likec between to || multiset member submultiset

syntax error, again. what’s wrong ? the documentation says the default leading precision is 2, but it can be any value between 0 and 9. so this must be possible somehow.
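
indeed, plain sql happily accepts the precision specifier, so the limitation is specific to pl/sql declarations ( a quick check you can run yourself ):

select interval '101 00:00:00' day(3) to second from dual;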

the solution is either to move the procedure into a package and declare a subtype:

create or replace package ptest
is
   subtype my_interval is interval day(3) to second;
   procedure p1 ( v_interval IN my_interval );
end;
/
create or replace package body ptest
is
   procedure p1 ( v_interval IN my_interval ) 
   is
   begin
     null;
   end p1;
end;
/
SQL> exec ptest.p1 ( interval '230 23:0:0.1' day to second );
PL/SQL procedure successfully completed.
SQL>  exec ptest.p1 ( interval '23 23:0:0.1' day to second );
PL/SQL procedure successfully completed.

… or to anchor the parameter to a table column with %type:

SQL> create table t1 ( a interval day(3) to second );
Table created.
SQL> create or replace procedure p1 ( a t1.a%type ) is begin null; end;
  2  /
Procedure created.
SQL> exec p1 ( a => interval '101 00:00:00' day to second);
PL/SQL procedure successfully completed.
SQL> exec p1 ( a => interval '10 00:00:00' day to second);
PL/SQL procedure successfully completed.
SQL> 

after all these postgresql posts I thought it’s time to look at some really cool features postgresql offers but oracle lacks. of course oracle has plenty of features other databases don’t provide, but this is true the other way round, too.

psql – sqlplus

the more I use the psql utility ( postgresql’s equivalent to sqlplus ) the more I love it. this tiny little tool has so many wonderful features that it is hard to give a complete overview. so, here are my favorites:

one of the best features psql offers is the set of shortcuts one can use to query the catalog ( data dictionary ), control the output, display help for the various commands and move data in and out of the database.

first example: to list the available views in oracle you have to query the data dictionary ( either dict or v$fixed_view ). in psql, it’s as easy as this:

postgres=# \dvS
                       List of relations
   Schema   |              Name               | Type |  Owner   
------------+---------------------------------+------+----------
 pg_catalog | pg_available_extension_versions | view | postgres
 pg_catalog | pg_available_extensions         | view | postgres
 pg_catalog | pg_cursors                      | view | postgres
...

if you want even more information ( size and description in this example ) a “+” can always be appended:

postgres=# \dvS+
                                   List of relations
   Schema   |              Name               | Type |  Owner   |  Size   | Description 
------------+---------------------------------+------+----------+---------+-------------
 pg_catalog | pg_available_extension_versions | view | postgres | 0 bytes | 
 pg_catalog | pg_available_extensions         | view | postgres | 0 bytes | 
 pg_catalog | pg_cursors                      | view | postgres | 0 bytes | 

you can even use wildcards if you know parts of an object name but are not sure about the exact name:

postgres=# \dvS *index* 
                   List of relations
   Schema   |          Name          | Type |  Owner   
------------+------------------------+------+----------
 pg_catalog | pg_indexes             | view | postgres
 pg_catalog | pg_stat_all_indexes    | view | postgres
 pg_catalog | pg_stat_sys_indexes    | view | postgres
 pg_catalog | pg_stat_user_indexes   | view | postgres
 pg_catalog | pg_statio_all_indexes  | view | postgres
 pg_catalog | pg_statio_sys_indexes  | view | postgres
 pg_catalog | pg_statio_user_indexes | view | postgres
(7 rows)

the same is true for tables (\dt), functions (\df), tablespaces (\db) and all the other object types available. no need to create scripts for querying frequently used information.
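
and if you forget which shortcut does what, psql lists all the available backslash commands with a single one:

postgres=# \?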

another big plus is the integrated help. let’s assume you are not sure how exactly to create an index. perhaps you do not use the command often enough to remember the syntax. no need to search the documentation:

postgres-# \h CREATE INDEX        
Command:     CREATE INDEX
Description: define a new index
Syntax:
CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ name ] ON table [ USING method ]
    ( { column | ( expression ) } [ COLLATE collation ] [ opclass ] [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [, ...] )
    [ WITH ( storage_parameter = value [, ... ] ) ]
    [ TABLESPACE tablespace ]
    [ WHERE predicate ]

that’s really cool.

editing functions directly ? not a problem with psql. let’s create a simple function ( this one is from the documentation ):

CREATE FUNCTION add(integer, integer) RETURNS integer
    AS 'select $1 + $2;'
    LANGUAGE SQL
    IMMUTABLE
    RETURNS NULL ON NULL INPUT;

if you now want to edit this function directly, just do:

postgres=# \ef add(integer, integer)

… change it, save it, execute it and you’re done.
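
to quickly verify that the edited function still does what you expect, just call it ( using the add function from above ):

postgres=# select add ( 1, 2 );
 add 
-----
   3
(1 row)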

what about getting data out of postgresql ? maybe there’s a requirement to load data into a data warehouse running on another vendor’s database. flat files almost always provide a robust way to transport data, and this is pretty easy and very well integrated in psql. the command in question is the “copy” command.
exporting a table to a file is not a big deal:

postgres=# copy myschema.customers to '/tmp/customers.log';                 
COPY 20000
postgres=# \! head /tmp/customers.log
1	VKUUXF	ITHOMQJNYX	4608499546 ABC Way	\N	QSDPAGD	SD	24101	US	1	ITHOMQJNYX@abc.com	4608499546	11979279217775911	2012/03	user1	password	55	100000	M

that’s it ( in its simplest form ). there are some more switches ( csv, headers, delimiter, etc. ) to fine-tune your export, just use the integrated help to see what’s available:

\h copy
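
for example, a csv export with a header line and a custom delimiter could look like this ( a sketch, assuming postgresql 9.0 or later for the options syntax ):

postgres=# copy myschema.customers to '/tmp/customers.csv' with ( format csv, header true, delimiter ';' );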

loading data back into postgresql ? same command, the other way around:

postgres=# create table myschema.customers2 ( like myschema.customers );  
CREATE TABLE
postgres=# copy myschema.customers2 from  '/tmp/customers.log';
COPY 20000
postgres=# select * from myschema.customers2 limit 1;
 customerid | firstname |  lastname  |      address1       | address2 |  city   | state |  zip  | country | region |        email        |  
 phone    | creditcardtype |    creditcard    | creditcardexpiration | username | password | age | income | gender 
------------+-----------+------------+---------------------+----------+---------+-------+-------+---------+--------+---------------------+--
----------+----------------+------------------+----------------------+----------+----------+-----+--------+--------
          1 | VKUUXF    | ITHOMQJNYX | 4608499546 ABC Way |          | QSDPAGD | SD    | 24101 | US      |      1 | ITHOMQJNYX@abc.com | 4
608499546 |              1 | 1979279217775911 | 2012/03              | user1    | password |  55 | 100000 | M
(1 row)

easy, isn’t it?

and by the way: tired of writing “select * from some_table” all the time ? use the “table” command to query a table:

postgres=# table myschema.customers;
 customerid | firstname |  lastname  |      address1       | address2 |  city   | state |  zip  |   country    | region |        email      
  |   phone    | creditcardtype |    creditcard    | creditcardexpiration | username  | password | age | income | gender 
------------+-----------+------------+---------------------+----------+---------+-------+-------+--------------+--------+-------------------
--+------------+----------------+------------------+----------------------+-----------+----------+-----+--------+--------

if you are used to bash or some other shell which provides similar functionality, you surely use the command history ( arrow up and down ). it’s integrated with psql, too, out of the box ( yes, I know you may use rlwrap with sqlplus, but you still have to do some extra work to get that going ).

and as the various shells have their startup control file, there is one for psql, too. it is usually located in the home directory of the os user and is called “.psqlrc”. like the login.sql and glogin.sql files in oracle you can define your setup here. but you can do even more: psql provides the ability to define variables, e.g.:

\set waits 'SELECT pg_stat_activity.procpid, pg_stat_activity.current_query, pg_stat_activity.waiting, now() - pg_stat_activity.query_start  as "totaltime", pg_stat_activity.backend_start FROM pg_stat_activity WHERE pg_stat_activity.current_query NOT LIKE \'%IDLE%\' AND pg_stat_activity.waiting = true;'

…defines a variable which contains a sql statement for displaying current waits in the database. once defined you can easily reference it:

postgres=# :waits
 procpid | current_query | waiting | totaltime | backend_start 
---------+---------------+---------+-----------+---------------
(0 rows)

put this in your “.psqlrc” file and you’ll have your variable available all the time. really cool.

indexing

in postgresql there is the concept of a partial index. that means you can create an index on a subset of a table’s data. there is no direct equivalent in oracle ( the closest workaround is a function-based index that maps the unwanted rows to null ). let’s do an example:

assume we have a table which contains an ever increasing number, an entry for each hour of the year and a true/false flag for each row ( postgresql allows columns to be defined as boolean, cool ):

create table t1 ( a integer, b timestamp with time zone, c boolean ); 

before creating a partial index let’s populate the table with some test data. this also introduces the generate_series function, which is a very easy and effective way to generate data:

insert into t1 ( a, b, c )
       values ( generate_series ( 1, 8761 )
              , generate_series ( timestamptz ( to_date('01.01.2012','DD.MM.YYYY') ) 
                                , timestamptz ( to_date('31.12.2012','DD.MM.YYYY') )
                                , interval '1h' 
                                )
              , 'T'
              );
update t1 set c = 'F' where mod(a,111) = 0;

now, assume there is a report which runs at the end of every month and is only interested in rows which have the flag in column c set to false ( maybe to get all customers who did not pay their bill :) ). you could create a normal index on column c, but you could also create a partial index:

create index i1 on t1 ( c ) where not c;

this will:
a) greatly reduce the size of the index ( see the quick check below )
b) only index the data which fulfills the expression
c) provide exactly the data the report asks for
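
to verify point a) one could compare the partial index against a full index on the same column ( a quick sketch; i1_full is just a throwaway name for the comparison ):

create index i1_full on t1 ( c );
select pg_size_pretty ( pg_relation_size ( 'i1' ) ) as partial_index
     , pg_size_pretty ( pg_relation_size ( 'i1_full' ) ) as full_index;
drop index i1_full;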

let’s see what explain says about the statement the report uses:

indx=# analyze verbose t1;
INFO:  analyzing "public.t1"
INFO:  "t1": scanned 57 of 57 pages, containing 8761 live rows and 0 dead rows; 8761 rows in sample, 8761 estimated total rows
ANALYZE
indx=# explain analyze select * from t1 where not c;
                                                QUERY PLAN                                                 
-----------------------------------------------------------------------------------------------------------
 Index Scan using i1 on t1  (cost=0.00..12.85 rows=78 width=13) (actual time=0.016..0.056 rows=78 loops=1)
   Index Cond: (c = false)
 Total runtime: 0.098 ms
(3 rows)

exactly what I wanted. so, if you know the statements running in your database and you know your data ( and you probably should :) ), partial indexes may provide a great opportunity.

granting and revoking

for sure you have had the situation where you needed to grant select on all tables in a schema to another user. in oracle one would create a role, grant select on every single table in the source schema to that role and then grant the role to the target user ( if you do not want to grant to public, which is a bad idea anyway ).

in postgresql this is much easier. let’s setup a simple test case:

postgres=# create role usr1 login password 'usr1';
CREATE ROLE
postgres=# create role usr2 login password 'usr2';
CREATE ROLE
postgres=# create database beer owner=usr1;
CREATE DATABASE

in oracle a schema is almost the same thing as a user. in postgresql you have to create a schema explicitly; otherwise the objects will get created in the public schema:

postgres=# \c beer usr1
Password for user usr1: 
You are now connected to database "beer" as user "usr1".
beer=> create schema myschema;
CREATE SCHEMA
beer=> \dn
   List of schemas
   Name   |  Owner   
----------+----------
 myschema | usr1
 public   | postgres
(2 rows)

now that the schema is available, tables can be created in it:

beer=> create table myschema.t1 ( a int );
CREATE TABLE
beer=> create table myschema.t2 ( a int );
CREATE TABLE
beer=> create table myschema.t3 ( a int );
CREATE TABLE

granting select on all the tables in the schema is as easy as:

beer=> grant usage on schema myschema to usr2;
GRANT
beer=> grant select on all tables in schema myschema to usr2;
GRANT

without the “usage” grant the user will not be able to do anything in the schema. so be sure to grant it before granting any other privileges.
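
note that this grant only covers the tables existing right now. for tables created later you can additionally set default privileges ( available from postgresql 9.0 on; run this as usr1, since default privileges apply to objects created by the role that runs the command ):

beer=> alter default privileges in schema myschema grant select on tables to usr2;
ALTER DEFAULT PRIVILEGES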

to verify it:

beer=> \c beer usr2
Password for user usr2: 
You are now connected to database "beer" as user "usr2".
beer=> select * from myschema.t1;
 a 
---
(0 rows)
beer=> select * from myschema.t2;
 a 
---
(0 rows)
beer=> select * from myschema.t3;
 a 
---
(0 rows)

you can even grant select on specific columns ( which is not possible in oracle either ):

beer=> \c postgres postgres
postgres=# create user usr3 login password 'usr3';
CREATE ROLE
postgres=# \c beer usr1
Password for user usr1: 
You are now connected to database "beer" as user "usr1".
beer=> grant usage on schema myschema to usr3;
beer=> grant select (a) on table myschema.t1 to usr3;
GRANT
beer=> \c beer usr3
Password for user usr3: 
You are now connected to database "beer" as user "usr3".
beer=> select a from myschema.t1;
 a 
---
(0 rows)

this can be very handy if you want to hide some columns, or grant select on just a few of them without creating views on top of the table.
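
to double check which table and column privileges are in place, psql has a shortcut for this, too:

beer=> \dp myschema.t1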

creating objects

creating objects ? you might think there is nothing special here. but wait :)
in oracle, if you’d like to create a table which has exactly the same columns as another table you’d do something like this:

create table new_table as select * from source_table where 1 = 2;

in postgresql you can do the same:

d1=# create table t1 ( a int, b int );
CREATE TABLE
d1=#  create table t2 as select * from t1 where 1=2;
SELECT 0
d1=# \d t2
      Table "public.t2"
 Column |  Type   | Modifiers 
--------+---------+-----------
 a      | integer | 
 b      | integer | 

but postgresql goes some steps further. for creating a similar table there is the “like” keyword. in its simplest form you would do:

d1=# create table t3 ( like t1 );
CREATE TABLE
d1=# \d t3
      Table "public.t3"
 Column |  Type   | Modifiers 
--------+---------+-----------
 a      | integer | 
 b      | integer | 

nothing special so far. just a shorter and easier-to-understand syntax ( in my opinion ). but wait, check this out:

d1=# alter table t1 add constraint uk1 unique (a);
NOTICE:  ALTER TABLE / ADD UNIQUE will create implicit index "uk1" for table "t1"
ALTER TABLE
d1=# \d t1
      Table "public.t1"
 Column |  Type   | Modifiers 
--------+---------+-----------
 a      | integer | 
 b      | integer | 
Indexes:
    "uk1" UNIQUE CONSTRAINT, btree (a)
d1=# create table t4 ( like t1 INCLUDING CONSTRAINTS INCLUDING INDEXES INCLUDING STORAGE INCLUDING COMMENTS );
NOTICE:  CREATE TABLE / UNIQUE will create implicit index "t4_a_key" for table "t4"
CREATE TABLE
d1=# \d t4
      Table "public.t4"
 Column |  Type   | Modifiers 
--------+---------+-----------
 a      | integer | 
 b      | integer | 
Indexes:
    "t4_a_key" UNIQUE CONSTRAINT, btree (a)

that’s cool: get an exact copy of the table’s definition with one command. you think there can’t be more goodies ? what about this:

d1=# create table t5 ( a int, b int );
CREATE TABLE
d1=# create table t6 ( c date, d date );
CREATE TABLE
d1=# create table t7 ( a int, b int, c date, d date ) inherits ( t5, t6 );
NOTICE:  merging column "a" with inherited definition
NOTICE:  merging column "b" with inherited definition
NOTICE:  merging column "c" with inherited definition
NOTICE:  merging column "d" with inherited definition
CREATE TABLE
d1=# \d t7
      Table "public.t7"
 Column |  Type   | Modifiers 
--------+---------+-----------
 a      | integer | 
 b      | integer | 
 c      | date    | 
 d      | date    | 
Inherits: t5,
          t6

what the hell is this ? some sort of object oriented mechanism ? let’s add some data and check the results:

d1=# insert into t7 values ( 1,1,current_date,current_date);
INSERT 0 1
d1=# select * from t5;
 a | b 
---+---
 1 | 1
(1 row)
d1=# select * from t6;
     c      |     d      
------------+------------
 2012-08-22 | 2012-08-22
(1 row)
d1=# select * from t7;
 a | b |     c      |     d      
---+---+------------+------------
 1 | 1 | 2012-08-22 | 2012-08-22
(1 row)

as you can see the tables are now dependent on each other. if data is added to table t7, it automatically shows up when querying the parent tables t5 and t6. but this does not work the other way around:

d1=# insert into t5 values (2,2);
INSERT 0 1
d1=# insert into t6 values ( current_date,current_date);
INSERT 0 1
d1=# select * from t7;
 a | b |     c      |     d      
---+---+------------+------------
 1 | 1 | 2012-08-22 | 2012-08-22
(1 row)
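
the reason is that a select on a parent table automatically includes all of its child tables. to see only the rows physically stored in t5 itself, use the only keyword ( at this point that should return just the row inserted directly into t5 ):

d1=# select * from only t5;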

what happens if you change the column definitions ?

d1=# alter table t5 alter column a type bigint;
ALTER TABLE
d1=# \d t5
      Table "public.t5"
 Column |  Type   | Modifiers 
--------+---------+-----------
 a      | bigint  | 
 b      | integer | 
Number of child tables: 1 (Use \d+ to list them.)
d1=# \d t7
      Table "public.t7"
 Column |  Type   | Modifiers 
--------+---------+-----------
 a      | bigint  | 
 b      | integer | 
 c      | date    | 
 d      | date    | 
Inherits: t5,
          t6

the change automatically gets propagated to the child table. really nice … I don’t have a use case for this currently, but maybe it can be useful to split a wide table into smaller ones, depending on the columns. I would like to hear if someone has used this feature and for what purpose. feel free to post comments.

time for the last one ( for now :) ). I am pretty sure plenty of databases use sequences to generate primary keys. in postgresql you can set the default for a table’s column to be populated from a specific sequence ( not possible in oracle ):

d1=# create sequence s1;
CREATE SEQUENCE
d1=# create table t8 ( a int primary key default nextval('s1'), b int );
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "t8_pkey" for table "t8"
CREATE TABLE
d1=# insert into t8 (b) values (1);
INSERT 0 1
d1=# select * from t8;
 a | b 
---+---
 1 | 1
(1 row)
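
by the way, postgresql also provides the serial pseudo type as a shorthand for exactly this pattern; it creates the sequence and wires up the default automatically ( a sketch, equivalent to the manual setup above ):

d1=# create table t9 ( a serial primary key, b int );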

for sure there are lots and lots of more cool features, but for now this shall be enough …

UPDATE: Check this post for a use case for table inheritance in the sample scripts.

just a short notice that the postgresql people released a security update for the current version ( 9.1 ) and some of the older versions. see the announcement for more information.

recently a customer had the request to export a column to a flat file. nothing special, but the requirement was to encrypt the column, as the data needs to be delivered to a third party. as you might know, oracle provides the dbms_obfuscation_toolkit package which can be used for requests like this. so, here is an example ( as wordpress does not allow script uploads, sorry for the long paste :) ):

/**
Export a column for a given table and schema in encrypted format to csv
PARAMETERS: OWNER
            TABLE
            COLUMN
            ENCRYPTION-KEY
**/
SET LINES 164 VERIFY OFF PAGES 999 SERVEROUT ON FEEDBACK OFF
VARIABLE owner VARCHAR2(30)
VARIABLE table_name VARCHAR2(30)
VARIABLE column_name VARCHAR2(30)
VARIABLE encryption_key VARCHAR2(100)
VARIABLE continue VARCHAR2(1);
EXEC dbms_output.put_line ( chr(13) );
EXEC dbms_output.put_line ( '------------------------------------------------------' );
EXEC dbms_output.put_line ( '-- INFO: Starting encrypted column export  ');
EXEC dbms_output.put_line ( chr(13) );
/**
do some sanity checks
**/
DECLARE
  lv_owner all_users.username%TYPE;
  lv_directory all_directories.directory_path%TYPE;
  FUNCTION schema_exists ( pv_schema IN all_users.username%TYPE )
                         RETURN BOOLEAN
  IS
    CURSOR cur_schema
    IS SELECT 'ok'
         FROM all_users
        WHERE username = pv_schema
    ;
    lv_cursor_result VARCHAR2(2);
  BEGIN
    OPEN cur_schema;
      FETCH cur_schema INTO lv_cursor_result;
    CLOSE cur_schema;
    RETURN NVL ( lv_cursor_result, 'xx' ) = 'ok';
  END schema_exists;
  FUNCTION table_exists ( pv_schema IN all_users.username%TYPE
                        , pv_table IN all_tables.table_name%TYPE
                        ) RETURN BOOLEAN
  IS
    CURSOR cur_table
    IS SELECT 'ok'
         FROM all_tables
        WHERE owner = pv_schema
          AND table_name = pv_table
    ;
    lv_cursor_result VARCHAR2(2);
  BEGIN
    OPEN cur_table;
      FETCH cur_table INTO lv_cursor_result;
    CLOSE cur_table;
    RETURN NVL ( lv_cursor_result, 'xx' ) = 'ok';
  END table_exists;
  FUNCTION column_exists ( pv_schema IN all_users.username%TYPE
                         , pv_table IN all_tables.table_name%TYPE
                         , pv_column IN all_tab_columns.column_name%TYPE
                         ) RETURN BOOLEAN
  IS
    CURSOR cur_column
    IS SELECT 'ok'
         FROM all_tab_columns
        WHERE owner = pv_schema
          AND table_name = pv_table
          AND column_name = pv_column
    ;
    lv_cursor_result VARCHAR2(2);
  BEGIN
    OPEN cur_column;
      FETCH cur_column INTO lv_cursor_result;
    CLOSE cur_column;
    RETURN NVL ( lv_cursor_result, 'xx' ) = 'ok';
  END column_exists;
  FUNCTION exp_directory_exists RETURN BOOLEAN
  IS
    CURSOR cur_directory
    IS SELECT directory_path
         FROM all_directories
        WHERE directory_name = 'DATA_PUMP_DIR'
    ;
  BEGIN
    OPEN cur_directory;
      FETCH cur_directory INTO lv_directory;
    CLOSE cur_directory;
    RETURN lv_directory IS NOT NULL;
  END exp_directory_exists;
BEGIN
   :owner := UPPER('&1');
   :table_name := UPPER('&2'); 
   :column_name := UPPER('&3');
   :encryption_key := '&4';
   IF :encryption_key IS NULL
      OR
      -- encryption key must be at least 80 characters
      length ( :encryption_key ) < 80
   THEN
     raise_application_error ( -20001
                             , '--- CHECK INPUT: The encryption key "'||:encryption_key||'" is too short or empty ! ');
     :continue := 'N';
   END IF;
   IF NOT ( schema_exists ( pv_schema => :owner ) )
   THEN
     raise_application_error ( -20001
                             , '--- CHECK INPUT: The schema '||:owner||' does not exist in this database ! ');
     :continue := 'N';
   END IF;
   IF NOT ( table_exists ( pv_schema => :owner
                         , pv_table => :table_name
                         ) )
   THEN
     raise_application_error ( -20001
                             , '--- CHECK INPUT: The table '||:table_name||' does not exist for the schema '||:owner||' ! ');
     :continue := 'N';
   END IF;
   IF NOT ( column_exists ( pv_schema => :owner
                          , pv_table => :table_name
                          , pv_column => :column_name
                          ) )
   THEN
     raise_application_error ( -20001
                             , '--- CHECK INPUT: The column '||:column_name||' does not exist for the table '||:table_name||' in schema '||:owner||'! ');
     :continue := 'N';
   END IF;
   IF NOT ( exp_directory_exists )
   THEN
     raise_application_error ( -20001
                             , '--- CHECK DIRECTORY: Can not find directory DATA_PUMP_DIR ! ');
     :continue := 'N';
   ELSE
     dbms_output.put_line (' --- INFO: File will be located here : '||lv_directory );
   END IF;
   :continue := 'Y';
END;
/
DECLARE
  lv_owner all_users.username%TYPE := UPPER(:owner);
  lv_table all_tables.table_name%TYPE := UPPER(:table_name);
  lv_column all_tab_columns.column_name%TYPE := UPPER(:column_name);
  lt_file_type utl_file.file_type;
  lv_statement VARCHAR2(4000);
  ln_cursor NUMBER;
  ln_amount_of_rows INTEGER := 0;
  lv_value VARCHAR2(4000);
  lv_seed VARCHAR2(4000) := :encryption_key;
BEGIN
  IF ( :continue = 'Y' ) 
  THEN
    dbms_output.put_line (' --- INFO: selected schema is : '||lv_owner );
    dbms_output.put_line (' --- INFO: selected table is  : '||lv_table );
    dbms_output.put_line (' --- INFO: selected column is : '||lv_column );
    dbms_output.put_line (' --- INFO: exporting to       : DATA_PUMP_DIR ');
    -- open file for writing
    lt_file_type := utl_file.fopen ( location => 'DATA_PUMP_DIR'
                                   , filename => 'exp_'||lv_owner||'_'||lv_table||'_'||lv_column||'_'
                                                  ||TO_CHAR(sysdate,'DD.MM.YYYY HH24:MI:SS')||'.csv'
                                   , open_mode => 'w'
                                   );
    lv_statement := 'SELECT '||lv_column||' FROM '||lv_owner||'.'||lv_table;
    dbms_output.put_line (' --- INFO: statement is: '||lv_statement );
    -- prepare statement
    ln_cursor := dbms_sql.open_cursor;
    dbms_sql.parse ( c => ln_cursor
                   , statement => lv_statement
                   , language_flag => dbms_sql.v7 
                   );
    dbms_sql.define_column ( c => ln_cursor
                           , position => 1
                           , column => lv_column
                           , column_size => 4000
                           );
    ln_amount_of_rows := dbms_sql.execute ( c => ln_cursor ); 
    LOOP
      -- fetch all the rows
      IF dbms_sql.fetch_rows ( c => ln_cursor ) = 0
      THEN
        EXIT;
      ELSE
        ln_amount_of_rows := ln_amount_of_rows + 1;
        -- get the column value
        dbms_sql.column_value ( c => ln_cursor
                              , position => 1
                              , value => lv_value 
                              ); 
        -- input must be a multiple of 8 bytes, so rpadding with chr(0)
        dbms_obfuscation_toolkit.desencrypt ( input_string => rpad ( lv_value
                                                                   , ( trunc ( length ( lv_value ) / 8 ) + 1 ) * 8
                                                                   , chr(0)
                                                                   )
                                            , key_string => lv_seed 
                                            , encrypted_string  => lv_value
                                            );
        -- write to output file ( utl_file.put_line expects varchar2, so the
        -- raw value is implicitly converted to its hex representation )
        utl_file.put_line ( file => lt_file_type 
                          , buffer => utl_raw.cast_to_raw ( lv_value )
                          );
      END IF;
    END LOOP;
    -- close cursor
    dbms_sql.close_cursor ( c => ln_cursor );
    -- close file handle
    utl_file.fclose ( file => lt_file_type );
    dbms_output.put_line (' --- Exported '||ln_amount_of_rows||' rows !!! ');
  END IF;
END;
/

for decrypting one could use this snippet:

/**
Decrypt the encrypted data in the exported csv files
**/
SET LINES 164 VERIFY OFF PAGES 999 SERVEROUT ON FEEDBACK OFF
VARIABLE filename VARCHAR2(4000)
VARIABLE encryption_key VARCHAR2(100)
VARIABLE continue VARCHAR2(1);
EXEC dbms_output.put_line ( chr(13) );
EXEC dbms_output.put_line ( '------------------------------------------------------' );
EXEC dbms_output.put_line ( '-- INFO: Starting decryption  ');
EXEC dbms_output.put_line ( chr(13) );
/**
do some sanity checks
**/
DECLARE
  lv_directory all_directories.directory_path%TYPE;
  FUNCTION exp_directory_exists RETURN BOOLEAN
  IS
    CURSOR cur_directory
    IS SELECT directory_path
         FROM all_directories
        WHERE directory_name = 'DATA_PUMP_DIR'
    ;
  BEGIN
    OPEN cur_directory;
      FETCH cur_directory INTO lv_directory;
    CLOSE cur_directory;
    RETURN lv_directory IS NOT NULL;
  END exp_directory_exists;
BEGIN
  :filename := '&1';
  :encryption_key := '&2';
  IF :encryption_key IS NULL
     OR
     -- encryption key must be at least 80 characters
     length ( :encryption_key ) < 80
  THEN
     raise_application_error ( -20001
                             , '--- CHECK INPUT: The encryption key "'||:encryption_key||'" is too short or empty ! ');
     :continue := 'N';
  END IF;
  IF NOT ( exp_directory_exists )
  THEN
    raise_application_error ( -20001
                            , '--- CHECK DIRECTORY: Can not find directory DATA_PUMP_DIR ! ');
    :continue := 'N';
  ELSE
    dbms_output.put_line (' --- INFO: File to be loaded : '||lv_directory );
  END IF;
  :continue := 'Y';
END;
/
DECLARE
  lt_file_type utl_file.file_type;
  lv_buffer VARCHAR2(4000);
  lv_buffer_raw RAW(4000);
  lraw_decrypted_value RAW(4000);
  lv_value RAW(32767);
BEGIN
  IF ( :continue = 'Y' ) 
  THEN
    -- open file handle
    lt_file_type := utl_file.fopen ( location => 'DATA_PUMP_DIR'
                                   , filename => :filename
                                   , open_mode => 'r'
                          --         , max_linesize => 4000
                                   );
    -- read through the file line by line
    LOOP
      BEGIN
        utl_file.get_line ( file => lt_file_type
                         ,  buffer => lv_buffer
                         );
        -- the line read is the hex representation written by the export script;
        -- assigning the varchar2 to a raw implicitly converts it back
        lv_buffer_raw := lv_buffer;
        dbms_obfuscation_toolkit.desdecrypt ( input => lv_buffer_raw
                                            , key => utl_raw.cast_to_raw ( :encryption_key )
                                            , decrypted_data => lraw_decrypted_value
                                            );
        dbms_output.put_line ( 'decrypted value: '||utl_raw.cast_to_varchar2(lraw_decrypted_value) );
      EXCEPTION
        WHEN no_data_found THEN EXIT;
      END;
    END LOOP;
    -- close file handle
    utl_file.fclose ( file => lt_file_type );
  ELSE
    NULL;
  END IF;
END;
/

for encrypting and decrypting two simple calls now do the trick:

# to encrypt:
@encrypted_column_export.sql USER TABLE_NAME COLUMN_NAME ENCRYPTION_KEY
# to decrypt
@decrypt_exported_csv.sql FILE_NAME ENCRYPTION_KEY
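
a side note: dbms_obfuscation_toolkit is deprecated in favor of dbms_crypto ( available from 10g on; execute on it must be granted explicitly ). a minimal sketch of what the encryption call could look like there ( key and input are made up for illustration ):

declare
  lraw_encrypted RAW(2000);
begin
  lraw_encrypted := dbms_crypto.encrypt
                      ( src => utl_raw.cast_to_raw ( 'some value' )
                        -- des in cbc mode with pkcs5 padding, so no manual rpadding needed
                      , typ => dbms_crypto.encrypt_des + dbms_crypto.chain_cbc + dbms_crypto.pad_pkcs5
                      , key => utl_raw.cast_to_raw ( '8bytekey' )
                      );
  dbms_output.put_line ( rawtohex ( lraw_encrypted ) );
end;
/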

postgresql, as well as oracle, heavily depends on accurate statistics for being able to provide the best execution plan for a given query ( both use a cost based optimizer ).

in postgresql the components involved in creating and executing the best execution plan are the parser, the rewriter, the planner/optimizer and the executor.

a feature of the rewriter that is special from an oracle perspective is that you can create custom rules.
a simple example for a table containing three rows:

sysdba@[local]:1540/dbs200# create table t1 ( a integer, b char(20) );
CREATE TABLE
sysdba@[local]:1540/dbs200*# insert into t1 values ( 1, 'text1' );
INSERT 0 1
sysdba@[local]:1540/dbs200*# insert into t1 values ( 2, 'text2' );
INSERT 0 1
sysdba@[local]:1540/dbs200*# insert into t1 values ( 3, 'text3' );
INSERT 0 1
sysdba@[local]:1540/dbs200*# COMMIT;
COMMIT

let’s say the second row is so important that you do not want to allow changes to it. to achieve this you could create a trigger, or you may create a rule:

sysdba@[local]:1540/dbs200# create rule myrule as on update to t1 where old.a = 2 do instead nothing;
CREATE RULE
sysdba@[local]:1540/dbs200# update t1 set b = 'blabla' where a=2;
UPDATE 0
sysdba@[local]:1540/dbs200*# select * from t1 where a=2;
 a |          b           
---+----------------------
 2 | text2               
(1 row)

such rules can be created for insert, update and delete statements and can add additional conditions to statements on the tables in question. check the documentation for a complete description.
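
an equivalent rule protecting the row from being deleted could look like this ( a sketch, analogous to the update rule above ):

sysdba@[local]:1540/dbs200# create rule myrule_del as on delete to t1 where old.a = 2 do instead nothing;
CREATE RULE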

in general the optimizer does the same job as the optimizer in oracle, except that you cannot give hints to the optimizer in postgresql.

of course you can create indexes in postgresql for optimizing access to the data of interest. postgres provides four types of indexes:

  • B-tree: same as in oracle
  • hash: uses hash tables and is only available for “=” operations
  • Generalized Search Tree ( GiST ): a special type often used for geometric or full text search purposes
  • Generalized Inverted Index ( GIN ): an index type used for lists and arrays

a more detailed view on indexes will follow in another post.

as mentioned above the planner/optimizer is responsible for creating the best possible execution plan. internally a plan is a tree which consists of several sub-plans. as with oracle there are different choices the planner/optimizer has available:

  • sequential scan: read the whole table and check each row for a match, same as in oracle
  • index scan: read the index for matches and visit the table to read the row. it is important to know that postgres always has to visit the table to check if a row is visible to the current transaction ( no index-only access possible ). this is because postgres saves the “undo” data in the table itself, not in the index
  • bitmap index scan: scans the index for matches, saves the results in a bitmap in memory, sorts the bitmap in the order of the table ( that is, by block numbers ) and then reads the table. this avoids jumping between the index and the table all the time
  • nested loop join: scan the outer table and then go to the inner table, same as in oracle
  • hash join: creates a hash table from one of the joined tables and then uses the hash values for searching the other table(s), same as in oracle
  • merge join: first sorts the joined tables on the join keys and then reads both sorted inputs in parallel

in oracle you can use the explain plan command, the dbms_xplan package or autotrace to display execution plans. in postgresql it’s the explain command. using the same table from above, to display the explain plan for a simple select you would do:

sysdba@[local]:1540/dbs200*# explain select * from t1;
                      QUERY PLAN                      
------------------------------------------------------
 Seq Scan on t1  (cost=0.00..16.90 rows=690 width=88)
(1 row)
sysdba@[local]:1540/dbs200*# 

as I know the table only contains three rows, the 690 rows reported above are far from correct. same issue as in oracle: if the statistics are not good, the plans will not be good either.
the cost reporting is special in postgres:
“cost=0.00..16.90” means: 0 for the startup cost ( as this is a sequential scan, results can be sent immediately as they are retrieved ) and 16.9 for the total cost ( that is the whole cost of executing the plan ).
the width column reports the size of one row in bytes, so in total there would be 690 rows * 88 bytes = 60720 bytes ( depending on the current statistics ).

let’s check the statistics of the table:

sysdba@[local]:1540/dbs200*# select relpages,reltuples from pg_class where relname = 't1';
 relpages | reltuples 
----------+-----------
        0 |         0
(1 row)

obviously wrong, and that’s why the statistics reported by explain above are wrong, too. these numbers should change once statistics are generated:

sysdba@[local]:1540/dbs200*# analyze verbose t1;
INFO:  analyzing "public.t1"
INFO:  "t1": scanned 1 of 1 pages, containing 3 live rows and 0 dead rows; 3 rows in sample, 3 estimated total rows
ANALYZE
sysdba@[local]:1540/dbs200*# select relpages,reltuples from pg_class where relname = 't1';
 relpages | reltuples 
----------+-----------
        1 |         3
(1 row)

much better. what does explain report now ?

sysdba@[local]:1540/dbs200*#  explain select * from t1;
                    QUERY PLAN                     
---------------------------------------------------
 Seq Scan on t1  (cost=0.00..1.03 rows=3 width=25)
(1 row)

much better, too. wonder how the costs get calculated ? as in oracle, the costs depend on the cost of a disk read and on the cost for the cpu to process the rows. in postgresql there are two parameters which specify this:

sysdba@[local]:1540/dbs200*# show seq_page_cost;
 seq_page_cost 
---------------
 1
(1 row)
sysdba@[local]:1540/dbs200*# show cpu_tuple_cost;
 cpu_tuple_cost 
----------------
 0.01
(1 row)

the numbers reported here are the default values. of course you can tweak these by specifying the parameters in the server parameter file.

so the costs are: ( 1 * 1 ) + ( 3 * 0.01 ) = 1.03 ( one page read plus three times 0.01 for processing the three rows ).

as plans might lie because they are based on assumptions and statistics, you can use “explain analyze” ( which really executes the plan ) to compare the calculated costs against the real ones ( in oracle you can do this by passing the gather_plan_statistics hint and calling dbms_xplan.display_cursor with the 'ALLSTATS LAST' format ):

sysdba@[local]:1540/dbs200*# explain analyze select * from t1;
                                         QUERY PLAN                                          
---------------------------------------------------------------------------------------------
 Seq Scan on t1  (cost=0.00..1.03 rows=3 width=88) (actual time=0.011..0.026 rows=3 loops=1)
 Total runtime: 0.150 ms
(2 rows)

the value reported for loops is mainly interesting for joins, as it reports how often a particular step was executed.

what about indexes ? let’s generate some more data for the t1 table and create an index:

sysdba@[local]:1540/dbs200# insert into t1 select * from t1 where a = 1;
INSERT 0 1
sysdba@[local]:1540/dbs200*# insert into t1 select * from t1 where a = 1;
INSERT 0 2
sysdba@[local]:1540/dbs200*# insert into t1 select * from t1 where a = 1;
INSERT 0 4
sysdba@[local]:1540/dbs200*# insert into t1 select * from t1 where a = 1;
INSERT 0 8
sysdba@[local]:1540/dbs200*# insert into t1 select * from t1 where a = 1;
INSERT 0 16
sysdba@[local]:1540/dbs200*# insert into t1 select * from t1 where a = 1;
INSERT 0 32
sysdba@[local]:1540/dbs200*# insert into t1 select * from t1 where a = 1;
INSERT 0 64
sysdba@[local]:1540/dbs200*# insert into t1 select * from t1 where a = 1;
INSERT 0 128
sysdba@[local]:1540/dbs200*# insert into t1 select * from t1 where a = 1;
INSERT 0 256
sysdba@[local]:1540/dbs200*# insert into t1 select * from t1 where a = 1;
INSERT 0 512
sysdba@[local]:1540/dbs200*# insert into t1 select * from t1 where a = 1;
INSERT 0 1024
sysdba@[local]:1540/dbs200*# COMMIT;
COMMIT
sysdba@[local]:1540/dbs200# CREATE INDEX I1 ON T1(A);
CREATE INDEX
sysdba@[local]:1540/dbs200*# COMMIT;
COMMIT
sysdba@[local]:1540/dbs200# \d t1
         Table "public.t1"
 Column |     Type      | Modifiers 
--------+---------------+-----------
 a      | integer       | 
 b      | character(20) | 
Indexes:
    "i1" btree (a)
Rules:
    myrule AS
    ON UPDATE TO t1
   WHERE old.a = 2 DO INSTEAD NOTHING

let’s see what happens if we query for a=1 now:

sysdba@[local]:1540/dbs200*# explain analyze select * from t1 where a = 1;
                                             QUERY PLAN                                             
----------------------------------------------------------------------------------------------------
 Seq Scan on t1  (cost=0.00..40.62 rows=2048 width=25) (actual time=0.030..6.873 rows=2048 loops=1)
   Filter: (a = 1)
 Total runtime: 12.110 ms
(3 rows)

as expected the index is not used ( too many rows match the criteria a=1 ). let’s check the table statistics and see if the costs are as expected:

sysdba@[local]:1540/dbs200*# select relpages,reltuples from pg_class where relname = 't1';
 relpages | reltuples 
----------+-----------
       15 |      2050

so now there are 15 pages with 2050 rows: ( 1 * 15 ) + ( 2050 * 0.01 ) = 35.5
not really what was expected; explain reports a higher cost. that’s because another parameter comes into play when there is a where clause:

sysdba@[local]:1540/dbs200*# show cpu_operator_cost;
 cpu_operator_cost 
-------------------
 0.0025
(1 row)

again, the whole table will be scanned, each row will be processed and additionally each row will be checked against the condition ( this is the cpu_operator_cost ), so:
( 1 * 15 ) + ( 2050 * 0.01 ) + ( 2050 * 0.0025 ) = 40.625, which is the cost reported above ( explain rounds to two decimals ).
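
the same calculation can be done directly from the catalog; a minimal sketch that recomputes the sequential scan estimate from pg_class and the current parameter settings:

select ( relpages  * current_setting('seq_page_cost')::numeric )
     + ( reltuples * current_setting('cpu_tuple_cost')::numeric )
     + ( reltuples * current_setting('cpu_operator_cost')::numeric ) as estimated_cost
  from pg_class
 where relname = 't1';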

what happens when the index does get used ?

sysdba@[local]:1540/dbs200# explain analyze verbose select * from t1 where a = 3;
                                                  QUERY PLAN                                                   
---------------------------------------------------------------------------------------------------------------
 Index Scan using i1 on public.t1  (cost=0.00..8.27 rows=1 width=25) (actual time=0.020..0.023 rows=1 loops=1)
   Output: a, b
   Index Cond: (t1.a = 3)
 Total runtime: 0.060 ms
(4 rows)

these are two random page reads ( one for the index, one for the table ), which gives 8; the factor comes from the random_page_cost parameter:

sysdba@[local]:1540/dbs200*# show random_page_cost;
 random_page_cost 
------------------
 4
(1 row)

so in total it is 8 plus the overhead for the cpu.
when it comes to joins the procedure is the same as with every other cost-based rdbms: make sure the statement performs well ( and produces the right plan ) for every single table joined. if this is fine, the join will be fine, too.

for playing with different settings postgresql provides some parameters which can be set dynamically. but be careful: as with the hints you can give to the oracle optimizer, these parameters should not be used to permanently fix your plans. if the plans are wrong, check your statistics, and even more important: know your data. know your data. know your data.
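
for example, to see what the planner would do without sequential scans, you can disable them for the current session ( for experimenting only, not as a permanent fix ):

set enable_seqscan = off;
explain select * from t1 where a = 1;
set enable_seqscan = on;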