Archives For November 30, 1999

a probably less known feature of the oracle database is the database filesystem (dbfs). in short dbfs provides a filesystem interface on top of database tables accessible through standard operating system commands. storing files inside the database becomes as easy as creating files on the operating system level. there is no need to use a programming language to store blobs/clobs inside the database.

for demonstrating dbfs this post will show how to create a subversion repository which actually resides in the oracle database. this is not intended for any production use and shall only show that is basically works.

because I am on linux and want to mount the database filesystem on the operating system I will need to install the kernel-devel and fuse packages and give execute privileges to the fusermount command:

su-
yum install kernel-devel fuse fuse-libs
chmod +x /bin/fusermount

next I’ll create the configuration file for fuse:

touch /etc/fuse.conf
echo "user_allow_other" > /etc/fuse.conf

the mountpoint for holding my database filesystem and subversion repository will be /oradata/DB112/svn owned by the database owner:

su - oracle
mkdir /oradata/DB112/svn

as all the files will reside inside the oracle database I will create a separate user:

su - oracle
sqlplus / as sysdba
CREATE USER svn
IDENTIFIED by "svn"
DEFAULT TABLESPACE USERS
TEMPORARY TABLESPACE TEMP;
GRANT CREATE SESSION TO svn;
GRANT dbfs_role TO svn;
GRANT RESOURCE TO svn;

the script that creates the dbfs “MYSVNFS” which will be stored in the “USERS” tablespace is:

connect svn/svn
@$ORACLE_HOME/rdbms/admin/dbfs_create_filesystem.sql USERS MYSVNFS

the output should be similar to this:

No errors.
--------
CREATE STORE:
begin dbms_dbfs_sfs.createFilesystem(store_name => 'FS_MYSVNFS', tbl_name =>
'T_MYSVNFS', tbl_tbs => 'USERS', lob_tbs => 'USERS', do_partition => false,
partition_key => 1, do_compress => false, compression => '', do_dedup => false,
do_encrypt => false); end;
--------
REGISTER STORE:
begin dbms_dbfs_content.registerStore(store_name=> 'FS_MYSVNFS', provider_name
=> 'sample1', provider_package => 'dbms_dbfs_sfs'); end;
--------
MOUNT STORE:
begin dbms_dbfs_content.mountStore(store_name=>'FS_MYSVNFS',
store_mount=>'MYSVNFS'); end;
--------
CHMOD STORE:
declare m integer; begin m := dbms_fuse.fs_chmod('/MYSVNFS', 16895); end;
No errors.

the client program used to mount the filesystem is called dbfs_client and is located under $ORACLE_HOME/bin. the proper way in regards to security would be to use an oracle wallet for storing the password but as this is only a test I will do it the easy way ( if someone is interested on how to setup the wallet way, just let me know ):

echo "svn" > /home/oracle/passwordfile
nohup dbfs_client svn@DB112 /oradata/DB112/svn < /home/oracle/passwordfile &

if everything worked the new mount is now visible with the os commands:

df -h
Filesystem Size Used Avail Use% Mounted on
/dev/hdc1 28G 11G 15G 43% /
tmpfs 741M 456M 286M 62% /dev/shm
dbfs-svn@DB112:/ 1.1G 160K 1.1G 1% /oradata/DB112/svn
##
ls -la /oradata/DB112/svn/MYSVNFS/
total 0
drwxrwxrwx 3 root root 0 May 4 01:10 .
drwxr-xr-x 3 root root 0 May 4 01:15 ..
drwxr-xr-x 7 root root 0 May 4 01:10 .sfs

now I am ready to create the subversion repository:

su - oracle
cd /oradata/DB112/svn/MYSVNFS/
svnadmin create MYREPO

for being able to access the repository later on from another user I’ll add the user to the subversion password file:

cat MYREPO/conf/passwd
[users]
svn = svn

… and uncomment the following line in the svnserve.conf file:

password-db = passwd

… and start the subversion deamon:

svnserve -d -r /oradata/DB112/svn/MYSVNFS

to simulate real-life subversion usage I will create a new os user and do some work on the subversion repository ( I could have done this from another workstation, too ):

su -
useradd svn
passwd svn
su - svn

the first thing I need to do is to checkout the repository:

svn co svn://localhost/MYREPO/
cd MYREPO
ls -la

lets copy some files ( text and binary ) to populate the repository ( don’t care if not all files are copied because of permissions ):

cp /etc/*.conf .
cp /bin/cpio .
ls -la

as I now have some files I’ll add them to the repository and commit ( commit means pushing to the subversion server here ):

svn add *
svn commit -m "blabla" *.conf --username=svn
svn commit -m "blabla" cpio

done. all files committed to the subversion server and stored inside the oracle database. don’t believe it ?

su - oracle
sqlplus / as sysdba
set lines 164
col object_name for a20
col object_type for a15
SELECT object_name,object_type FROM all_objects WHERE owner = 'SVN';
##
OBJECT_NAME          OBJECT_TYPE
-------------------- ---------------
T_MYSVNFS            TABLE
IG_SFS$_FST_19       INDEX
IP_SFS$_FST_19       INDEX
IPG_SFS$_FST_19      INDEX
SFS$_FSTP_19         TABLE
SYS_C005052          INDEX
##
col item for a30
col pathname for a40
select item ,PATHNAME from svn.T_MYSVNFS;
##
ITEM PATHNAME
------------------------------ ----------------------------------------
MYREPO /MYREPO
locks /MYREPO/locks
db.lock /MYREPO/locks/db.lock
db-logs.lock /MYREPO/locks/db-logs.lock
hooks /MYREPO/hooks
start-commit.tmpl /MYREPO/hooks/start-commit.tmpl
pre-commit.tmpl /MYREPO/hooks/pre-commit.tmpl
post-revprop-change.tmpl /MYREPO/hooks/post-revprop-change.tmpl
conf /MYREPO/conf
authz /MYREPO/conf/authz
db /MYREPO/db
fs-type /MYREPO/db/fs-type
revs /MYREPO/db/revs
0 /MYREPO/db/revs/0
revprops /MYREPO/db/revprops
0 /MYREPO/db/revprops/0
transactions /MYREPO/db/transactions
txn-protorevs /MYREPO/db/txn-protorevs
write-lock /MYREPO/db/write-lock
uuid /MYREPO/db/uuid
0 /MYREPO/db/revs/0/0
0 /MYREPO/db/revprops/0/0
fsfs.conf /MYREPO/db/fsfs.conf
rep-cache.db /MYREPO/db/rep-cache.db
min-unpacked-rev /MYREPO/db/min-unpacked-rev
txn-current-lock /MYREPO/db/txn-current-lock
format /MYREPO/db/format
format /MYREPO/format
passwd /MYREPO/conf/passwd
svnserve.conf /MYREPO/conf/svnserve.conf
pre-revprop-change.tmpl /MYREPO/hooks/pre-revprop-change.tmpl
pre-lock.tmpl /MYREPO/hooks/pre-lock.tmpl
pre-unlock.tmpl /MYREPO/hooks/pre-unlock.tmpl
ROOT /
.sfs /.sfs
attributes /.sfs/attributes
tools /.sfs/tools
snapshots /.sfs/snapshots
RECYCLE /.sfs/RECYCLE
content /.sfs/content
post-commit.tmpl /MYREPO/hooks/post-commit.tmpl
post-lock.tmpl /MYREPO/hooks/post-lock.tmpl
post-unlock.tmpl /MYREPO/hooks/post-unlock.tmpl
README.txt /MYREPO/README.txt
1 /MYREPO/db/revs/0/1
txn-current /MYREPO/db/txn-current
2 /MYREPO/db/revs/0/2
1 /MYREPO/db/revprops/0/1
current /MYREPO/db/current
2 /MYREPO/db/revprops/0/2

that’s the subversion repository stored in the oracle database.

remember this is only for educational purposes, but imagine what you could do with it:

  • back up the repository with rman
  • use rman to duplicate the database for testing purposes
  • setup rac one node, integrate the subversion daemon with the oracle grid infrastructure and make the repository high available
  • create multiple dbfs and multiple svn repositories in one oracle database
  • enable deduplication for the repository files ( see * )
  • enable compression for the repository files ( see ** )

oracle uses secure files to store the files located in the dbfs. this means you may use all the features secure files provide. check the examples for an overview.

*

alter table svn.T_MYSVNFS modify lob(FILEDATA) (deduplicate);

**

alter table svn.T_MYSVNFS modify lob(FILEDATA) (compress high);

as said above, this is only educational but i’d be happy to start a discussion about the pros and cons ( beside the licensing costs, of course ) …. what would be a good use case to use dbfs for ?

when it comes to the grid infrastructure ( single node or cluster configuration ) oracle needs to store and manage some configuration data about the state of the node(s) and its resources. each node in the configuration has its own container to store and manage data which is called the “oracle local registry (olr)”.

by using the ocrcheck utility one can check the current the status of the local registry:

ocrcheck -local
Status of Oracle Local Registry is as follows :
Version : 3
Total space (kbytes) : 262120
Used space (kbytes) : 2472
Available space (kbytes) : 259648
ID : 1322090758
Device/File Name : /opt/oracle/product/crs/11.2.0.3/cdata/localhost/oracleplayground.olr
Device/File integrity check succeeded
Local registry integrity check succeeded
Logical corruption check bypassed due to non-privileged user

beside some other information this tells that the local registry is located here:

/opt/oracle/product/crs/11.2.0.3/cdata/localhost/oracleplayground.olr

if you wonder about the “Logical corruption check bypassed due to non-privileged user” message, this is because the ocrcheck command was executed as the grid infrastructure software owner. re-executing the same command as root user will show “Logical corruption check succeeded”. why root? because some parts of the grid infrastructure run with root privileges ( remember the roothas.pl or rootcrs.pl setup script ).

as the local registry is essential for the cluster stack to start up there must be a way to do proper backups and restores of it in case it gets lost or corrupted. oracle creates an initial backup of the olr after the configuration of the grid infrastructure. from that point onwards it is up to you to create and manage the backups. you will find the initial backup of your olr under ORACLE_HOME/cdata/[HOSTNAME]/, in my case:

ls -al /opt/oracle/product/crs/11.2.0.3/cdata/oracleplayground/
total 5644
drwxr-x--- 2 grid oinstall 4096 Mar 23 09:18 .
drwxrwx--- 4 grid oinstall 4096 Mar 23 09:17 ..
-rw------- 1 grid oinstall 5746688 Mar 23 09:18 backup_20120323_091815.olr

first of all one should decide where to store the manual backups of the olr. the default location is fine if you do regular backups of your $ORACLE_HOME ( and it is strongly recommended to do so ). to list the current configuration use:

./ocrconfig -local -showbackup
oracleplayground 2012/03/23 09:18:15 /opt/oracle/product/crs/11.2.0.3/cdata/oracleplayground/backup_20120323_091815.olr

if you need to change the location for any reason, you can do so by using the ocrconfig command:

./ocrconfig -local -backuploc [BACKUP_DESTINATION]

doing backups of your olr is as easy as:

/ocrconfig -local -manualbackup
oracleplayground 2012/05/03 15:51:38 /opt/oracle/product/crs/11.2.0.3/cdata/oracleplayground/backup_20120503_155138.olr
oracleplayground 2012/03/23 09:18:15 /opt/oracle/product/crs/11.2.0.3/cdata/oracleplayground/backup_20120323_091815.olr

…which will also list the current backups of the olr. of course you need to ensure that these files are being backuped up to a safe location ( to a tape, for example ).

another way for doing backups and restores is to use the “import” and “export” parameters of the ocrconfig command:

./ocrconfig -local -export /tmp/olr.dmp
./ocrconfig -local -import /tmp/olr.dmp

as oracle recommends not to use the “export” and “import” procedures for backups and restores i wonder what these parameters are for.
perhaps you should do both kinds of backups to be sure that one of it really works :)

in case you need to restore a backup of the local registry make sure you stop the cluster stack before:

crsctl stop has
ocrconfig -local -restore [OLR_BACKUP_FILE]
ocrcheck -local
crsctl start has

as it should be clear now how to backup and restore the local registry lets take look at the contents. oracle provides another command for dumping the contents of the local ( or cluster ) registry:

./ocrdump -local -stdout

this will dump the contents of the local registry to the screen. take a look at the output and you will see why the registry is essential for the stack. if you registered a database and listener with the grid infrastructure you will see something like this:

[SYSTEM.OHASD.RESOURCES.ora!db112!db]
[SYSTEM.OHASD.RESOURCES.ora!db112!db.CONFIG]
ORATEXT : ACL=owner:grid:rwx,pgrp:asmdba:r-x,other::r--,group:oinstall:r-x,user:oracle:rwx~ACTION_FAILURE_TEMPLATE=~ACTION_SCRIPT=~ACTIVE_PLACEMENT=1~AGENT_FILENAME=%CRS_HOME%/bin/oraagent%CRS_EXE_SUFFIX%~AUTO_START=restore~BASE_TYPE=ora.cluster_resource.type~CARDINALITY=1~CHECK_INTERVAL=1~CHECK_TIMEOUT=30~CLUSTER_DATABASE=false~DATABASE_TYPE=SINGLE~DB_UNIQUE_NAME=DB112~DEFAULT_TEMPLATE=PROPERTY(RESOURCE_CLASS=database) PROPERTY(DB_UNIQUE_NAME= CONCAT(PARSE(%NAME
%, ., 2), %USR_ORA_DOMAIN%, .)) ELEMENT(INSTANCE_NAME= %GEN_USR_ORA_INST_NAME%) ELEMENT(DATABASE_TYPE= %DATABASE_TYPE%)~DEGREE=1~DESCRIPTION=Oracle Database resource~ENABLED=1~FAILOVER_DELAY=0~FAILURE_INTERVAL=60~FAILURE_THRESHOLD=1~GEN_AUDIT_FILE_DEST=/oradata/DB112/admin/adump~GEN_START_OPTIONS=open~GEN_USR_ORA_INST_NAME=DB112~HOSTING_MEMBERS=~INSTANCE_FAILOVER=1~LOAD=1~LOGGING_LEVEL=1~MANAGEMENT_POLICY=AUTOMATIC~NAME=ora.db112.db~NLS_LANG=~NOT_RESTARTING_TEMPLATE=~OFFLINE_CHECK_INTERVAL=0~ONLINE_RELOCATI
ON_TIMEOUT=0~ORACLE_HOME=/opt/oracle/product/base/11.2.0.3~ORACLE_HOME_OLD=~PLACEMENT=balanced~PROFILE_CHANGE_TEMPLATE=~RESTART_ATTEMPTS=2~ROLE=PRIMARY~SCRIPT_TIMEOUT=60~SERVER_POOLS=~SPFILE=/opt/oracle/product/base/11.2.0.3/dbs/spfileDB112.ora~START_DEPENDENCIES=weak(type:ora.listener.type,uniform:ora.ons) hard(ora.DB112_DATA_DG.dg,ora.DB112_ARCH_DG.dg) pullup(ora.DB112_DATA_DG.dg,ora.DB112_ARCH_DG.dg)~START_TIMEOUT=600~STATE_CHANGE_TEMPLATE=~STOP_DEPENDENCIES=hard(intermediate:ora.asm,shutdown:ora.DB112_D
ATA_DG.dg,shutdown:ora.DB112_ARCH_DG.dg)~STOP_TIMEOUT=600~TYPE=ora.database.type~TYPE_ACL=owner:oracle:rwx,pgrp:oinstall:rwx,other::r--~TYPE_NAME=ora.database.type~TYPE_VERSION=3.2~UPTIME_THRESHOLD=1h~USR_ORA_DB_NAME=~USR_ORA_DOMAIN=~USR_ORA_ENV=~USR_ORA_FLAGS=~USR_ORA_INST_NAME=DB112~USR_ORA_OPEN_MODE=open~USR_ORA_OPI=false~USR_ORA_STOP_MODE=immediate~VERSION=11.2.0.3.0~

everything the cluster stack needs to know about the registered resource is recorded here. this includes all the dependencies of the resource as well as the different kinds of parameters, e.g. :

SPFILE=/opt/oracle/product/base/11.2.0.3/dbs/spfileDB112.ora

… which tells where to find the spfile of the registered database, or:

hard(ora.DB112_DATA_DG.dg,ora.DB112_ARCH_DG.dg)

… which tells that the diskgroups DB112_DATA_DG and DB112_ARCH_DG must be available before the database can start up. this is a hard dependency, so the startup of the database will stop if the diskgroups are not available.

you can get a more readable format by dumping the contents to a xml file:

./ocrdump -local /tmp/olr.xml -xml

as the backups of the olr are the same format as the current olr, ocrdump works the same way if you want to dump the backups. this can help when searching for changes in the olr:

./ocrdump -local /tmp/olr_backup.xml -backupfile /opt/oracle/product/crs/11.2.0.3/cdata/oracleplayground/backup_20120503_155138.olr -xml

you may have observed, that all the commands ( ocrconfig, orcdump, ocrcheck ) are used to manage the local registry as well as the cluster registry in case you have a clustered setup ( RAC or RAC one node, for example ). so, no matter if you are on a single node or cluster configuration: make sure your backup and restore strategy includes the oracle local and cluster registry.

what happens when you update a row of a table ? not much, you could think. but in fact there are lots and lots of things happening in the background from which you can learn how the database works. this post will follow a simple one row update and shows what’s happening behind the scenes.

i will start by creating a tiny little table and adding one row to it:

CREATE TABLE T1 ( A NUMBER )
       TABLESPACE USERS;
INSERT INTO T1 (A)
       VALUES (1);
COMMIT;

this is the starting point. because we will need this information later lets record four things:
1) the block number of the row we just created
2) the object id of the table in the data dictionary
3) the sid of the session
4) the session’s address

-- rowid of the row
SELECT rowid
  FROM t1
 WHERE A = 1;
ROWID
------------------
AAADb4AAEAAAAF9AAA
-- block number of the row
SELECT dbms_rowid.rowid_block_number('AAADb4AAEAAAAF9AAA') block_no
  FROM dual;
BLOCK_NO
----------
381
-- object id of the table
SELECT object_id
  FROM all_objects
 WHERE owner = USER
   AND object_type = 'TABLE'
   AND object_name = 'T1';
OBJECT_ID
----------
14072
-- the session id of the session which sends the update
SELECT SYS_CONTEXT('userenv','SID') sid
  FROM dual;
SID
---
32
-- the same sessions address
SELECT saddr
  FROM v$session
 WHERE sid = 32;
SADDR
----------------
000000006F680A28

so by now we know the object id ( which is 14072 ), the block number of the block containing the row ( which is 381 ), the session id ( which is 32 in my case ) and the session’s address ( which is 000000006F680A28 ).

ready to update ?

UPDATE T1
SET A = 2
WHERE A = 1;
1 row updated.

at this point it is important to leave the transaction open ( to not commit ). i will start a second sqlplus session and leave the session which issues the update as is.

the obvious things first. each insert/update/delete will start a transaction and as we did not commit our update we should be able to see the transaction:

SELECT addr
, ubafil
, ubablk
, ubarec
, status
, xid
, start_scn
FROM v$transaction
WHERE ses_addr = '000000006F680A28';
ADDR UBAFIL UBABLK UBAREC STATUS XID START_SCN
---------------- ---------- ---------- ---------- ------ ---------------- ----------
000000006DF332D8 3 483 11 ACTIVE 0100100022010000 958658

as expected there is an active transaction for our update statement. because each transaction will change some data ( if there is not rollback ) some undo must be generated. from the output above we can learn:
1. the datafile containing the undo block has file number 3
2. the block containing the undo is block number 483
3. the undo record is 11

before taking a look at the undo block lets see what information oracle has about our data block in v$bh ( which lists the buffer headers ):

col dirty for a5
col temp for a5
col ping for a5
col stale for a5
col stale for a5
col direct for a5
col status for a6
col objd for 99999
col block# for 999999
SELECT file#
, status
, dirty
, temp
, ping
, stale
, direct
, objd
, block#
FROM v$bh
WHERE objd = 14072
AND block# = 381;
FILE# STATUS DIRTY TEMP PING STALE DIREC OBJD BLOCK#
---------- ------ ----- ----- ----- ----- ----- ------ -------
4 cr N N N N N 14072 381
4 xcur N N N N N 14072 381

so v$bh reports two copies of the block header, one in “consitent read” the other in “exclusive” mode non of them dirty, temporary, pinged, stale or direct.

the exclusive one is the original block while the cr one is the copy we got when we issued the update. if you’d do a rollback of the update and resend the same update statement you’d receive another “cr” copy of the block. can you imagine why ? because readers do not block writers and writers do not block readers. for others sessions to be able to view the block as it really is ( the update is not yet commited ) oracle creates copies of blocks to construct a read consistant view of the data. you should keep this in mind when sizing your pools.

happily oracle provides a way to look at the contents of a block. lets do a dump of our undo block:

ALTER SESSION SET TRACEFILE_IDENTIFIER='my_dumps';
ALTER SYSTEM DUMP DATAFILE 3 BLOCK 483;

as the select from v$transaction tells us that our undo record is 11, we will take a look at this record only. if you open the trace file scroll down to your undo record you should see a similar output than this:

*-----------------------------
* Rec #0xb slt: 0x10 objn: 14072(0x000036f8) objd: 14072 tblspc: 4(0x00000004)
* Layer: 11 (Row) opc: 1 rci 0x00
Undo type: Regular undo Begin trans Last buffer split: No
Temp Object: No
Tablespace Undo: No
rdba: 0x00000000Ext idx: 0
flg2: 0
*-----------------------------
uba: 0x00c001e3.0036.09 ctl max scn: 0x0000.000e4630 prv tx scn: 0x0000.000e4631
txn start scn: scn: 0x0000.000ea07f logon user: 42
prev brb: 12583394 prev bcl: 0
KDO undo record:
KTB Redo
op: 0x03 ver: 0x01
compat bit: 4 (post-11) padding: 1
op: Z
Array Update of 1 rows:
tabn: 0 slot: 0(0x0) flag: 0x2c lock: 0 ckix: 0
ncol: 1 nnew: 1 size: 0
KDO Op code: 21 row dependencies Disabled
xtype: XAxtype KDO_KDOM2 flags: 0x00000080 bdba: 0x0100017d hdba: 0x0100017a
itli: 2 ispac: 0 maxfr: 4858
vect = 0
col 0: [ 2] c1 02

if you wonder why this is record 11, “#0xb” translates to 11:

select to_number('b','X') dec
from dual;
DEC
----------
11

what can we learn from here:
1) we can see the tablespace number and the object which is affected:

* Rec #0xb slt: 0x10 objn: 14072(0x000036f8) objd: 14072 tblspc: 4(0x00000004)

2) the start scn ( system change number ) is recorded as long with the user who started the transaction

txn start scn: scn: 0x0000.000ea07f logon user: 42

000e2f90 translates to 929670 and 42 is the user_id reported in all_users for the user who started the transaction.
3) one row will be updated:

Array Update of 1 rows:
tabn: 0 slot: 0(0x0) flag: 0x2c lock: 0 ckix: 0
ncol: 1 nnew: 1 size: 0
KDO Op code: 21 row dependencies Disabled
xtype: XAxtype KDO_KDOM2 flags: 0x00000080 bdba: 0x0100017d hdba: 0x0100017a
itli: 2 ispac: 0 maxfr: 4858
vect = 0
col 0: [ 2] c1 02

other people described the meanings of all the keywords and flags, so i will not just re-type them here ( you may, for example, want to check julian dyke’s website ).
for now, it is just important to notice that the first column of the table ( col 0 ) will be changed by 2 bytes ( [2] ) and the values is 1 ( 02 – 1 ).  oracle prefixes numeric values ( c1 ), so we may ignore the first 2 bytes.

what else is happening in the background? as stated in earlier posts, things need protections. lets see if we can find some locks for our session:

SELECT sid
, id1
, id2
, lmode
, request
FROM v$lock
WHERE sid = 32;
SID ID1 ID2 LMODE REQUEST
---------- ---------- ---------- ---------- ----------
32 100 0 4 0
32 14072 0 3 0
32 65552 290 6 0

the lmodes translate to: 4 = Edition enqueue (AE), 3 = DML (TM), 6 = Transaction (TX).

i am not sure, but i think the edition enqueue lock is always present ( happy if someone can tell more about this ).
our update triggered two locks, one for the dml and one for the transaction.
column id1 of the dml references our table, which is object 14072. we locked one row in shared exclusive mode.
column id2 of the transaction lock references the sequence of the transaction ( v$transaction.xidsqn ). so, the other way around, you can check the locks for your session and from there get to the object and transaction.

in my case, the redo containing the update must be the current one ( as nothing else is happening on my database ). so i’ll find one copy of it ( as the logs are mirrored ):

select GROUP#
, THREAD#
, SEQUENCE#
, STATUS
from v$log;
GROUP# THREAD# SEQUENCE# STATUS
---------- ---------- ---------- ---------------
1 1 85 INACTIVE
2 1 86 INACTIVE
3 1 87 CURRENT
select member
from v$logfile
where group# = 3;
MEMBER
------------------------------------------------------------------------------------------------------------------------
+DB112_DATA_DG/db112/onlinelog/group_3.259.779116045
+DB112_ARCH_DG/db112/onlinelog/group_3.259.779116047

… dump the first one:

alter system dump logfile '+DB112_DATA_DG/db112/onlinelog/group_3.259.779116045';

… and you will find the redo record:

REDO RECORD - Thread:1 RBA: 0x000057.000000a2.0010 LEN: 0x01fc VLD: 0x0d
SCN: 0x0000.000ea149 SUBSCN: 1 04/24/2012 20:10:42
(LWN RBA: 0x000057.000000a2.0010 LEN: 0002 NST: 0001 SCN: 0x0000.000ea149)
CHANGE #1 TYP:2 CLS:1 AFN:4 DBA:0x0100017d OBJ:14072 SCN:0x0000.000ea0c2 SEQ:1 OP:11.19 ENC:0 RBL:0
KTB Redo
op: 0x11 ver: 0x01
compat bit: 4 (post-11) padding: 1
op: F xid: 0x0001.010.00000122 uba: 0x00c001e3.0036.0b
Block cleanout record, scn: 0x0000.000ea148 ver: 0x01 opt: 0x02, entries follow...
itli: 1 flg: 2 scn: 0x0000.000ea0c2
Array Update of 1 rows:
tabn: 0 slot: 0(0x0) flag: 0x2c lock: 2 ckix: 0
ncol: 1 nnew: 1 size: 0
KDO Op code: 21 row dependencies Disabled
xtype: XAxtype KDO_KDOM2 flags: 0x00000080 bdba: 0x0100017d hdba: 0x0100017a
itli: 2 ispac: 0 maxfr: 4858
vect = 0
col 0: [ 2] c1 03
CHANGE #2 TYP:0 CLS:17 AFN:3 DBA:0x00c00080 OBJ:4294967295 SCN:0x0000.000ea097 SEQ:1 OP:5.2 ENC:0 RBL:0
ktudh redo: slt: 0x0010 sqn: 0x00000122 flg: 0x0012 siz: 164 fbi: 0
uba: 0x00c001e3.0036.0b pxid: 0x0000.000.00000000
CHANGE #3 TYP:0 CLS:18 AFN:3 DBA:0x00c001e3 OBJ:4294967295 SCN:0x0000.000ea096 SEQ:2 OP:5.1 ENC:0 RBL:0
ktudb redo: siz: 164 spc: 6734 flg: 0x0012 seq: 0x0036 rec: 0x0b
xid: 0x0001.010.00000122
ktubl redo: slt: 16 rci: 0 opc: 11.1 [objn: 14072 objd: 14072 tsn: 4]
Undo type: Regular undo Begin trans Last buffer split: No
Temp Object: No
Tablespace Undo: No
0x00000000 prev ctl uba: 0x00c001e3.0036.09
prev ctl max cmt scn: 0x0000.000e4630 prev tx cmt scn: 0x0000.000e4631
txn start scn: 0x0000.000ea07f logon user: 42 prev brb: 12583394 prev bcl: 0 BuExt idx: 0 flg2: 0
KDO undo record:
KTB Redo
op: 0x03 ver: 0x01
compat bit: 4 (post-11) padding: 1
op: Z
Array Update of 1 rows:
tabn: 0 slot: 0(0x0) flag: 0x2c lock: 0 ckix: 0
ncol: 1 nnew: 1 size: 0
KDO Op code: 21 row dependencies Disabled
xtype: XAxtype KDO_KDOM2 flags: 0x00000080 bdba: 0x0100017d hdba: 0x0100017a
itli: 2 ispac: 0 maxfr: 4858
vect = 0
col 0: [ 2] c1 02

what can we see here:
1) there is our update “col 0: [ 2] c1 03”: change the first column of our table to 2 ( 03 – 1 ).
2) there is the same undo entry as in the undo block a little later in the trace: “col 0: [ 2] c1 02”.

why is there the same entry in the redo as we found it in the undo ? remember that we did not yet finish our transaction ( the commit is still pending ). so what will happen if the instance crashes now? when the instance comes up again after the crash oracle will scan the redo logs and apply all the changes that happened up to before the crash. and as we did not commit the transaction oracle must do a rollback, and that’s what the redo entry is about ( redo the undo ).

lets commit the update in the initial session and check what changed.

commit;

what happened to our transaction:

SELECT addr
, ubafil
, ubablk
, ubarec
, status
, xid
, start_scn
FROM v$transaction
WHERE ses_addr = '000000006F680A28';
no rows selected

as expected, the transaction finished.

our locks?

SELECT sid
, id1
, id2
, lmode
, request
FROM v$lock
WHERE sid = 32;
SID TYPE ID1 ID2 LMODE REQUEST
SID ID1 ID2 LMODE REQUEST
---------- ---------- ---------- ---------- ----------
32 100 0 4 0

gone, too.

the buffer headers ?

SELECT file#
, status
, dirty
, temp
, ping
, stale
, direct
, objd
, block#
FROM v$bh
WHERE objd = 14072
AND block# = 381;
FILE# STATUS DIRTY TEMP PING STALE DIREC OBJD BLOCK#
---------- --------------- ----- ----- ----- ----- ----- ------ -------
4 cr N N N N N 14072 381
4 xcur Y N N N N 14072 381

the dirty flag for our buffer changed to ‘Y’, so there was some modification. obvious, as we changed the row.

just for completion: if you do a rollback instead of the commit you’d get another redo entry describing the rollback.

again this is far from being complete but should give you some hints on what is happening behind the scenes. and, even more important: every change to the database is recorded in the log files ( and this includes undo generation ).

Ouch! tns poison

April 29, 2012 — Leave a comment

there is a very old critical security hole regarding the oracle database. be aware of it, as there is no fix for it currently:

german

english

update: official workaround available now

if you ever wondered where oracle stores the statistic preferences one can set by using the dbms_stats.set_*_prefs procedures here you go:

global statistic preferences

table which holds the values: sys.optstat_hist_control$

example script to query the settings:

COLUMN SNAME FOR a30;
COLUMN SVAL1 FOR 9999;
COLUMN SVAL2 FOR a20;
COLUMN SPARE1 FOR 999999;
COLUMN SPARE2 FOR 999;
COLUMN SPARE3 FOR 999999;
COLUMN SPARE4 FOR a30;
COLUMN SPARE5 FOR a10;
COLUMN SPARE6 FOR a10;
SELECT sname
, sval1
, to_char(sval2,'DD.MM.YYYY HH24:MI:SS') sval2
, spare1
, spare2
, spare3
, spare4
, spare5
, spare6
FROM sys.optstat_hist_control$
ORDER BY 1
;

procedure which affects the table: dbms_stats.set_global_prefs, e.g.:

exec dbms_stats.set_global_prefs ( 'ESTIMATE_PERCENT', '11');

non-global ( user ) statistic preferences

table which holds the values: sys.optstat_user_prefs$

example script to query the settings:

COLUMN owner FOR a20;
COLUMN object_name FOR a20;
COLUMN object_type FOR a20;
COLUMN pname FOR a30;
COLUMN valnum FOR 99999;
COLUMN valchar FOR a30;
COLUMN chgtime FOR a20;
COLUMN spare1 FOR 999;
SELECT do.owner
, do.object_name
, do.object_type
, o.pname
, o.valnum
, o.valchar
, to_char(o.chgtime,'DD.MM.YYYY HH24:MI:SS') chgtime
, o.spare1
FROM sys.optstat_user_prefs$ o
, dba_objects do
WHERE o.obj# = do.object_id
;

procedures which affects the table: dbms_stats.set_global_prefs, dbms_stats.set_schema_prefs, dbms_stats.set_database_prefs,e.g.:

exec dbms_stats.set_table_prefs ( USER,'T1','GRANULARITY','ALL');
exec dbms_stats.set_schema_prefs ( USER,'ESTIMATE_PERCENT','11' );
exec dbms_stats.set_database_prefs ( 'ESTIMATE_PERCENT', '12', FALSE );

when working with the oracle database sooner or later you will face the ORA-07445 error reported in a trace file and the alertlog.

there is plenty of documentation about this error on oracle support and the web. in short: this is an unhandled exception in the oracle code ( in contrast the ORA-00600 errors are handled exceptions ). so why do I want to write about it ? because this is another example where you can map database behavior to the operating system. one common type of the ORA-07445 is this one: “type: SIGSEGV”.

what is this about and what does it stand for ?

software contains bugs. this is true for the linux kernel, this is true for the oracle database and this is true for probably all other software. to deal with unexpected behavior and to protect the system there must be some some sort of exception handler which processes/catches the exceptions once they occur and does the necessary steps to recover from the exceptions. the lowest level exceptions are raised by the CPU and must be handled by the operating system’s kernel. these exception are predefined and the kernel provides an exception handler for each of them.

some of them are:

  • division by zero
  • segment not present
  • stack segment fault
  • invalid opcode

you can, for example, check the intel documentation for a complete list of defined exceptions.

when an exception is raised the corresponding exception handler sends a signal to the process which caused the exception. and this is exactly what the SIGSEGV is: it is a signal. signals, for example ( the following list is not complete ), can be:

  • SIGSEGV: page faults, overflows
  • SIGFPE: divide error
  • SIGBUS: stack segments fault
  • SIGILL: invalid opcode

most of these exceptions can only occur when the kernel is in user mode, that is, when executing tasks from user programs ( oracle in this case ). there are two ways in which the processor can halt ( or interrupt ) process execution:

  • interrupts, which are ansynchron and typically triggered by I/O devices
  • exceptions, which are synchron and triggered by the processor when it detects predefined conditions while executing

when the processor halts process execution it switches to the handler routine ( each routine is defined in the interrupt description table, IDT ). once the handler routine has executed its tasks control is given back to the interrupted process.

unfortunately there is not much you can do about it. you can try to find a workaround with oracle support ( e.g. by setting some database parameters or applying a patch ) or check the generated dumps to get some hints on what exactly caused the exception.

a recent search on oracle support returned about 2500 results for the term SIGSEGV. you see, this is not an unusual signal …

this post will continue the introduction post to linux processes and shows how the listener handles connection requests to the database.

let’s check the listeners pid:

ps -ef | grep tns
grid 2646 1 0 10:47 ? 00:00:03 /opt/oracle/product/base/11.2.0.3/bin/tnslsnr LISTENER_DB112 -inherit

as the listener needs to handle connections there must be some sorts of open files for the listener process:

[root@oracleplayground ~]# ls -la /proc/2646/fd/
total 0
dr-x------ 2 grid oinstall 0 Apr 4 10:47 .
dr-xr-xr-x 6 grid oinstall 0 Apr 4 10:47 ..
lrwx------ 1 grid oinstall 64 Apr 4 15:29 0 -> /dev/null
lrwx------ 1 grid oinstall 64 Apr 4 15:29 1 -> /dev/null
lrwx------ 1 grid oinstall 64 Apr 4 15:29 10 -> socket:[7875]
lrwx------ 1 grid oinstall 64 Apr 4 15:29 11 -> socket:[7877]
lrwx------ 1 grid oinstall 64 Apr 4 15:29 12 -> socket:[7949]
lrwx------ 1 grid oinstall 64 Apr 4 15:29 13 -> socket:[7950]
lrwx------ 1 grid oinstall 64 Apr 4 15:29 15 -> socket:[12719]
lrwx------ 1 grid oinstall 64 Apr 4 15:29 2 -> /dev/null
lr-x------ 1 grid oinstall 64 Apr 4 15:29 3 -> /opt/oracle/product/base/11.2.0.3/rdbms/mesg/diaus.msb
lr-x------ 1 grid oinstall 64 Apr 4 15:29 4 -> /proc/2646/fd
lr-x------ 1 grid oinstall 64 Apr 4 15:29 5 -> /opt/oracle/product/base/11.2.0.3/network/mesg/nlus.msb
lr-x------ 1 grid oinstall 64 Apr 4 15:29 6 -> pipe:[7822]
lr-x------ 1 grid oinstall 64 Apr 4 15:29 7 -> /opt/oracle/product/base/11.2.0.3/network/mesg/tnsus.msb
lrwx------ 1 grid oinstall 64 Apr 4 15:29 8 -> socket:[7873]
l-wx------ 1 grid oinstall 64 Apr 4 15:29 9 -> pipe:[7823]
[root@oracleplayground ~]# 

this tells us that the listener has 15 open file descriptors of which 8,10-15 are sockets and 9 is a pipe.

let’s request a connection to the database through the listener and trace the listener process in parallel:

Session 1 ( as root ):

strace -p 2646

Session 2 ( as oracle ):

sqlplus a/a@DB112

the strace output ( i have number the lines ) should look similar to this:

1: poll([{fd=8, events=POLLIN|POLLRDNORM}, {fd=11, events=POLLIN|POLLRDNORM}, {fd=12, events=POLLIN|POLLRDNORM}, {fd=13, events=POLLIN|POLLRDNORM}, {fd=15, events=POLLIN|POLLRDNORM}, {fd=-1}, {fd=-1}], 7, -1) = 1 ([{fd=13, revents=POLLIN|POLLRDNORM}])
2: times({tms_utime=76, tms_stime=325, tms_cutime=0, tms_cstime=1}) = 431191568
3: getsockname(13, {sa_family=AF_INET, sin_port=htons(1521), sin_addr=inet_addr("127.0.0.1")}, [11087335753955409936]) = 0
4: getpeername(13, 0x7fffc6b5d1a8, [11087335753955409936]) = -1 ENOTCONN (Transport endpoint is not connected)
5: accept(13, {sa_family=AF_INET, sin_port=htons(23139), sin_addr=inet_addr("127.0.0.1")}, [68719476752]) = 14
6: getsockname(14, {sa_family=AF_INET, sin_port=htons(1521), sin_addr=inet_addr("127.0.0.1")}, [68719476752]) = 0
7: fcntl(14, F_SETFL, O_RDONLY|O_NONBLOCK) = 0
8: getsockopt(14, SOL_SOCKET, SO_SNDBUF, [-4128159149999471140], [4]) = 0
9: getsockopt(14, SOL_SOCKET, SO_RCVBUF, [-4128159149999434336], [4]) = 0
10: setsockopt(14, SOL_TCP, TCP_NODELAY, [1], 4) = 0
11: fcntl(14, F_SETFD, FD_CLOEXEC) = 0
12: times({tms_utime=76, tms_stime=325, tms_cutime=0, tms_cstime=1}) = 431191568
13: rt_sigaction(SIGPIPE, {0x1, ~[ILL ABRT BUS FPE SEGV USR2 XCPU XFSZ SYS RTMIN RT_1], SA_RESTORER|SA_RESTART|SA_SIGINFO, 0x32e840ebe0}, {0x1, ~[ILL ABRT BUS FPE KILL SEGV USR2 STOP XCPU XFSZ SYS RTMIN RT_1], SA_RESTORER|SA_RESTART|SA_SIGINFO, 0x32e840ebe0}, 8) = 0
14: times({tms_utime=76, tms_stime=325, tms_cutime=0, tms_cstime=1}) = 431191568
15: times({tms_utime=76, tms_stime=325, tms_cutime=0, tms_cstime=1}) = 431191568
16: poll([{fd=8, events=POLLIN|POLLRDNORM}, {fd=11, events=POLLIN|POLLRDNORM}, {fd=12, events=POLLIN|POLLRDNORM}, {fd=13, events=POLLIN|POLLRDNORM}, {fd=15, events=POLLIN|POLLRDNORM}, {fd=14, events=POLLIN|POLLRDNORM}, {fd=-1}], 7, 60000) = 1 ([{fd=14, revents=POLLIN|POLLRDNORM}])
17: read(14, "\335\1\1:\1,\fA \377\377\177\10\1\243:\10"..., 8208) = 221
18: fcntl(14, F_GETFL) = 0x802 (flags O_RDWR|O_NONBLOCK)
19: fcntl(14, F_SETFL, O_RDWR) = 0
20: times({tms_utime=76, tms_stime=325, tms_cutime=0, tms_cstime=1}) = 431191569
21: fcntl(14, F_SETFD, 0) = 0
22: pipe([16, 17]) = 0
23: pipe([18, 19]) = 0
24: clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x2ac09be5bb80) = 5894
25: wait4(5894, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 5894
26: close(16) = 0
27: close(19) = 0
28: fcntl(18, F_SETFD, FD_CLOEXEC) = 0
29: fcntl(17, F_SETFD, FD_CLOEXEC) = 0
30: fcntl(14, F_SETFD, FD_CLOEXEC) = 0
31: poll([{fd=8, events=POLLIN|POLLRDNORM}, {fd=11, events=POLLIN|POLLRDNORM}, {fd=12, events=POLLIN|POLLRDNORM}, {fd=13, events=POLLIN|POLLRDNORM}, {fd=15, events=POLLIN|POLLRDNORM}, {fd=18, events=POLLIN|POLLRDNORM}, {fd=17, events=0}], 7, -1) = 1 ([{fd=18, revents=POLLIN|POLLRDNORM}])
32: read(18, "NTP0 5895\n", 64) = 10
33: write(17, ";", 4) = 4
34: write(17, "(ADDRESS=(PROTOCOL=tcp)(DEV=14)("..., 59) = 59
35: write(17, "\1\4", 8) = 8
36: read(18, "", 4) = 4
37: read(18, "*1", 4) = 4
38: write(14, "\10\v", 8) = 8
39: close(17) = 0
40: close(18) = 0
41: close(14) = 0
42: lseek(7, 19968, SEEK_SET) = 19968
43: read(7, "\f005\4P006\4j007\4\206008\4\240009\4\335"..., 512) = 512
44: poll([{fd=8, events=POLLIN|POLLRDNORM}, {fd=11, events=POLLIN|POLLRDNORM}, {fd=12, events=POLLIN|POLLRDNORM}, {fd=13, events=POLLIN|POLLRDNORM}, {fd=15, events=POLLIN|POLLRDNORM}, {fd=-1}, {fd=-1}], 7, -1) = 1 ([{fd=15, revents=POLLIN|POLLRDNORM}])
45: Process 2646 detached

a lot of cryptic output, isn’t it? let’s take a closer look on what’s happening:

on line one you can see a call to “poll”. poll waits for some events on the file descriptors. as connections are files, you may say it waits for some sort of connections.

on line two you can see a call to times, which returns several process times.

on line three there is a call to getsockname which returns the socket name for this address: {sa_family=AF_INET, sin_port=htons(1521), sin_addr=inet_addr(“127.0.0.1”)}, which is
the listener on port 1521 ( sa_family: this is the address family, AF_INET is the one used for IP, sin_port: is the port, sin_addr: is the address, in this case localhost ).

the call on line four ( getpeername ) does what it is called: get the name of the connected peer. it returns with ENOTCONN ( the socket is not connected ). next, one line five, a connection to the socket is accepted. notice that a new file descriptor is created ( 14 ) and passed to the getsockname call on line six. getsockname returns with code 0, which is success. now there is a connected endpoint which is our request to the listener.

the call to fcntl on line 7 modifies the file descriptor: set the file status flags ( F_SETFL ) to read only ( O_RDONLY ) and block system calls ( O_NONBLOCK ).

on line 8 and 9 the values of the sockets receive and sent buffers are read and on line 10 the TCP option TCP_NODELAY ( which disables Nagle’s algorithm ) is set.

next, on line 11, the socket is modified to close when a call to an exec function is performed. from now on I will ignore the calls to times, as I have described it above.

on line 13 rt_sigaction changes the action taken by the process on receipt of a signal and line 17 reads from the file associated with the file
descriptor ( ignore line 14,15,16 ). line 18 again reads the file descriptor flags and line 19 sets the flag for read/write ( O_RDWR ) and line 21
resets the file descriptor flags to the defaults.

line 22 and 23 create a pair of file descriptors pointing to a pipe inode ( the first is for reading, the second for writing ). the clone call on line 24 does interesting stuff, it creates a new process with is 5894 and line 25 waits for the new process to change its state ( the child_stack=0 indicates that a process is created, not a thread ), in other words it waits for the process to exit.

if we now do a check if the process is there, you will notice that no process with this PID exists ( this is probably because it is the PID of the clone itself ):

ps -ef | grep 5894
root 5984 5611 0 15:58 pts/3 00:00:00 grep 5894

but what exists, is PID+1 ( which you can see on line 32 ):

ps -ef | grep 5895
oracle 5895 1 0 15:44 ? 00:00:00 oracleDB112 (LOCAL=NO)
root 5997 5611 0 16:00 pts/3 00:00:00 grep 5895

… which is our connection to the database. if you check this process you will see that the socket 14 is now available in the newly created process:

ls -la /proc/5895/fd
total 0
dr-x------ 2 root root 0 Apr 4 15:44 .
dr-xr-xr-x 6 oracle asmadmin 0 Apr 4 15:44 ..
lr-x------ 1 root root 64 Apr 4 16:00 0 -> /dev/null
l-wx------ 1 root root 64 Apr 4 16:00 1 -> /dev/null
lrwx------ 1 root root 64 Apr 4 16:00 14 -> socket:[102504]
l-wx------ 1 root root 64 Apr 4 16:00 2 -> /dev/null
lr-x------ 1 root root 64 Apr 4 16:00 3 -> /dev/null
lr-x------ 1 root root 64 Apr 4 16:00 4 -> /dev/null
lr-x------ 1 root root 64 Apr 4 16:00 5 -> /opt/oracle/product/base/11.2.0.3/rdbms/mesg/oraus.msb
lr-x------ 1 root root 64 Apr 4 16:00 6 -> /proc/5895/fd
lr-x------ 1 root root 64 Apr 4 16:00 7 -> /dev/zero

the remaining lines will close some files ( including the listeners file descriptor 14 ) and write some data. i will ignore the rest of the output as it should be clear now, how the listener hands off the connections to the database: it listens to incoming requests on the defined port and creates a new process which is the database connection. that’s all the listener does. once the connection is established there is no more work to do for the listener and it looses control of the newly created process.

just one more thing of interest: the lseek and read calls to file descriptor 7 ( lines 42 and 43 ) are positioning and reading the file containing the tns messages. you can check this with:

strings /opt/oracle/product/base/11.2.0.3/network/mesg/tnsus.msb

this are the messages the listener returns.

happy listening …

if you have access to oracle support there is a good note about deploying the database on linux.

check it out:

Master Note of Linux OS Requirements for Database Server

To summarize all the posts which will get you to your test infrastructure, here is the overview:

  1. Operating System setup
  2. Operating System preperation for oracle
  3. Grid Infrastrucuture Installation
  4. Applying the Latest Patchset Update for the Grid Infrastructure
  5. Creating the ASM instance
  6. Database Software Installation
  7. Creating the database
  8. Applying the Latest Patchset Update for the database