Archives For processes

if you are not sure what these abbreviations all stand for:

demo@orademo.local oracle:/home/oracle/adhocScripts $ ps -ef | grep ora_
oracle    2016     1  0 15:33 ?        00:00:28 ora_pmon_demo
oracle    2018     1  0 15:33 ?        00:00:49 ora_psp0_demo
oracle    2020     1 15 15:33 ?        00:39:48 ora_vktm_demo
oracle    2024     1  0 15:33 ?        00:00:19 ora_gen0_demo
oracle    2026     1  0 15:33 ?        00:00:21 ora_diag_demo
oracle    2028     1  0 15:33 ?        00:00:19 ora_dbrm_demo
oracle    2030     1  0 15:33 ?        00:01:14 ora_dia0_demo
oracle    2032     1  0 15:33 ?        00:00:18 ora_mman_demo
oracle    2034     1  0 15:33 ?        00:00:22 ora_dbw0_demo
oracle    2036     1  0 15:33 ?        00:00:27 ora_lgwr_demo
oracle    2038     1  0 15:33 ?        00:00:47 ora_ckpt_demo
oracle    2040     1  0 15:33 ?        00:00:12 ora_smon_demo
oracle    2042     1  0 15:33 ?        00:00:09 ora_reco_demo
oracle    2044     1  0 15:33 ?        00:00:45 ora_mmon_demo
oracle    2046     1  0 15:33 ?        00:01:20 ora_mmnl_demo
oracle    2054     1  0 15:34 ?        00:00:10 ora_qmnc_demo
oracle    2070     1  0 15:34 ?        00:00:09 ora_q001_demo
oracle    2099     1  0 15:39 ?        00:00:18 ora_smco_demo
oracle    2214     1  0 16:01 ?        00:00:10 ora_q002_demo
oracle    2811     1  0 19:39 ?        00:00:00 ora_w000_demo
oracle    2826  2760  0 19:43 pts/3    00:00:00 grep ora_

… just ask the database:

select NAME,DESCRIPTION from v$bgprocess order by 1;

----- ----------------------------------------------------------------
ABMR  Auto BMR Background Process
ACMS  Atomic Controlfile to Memory Server
ARB0  ASM Rebalance 0
ARB1  ASM Rebalance 1
ARB2  ASM Rebalance 2
ARB3  ASM Rebalance 3
ARB4  ASM Rebalance 4
ARB5  ASM Rebalance 5
ARB6  ASM Rebalance 6
ARB7  ASM Rebalance 7
ARB8  ASM Rebalance 8
ARB9  ASM Rebalance 9
ARBA  ASM Rebalance 10
ARC0  Archival Process 0
ARC1  Archival Process 1
ARC2  Archival Process 2
ARC3  Archival Process 3
ARC4  Archival Process 4
ARC5  Archival Process 5
ARC6  Archival Process 6
ARC7  Archival Process 7
ARC8  Archival Process 8
ARC9  Archival Process 9
ARCa  Archival Process 10
ARCb  Archival Process 11
ARCc  Archival Process 12
ARCd  Archival Process 13
ARCe  Archival Process 14
ARCf  Archival Process 15
ARCg  Archival Process 16
ARCh  Archival Process 17
ARCi  Archival Process 18
ARCj  Archival Process 19
ARCk  Archival Process 20
ARCl  Archival Process 21
ARCm  Archival Process 22
ARCn  Archival Process 23
ARCo  Archival Process 24
ARCp  Archival Process 25
ARCq  Archival Process 26
ARCr  Archival Process 27
ARCs  Archival Process 28
ARCt  Archival Process 29
ASMB  ASM Background
CJQ0  Job Queue Coordinator
CKPT  checkpoint
CTWR  Change Tracking Writer
DBRM  DataBase Resource Manager
DBW0  db writer process 0
DBW1  db writer process 1
DBW2  db writer process 2
DBW3  db writer process 3
DBW4  db writer process 4
DBW5  db writer process 5
DBW6  db writer process 6
DBW7  db writer process 7
DBW8  db writer process 8
DBW9  db writer process 9
DBWa  db writer process 10 (a)
DBWb  db writer process 11 (b)
DBWc  db writer process 12 (c)
DBWd  db writer process 13 (d)
DBWe  db writer process 14 (e)
DBWf  db writer process 15 (f)
DBWg  db writer process 16 (g)
DBWh  db writer process 17 (h)
DBWi  db writer process 18 (i)
DBWj  db writer process 19 (j)
DBWk  db writer process 20 (k)
DBWl  db writer process 21 (l)
DBWm  db writer process 22 (m)
DBWn  db writer process 23 (n)
DBWo  db writer process 24 (o)
DBWp  db writer process 25 (p)
DBWq  db writer process 26 (q)
DBWr  db writer process 27 (r)
DBWs  db writer process 28 (s)
DBWt  db writer process 29 (t)
DBWu  db writer process 30 (u)
DBWv  db writer process 31 (v)
DBWw  db writer process 32 (w)
DBWx  db writer process 33 (x)
DBWy  db writer process 34 (y)
DBWz  db writer process 35 (z)
DIA0  diagnosibility process 0
DIA1  diagnosibility process 1
DIA2  diagnosibility process 2
DIA3  diagnosibility process 3
DIA4  diagnosibility process 4
DIA5  diagnosibility process 5
DIA6  diagnosibility process 6
DIA7  diagnosibility process 7
DIA8  diagnosibility process 8
DIA9  diagnosibility process 9
DIAG  diagnosibility process
DMON  DG Broker Monitor Process
DSKM  slave DiSKMon process
EMNC  EMON Coordinator
FBDA  Flashback Data Archiver Process
FMON  File Mapping Monitor Process
FSFP  Data Guard Broker FSFO Pinger
GEN0  generic0
GMON  diskgroup monitor
GTX0  Global Txn process 0
GTX1  Global Txn process 1
GTX2  Global Txn process 2
GTX3  Global Txn process 3
GTX4  Global Txn process 4
GTX5  Global Txn process 5
GTX6  Global Txn process 6
GTX7  Global Txn process 7
GTX8  Global Txn process 8
GTX9  Global Txn process 9
GTXa  Global Txn process 10
GTXb  Global Txn process 11
GTXc  Global Txn process 12
GTXd  Global Txn process 13
GTXe  Global Txn process 14
GTXf  Global Txn process 15
GTXg  Global Txn process 16
GTXh  Global Txn process 17
GTXi  Global Txn process 18
GTXj  Global Txn process 19
INSV  Data Guard Broker INstance SlaVe Process
LCK0  Lock Process 0
LGWR  Redo etc.
LMD0  global enqueue service daemon 0
LMHB  lm heartbeat monitor
LMON  global enqueue service monitor
LMS0  global cache service process 0
LMS1  global cache service process 1
LMS2  global cache service process 2
LMS3  global cache service process 3
LMS4  global cache service process 4
LMS5  global cache service process 5
LMS6  global cache service process 6
LMS7  global cache service process 7
LMS8  global cache service process 8
LMS9  global cache service process 9
LMSa  global cache service process 10
LMSb  global cache service process 11
LMSc  global cache service process 12
LMSd  global cache service process 13
LMSe  global cache service process 14
LMSf  global cache service process 15
LMSg  global cache service process 16
LMSh  global cache service process 17
LMSi  global cache service process 18
LMSj  global cache service process 19
LMSk  global cache service process 20
LMSl  global cache service process 21
LMSm  global cache service process 22
LMSn  global cache service process 23
LMSo  global cache service process 24
LMSp  global cache service process 25
LMSq  global cache service process 26
LMSr  global cache service process 27
LMSs  global cache service process 28
LMSt  global cache service process 29
LMSu  global cache service process 30
LMSv  global cache service process 31
LMSw  global cache service process 32
LMSx  global cache service process 33
LMSy  global cache service process 34
LSP0  Logical Standby
LSP1  Dictionary build process for Logical Standby
LSP2  Set Guard Standby Information for Logical Standby
MARK  mark AU for resync koordinator
MMAN  Memory Manager
MMNL  Manageability Monitor Process 2
MMON  Manageability Monitor Process
MRP0  Managed Standby Recovery
NSA1  Redo transport NSA1
NSA2  Redo transport NSA2
NSA3  Redo transport NSA3
NSA4  Redo transport NSA4
NSA5  Redo transport NSA5
NSA6  Redo transport NSA6
NSA7  Redo transport NSA7
NSA8  Redo transport NSA8
NSA9  Redo transport NSA9
NSAA  Redo transport NSAA
NSAB  Redo transport NSAB
NSAC  Redo transport NSAC
NSAD  Redo transport NSAD
NSAE  Redo transport NSAE
NSAF  Redo transport NSAF
NSAG  Redo transport NSAG
NSAH  Redo transport NSAH
NSAI  Redo transport NSAI
NSAJ  Redo transport NSAJ
NSAK  Redo transport NSAK
NSAL  Redo transport NSAL
NSAM  Redo transport NSAM
NSAN  Redo transport NSAN
NSAO  Redo transport NSAO
NSAP  Redo transport NSAP
NSAQ  Redo transport NSAQ
NSAR  Redo transport NSAR
NSAS  Redo transport NSAS
NSAT  Redo transport NSAT
NSAU  Redo transport NSAU
NSAV  Redo transport NSAV
NSS1  Redo transport NSS1
NSS2  Redo transport NSS2
NSS3  Redo transport NSS3
NSS4  Redo transport NSS4
NSS5  Redo transport NSS5
NSS6  Redo transport NSS6
NSS7  Redo transport NSS7
NSS8  Redo transport NSS8
NSS9  Redo transport NSS9
NSSA  Redo transport NSSA
NSSB  Redo transport NSSB
NSSC  Redo transport NSSC
NSSD  Redo transport NSSD
NSSE  Redo transport NSSE
NSSF  Redo transport NSSF
NSSG  Redo transport NSSG
NSSH  Redo transport NSSH
NSSI  Redo transport NSSI
NSSJ  Redo transport NSSJ
NSSK  Redo transport NSSK
NSSL  Redo transport NSSL
NSSM  Redo transport NSSM
NSSN  Redo transport NSSN
NSSO  Redo transport NSSO
NSSP  Redo transport NSSP
NSSQ  Redo transport NSSQ
NSSR  Redo transport NSSR
NSSS  Redo transport NSSS
NSST  Redo transport NSST
NSSU  Redo transport NSSU
NSSV  Redo transport NSSV
NSV0  Data Guard Broker NetSlave Process 0
NSV1  Data Guard Broker NetSlave Process 1
NSV2  Data Guard Broker NetSlave Process 2
NSV3  Data Guard Broker NetSlave Process 3
NSV4  Data Guard Broker NetSlave Process 4
NSV5  Data Guard Broker NetSlave Process 5
NSV6  Data Guard Broker NetSlave Process 6
NSV7  Data Guard Broker NetSlave Process 7
NSV8  Data Guard Broker NetSlave Process 8
NSV9  Data Guard Broker NetSlave Process 9
NSVA  Data Guard Broker NetSlave Process A
NSVB  Data Guard Broker NetSlave Process B
NSVC  Data Guard Broker NetSlave Process C
NSVD  Data Guard Broker NetSlave Process D
NSVE  Data Guard Broker NetSlave Process E
NSVF  Data Guard Broker NetSlave Process F
NSVG  Data Guard Broker NetSlave Process G
NSVH  Data Guard Broker NetSlave Process H
NSVI  Data Guard Broker NetSlave Process I
NSVJ  Data Guard Broker NetSlave Process J
NSVK  Data Guard Broker NetSlave Process K
NSVL  Data Guard Broker NetSlave Process L
NSVM  Data Guard Broker NetSlave Process M
NSVN  Data Guard Broker NetSlave Process N
NSVO  Data Guard Broker NetSlave Process O
NSVP  Data Guard Broker NetSlave Process P
NSVQ  Data Guard Broker NetSlave Process Q
NSVR  Data Guard Broker NetSlave Process R
NSVS  Data Guard Broker NetSlave Process S
NSVT  Data Guard Broker NetSlave Process T
NSVU  Data Guard Broker NetSlave Process U
PING  interconnect latency measurement
PMON  process cleanup
PSP0  process spawner 0
QMNC  AQ Coordinator
RBAL  ASM Rebalance master
RCBG  Result Cache: Background
RECO  distributed recovery
RMS0  rac management server
RSM0  Data Guard Broker Resource Guard Process 0
RSMN  Remote Slave Monitor
RVWR  Recovery Writer
SMCO  Space Manager Process
SMON  System Monitor Process
VBG0  Volume BG 0
VBG1  Volume BG 1
VBG2  Volume BG 2
VBG3  Volume BG 3
VBG4  Volume BG 4
VBG5  Volume BG 5
VBG6  Volume BG 6
VBG7  Volume BG 7
VBG8  Volume BG 8
VBG9  Volume BG 9
VDBG  Volume Driver BG
VKRM  Virtual sKeduler for Resource Manager
VKTM  Virtual Keeper of TiMe process
VMB0  Volume Membership 0
XDMG  cell automation manager
XDWK  cell automation worker actions

295 rows selected.
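
to connect the two listings above: on linux the background processes show up as ora_<name>_<sid>, so the middle part is what you look up in v$bgprocess. a minimal python sketch of that mapping ( the function name and the sample values are only for illustration ):

```python
def split_process_name(cmd):
    """Split 'ora_pmon_demo' into ('PMON', 'demo'); return None for anything else."""
    parts = cmd.strip().split("_", 2)
    if len(parts) != 3 or parts[0] != "ora":
        return None
    # v$bgprocess mixes cases ( e.g. DBWa ), so compare with upper(name) in the query
    return parts[1].upper(), parts[2]


# last column of a few lines from the ps listing above
for cmd in ["ora_pmon_demo", "ora_dbw0_demo", "grep ora_"]:
    print(cmd, "->", split_process_name(cmd))
```

the first element of the result can then be used in the where clause of the query on v$bgprocess.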

linux ( as well as most of the unixes ) provides the ability to integrate many different file systems at the same time. to name a few of them:

  • ext2, ext3, ext4
  • ocfs, ocfs2
  • reiserfs
  • vxfs
  • btrfs
  • dos, ntfs

although each of them provides different features and was developed with different purposes in mind the tools to work with them stay the same:

  • cp
  • mv
  • cd

the layer which makes this possible is called the virtual filesystem ( vfs ). this layer provides a common interface for the filesystems which are plugged into the operating system. I already introduced one special kind of filesystem, the proc filesystem. the proc filesystem does not handle any files on disk or on the network, but nevertheless it is a filesystem. in addition to the filesystems mentioned above, which are all disk based, a filesystem may also handle files on the network, such as nfs or cifs.

no matter what kind of filesystem you are working with: when interacting with the filesystem by using the commands of choice you are routed through the virtual filesystem:

( figure: the virtual file system )

to make this possible there needs to be a standard all file system implementations must comply with, and this standard is called the common file model. the key components this model consists of are:

  • the superblock which stores information about a mounted filesystem ( … the superblocks are kept in memory as a doubly linked list )
  • inodes which store information about a specific file ( … that are kept in memory as a doubly linked list )
  • the file object which stores information about the interaction between an open file and a process
  • dentries, which represent the links to build the directory structure ( … that are kept in memory as a doubly linked list )

to speed up operations on the file systems some of the information which is normally stored on disk is cached. if you recall the post about slabs, you can find an entry like the following in the /proc/slabinfo file if you have a mounted ext4 filesystem on your system:

cat /proc/slabinfo | grep ext4 | grep cache
ext4_inode_cache   34397  34408    920   17    4 : tunables    0    0    0 : slabdata   2024   2024      0

so what does the kernel need to do if, for example, a request for listing the contents of a directory comes in and the directory resides on an ext4 filesystem? because the filesystem is mounted the kernel knows that the filesystem for the specific request is of type ext4. the directory read issued by the ls command will then be translated ( pointed ) to the ext4 implementation of that operation. this works the same way for all commands interacting with filesystems: for each operation there is a pointer that links to the filesystem specific implementation of the operation in question:

( figure: directory listing )
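
the idea of one pointer per operation can be sketched in a few lines of python. this is only a toy model and not how the kernel is written: the class names, the mount table and the returned strings are invented for illustration.

```python
class Ext4Ops:
    """Stand-in for the ext4 implementation of the common operations."""

    def readdir(self, path):
        return f"ext4 readdir({path})"


class ProcOps:
    """Stand-in for the proc filesystem, which has no files on disk."""

    def readdir(self, path):
        return f"proc readdir({path})"


# mount table: mount point -> operations of the filesystem mounted there
MOUNTS = {"/": Ext4Ops(), "/proc": ProcOps()}


def vfs_readdir(path):
    """Pick the longest matching mount point and call its readdir."""
    candidates = [
        m for m in MOUNTS
        if path == m or path.startswith(m.rstrip("/") + "/")
    ]
    mount = max(candidates, key=len)
    return MOUNTS[mount].readdir(path)


print(vfs_readdir("/home/oracle"))  # handled by the ext4 stand-in
print(vfs_readdir("/proc/1/fd"))    # handled by the proc stand-in
```

the caller only ever talks to vfs_readdir, which is the role the virtual filesystem plays for cp, mv and ls.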

as the superblock is stored in memory and therefore may become dirty ( that is, no longer synchronized with the superblock on disk ), there is the same issue that oracle must handle with its buffer pools: periodically check the dirty flag and write the changes down to disk. the same is true for inodes ( while in memory ), which contain all the information that makes up a file. closing the loop to oracle again: to speed up searching the inodes linux maintains a hash table for fast access ( remember how oracle uses hashes to identify sql statements in the shared_pool ).

when there are files, there are processes which want to work with files. once a file is opened a new file object will be created. as these are frequent operations file objects are allocated through a slab cache.

the file objects themselves are visible to the user through the /proc filesystem per process:

ls -la /proc/*/fd/
total 0
dr-x------ 2 root root  0 2012-05-18 14:03 .
dr-xr-xr-x 8 root root  0 2012-05-18 06:40 ..
lrwx------ 1 root root 64 2012-05-18 14:03 0 -> /dev/null
lrwx------ 1 root root 64 2012-05-18 14:03 1 -> /dev/null
lr-x------ 1 root root 64 2012-05-18 14:03 10 -> anon_inode:inotify
lrwx------ 1 root root 64 2012-05-18 14:03 2 -> /dev/null
lrwx------ 1 root root 64 2012-05-18 14:03 3 -> anon_inode:[eventfd]
lrwx------ 1 root root 64 2012-05-18 14:03 4 -> /dev/null
lrwx------ 1 root root 64 2012-05-18 14:03 5 -> anon_inode:[signalfd]
lrwx------ 1 root root 64 2012-05-18 14:03 6 -> socket:[7507]
lrwx------ 1 root root 64 2012-05-18 14:03 7 -> anon_inode:[eventfd]
lrwx------ 1 root root 64 2012-05-18 14:03 8 -> anon_inode:[eventfd]
lrwx------ 1 root root 64 2012-05-18 14:03 9 -> socket:[11878]

usually the numbers 0 – 2 refer to the standard input, standard output and standard error of the corresponding process.
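
the same information can be read from a program. a minimal python sketch, assuming a linux system with /proc mounted ( the function name is made up for this example ):

```python
import os


def open_files(pid="self"):
    """Return {fd number: target} for a process, read from /proc/<pid>/fd."""
    fd_dir = f"/proc/{pid}/fd"
    result = {}
    for name in os.listdir(fd_dir):
        try:
            result[int(name)] = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # the descriptor used for the listing itself is already closed again
            continue
    return result


if __name__ == "__main__":
    for fd, target in sorted(open_files().items()):
        print(fd, "->", target)
```

for processes of other users you will need the same privileges as for the ls call above.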

last but not least there are the dentries. as with the file objects, dentries are allocated from a slab cache, the dentry cache in this case:

cat /proc/slabinfo | grep dentry
dentry             60121  61299    192   21    1 : tunables    0    0    0 : slabdata   2919   2919      0

directories are files, too, but special in that directories may contain other files or directories. once a directory is read into memory it is transformed into a dentry object. as this operation is expensive there is the dentry cache mentioned above. thus the operations for building the dentry objects can be minimized.
another link to oracle wording: the doubly linked list of unused dentries uses a least recently used ( lru ) algorithm to track the usage of the entries. when the kernel needs to shrink the cache the objects at the tail of the list will be removed. as with the inodes there is a hash table for the dentries and a lock protecting the lists ( the dcache_lock spin lock in this case ).
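
the combination of a hash table for the lookup and a list for the lru order can be sketched with python's OrderedDict. this is a toy model under my own assumptions, not kernel code: the class name, the capacity and the stored values are invented.

```python
from collections import OrderedDict


class DentryCache:
    """Toy lru cache: hash lookup plus an ordered list of entries."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # least recently used entry comes first

    def lookup(self, path):
        if path not in self.entries:
            return None
        self.entries.move_to_end(path)  # mark as most recently used
        return self.entries[path]

    def add(self, path, dentry):
        self.entries[path] = dentry
        self.entries.move_to_end(path)
        while len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # shrink: drop least recently used


cache = DentryCache(capacity=2)
cache.add("/home", "dentry for /home")
cache.add("/tmp", "dentry for /tmp")
cache.lookup("/home")                # /home is now the most recently used entry
cache.add("/var", "dentry for /var")  # evicts /tmp
print(list(cache.entries))
```

in this sketch the least recently used entry sits at the front of the OrderedDict, which corresponds to the tail of the kernel list described above.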

this should give you enough hints to go further if you are interested …

until now we had an introduction to processes, how they are managed, what signals are and what they are used for, how the linux kernel ( and oracle ) uses doubly linked lists to quickly look up memory structures and how critical regions like shared memory can be protected. this post gives an introduction to timing and process scheduling.

a cpu can execute only one process at a time, but maybe hundreds or thousands of processes want to do their work, so the kernel must provide a mechanism to decide which process to run next ( process switching ). this is the task of the scheduler. to be able to do its job the scheduler must make decisions, and these decisions are based on time and priorities.

lots and lots of work behind the scenes is driven by time measurements. consider cronjobs, for example. without being able to measure time they would not work. in short the kernel must be able to keep the current time and to provide a mechanism to notify programs when a specific interval has elapsed.

on the one hand there is the real time clock ( accessible through the /dev/rtc interface ) which is a special chip that continues to tick even if the computer is powered off ( there is a small battery for this chip ). the real time clock is used by linux to derive the date and time.

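on the other hand there are several other mechanisms which can be used for timing, for example the time stamp counter ( tsc ) of the cpu, the programmable interval timer ( pit ) and the high precision event timer ( hpet ).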

one of the time related activities the kernel must perform is to determine how long a process has been running. each process is given a time slot in which it may run, which is called a quantum. if the quantum expires and the process did not terminate a process switch may occur ( another process is selected for execution ). processes which used up their quantum are called expired. active processes are those which did not yet consume their quantum.
additionally each process has a priority assigned, which is used by the scheduler to decide how appropriate it is to let the process do its work on the cpu.

in general processes can be divided in three classes:

  • interactive: typical interactive processes are those which respond to keyboard and mouse inputs of an end user. as a user wants to see quick responses, for example when editing text, these processes must be woken up quickly
  • batch: batch processes do not interact with the user and often run in the background.
  • real-time: real-time processes have very strong scheduling requirements and should not be blocked by processes with lower priorities.

in general the scheduler will give more attention to interactive processes than to batch processes, although this is not always the case.

one way we can change the base priority of processes from the command line is by using the “nice” command:

nice -n 19 vi

if you check the process without the nice call:

ps aux | grep vi
oracle 4185 0.5 0.0 5400 1504 pts/0 S+ 10:51 0:00 vi

… and compare it to when you call vi with a nice value:

ps aux | grep vi
oracle 4194 1.6 0.0 5400 1496 pts/0 SN+ 10:52 0:00 vi

… you will see that “S+” changes to “SN+” ( the “N” stands for “low-priority (nice to other users)” ).
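
the same can be done from a program. a minimal python sketch ( the helper name run_niced is made up; os.nice and subprocess are part of the standard library ) which starts a child with a raised nice value and lets the child report it:

```python
import os
import subprocess
import sys


def run_niced(args, increment):
    """Run args with its nice value raised by increment, return its stdout."""
    completed = subprocess.run(
        args,
        preexec_fn=lambda: os.nice(increment),  # runs in the child before exec
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout


if __name__ == "__main__":
    # os.nice(0) returns the current nice value without changing it
    child = [sys.executable, "-c", "import os; print(os.nice(0))"]
    print("parent:", os.nice(0), "child:", run_niced(child, 5).strip())
```

raising the nice value is always allowed, lowering it ( a negative increment ) usually requires root.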

processes in linux are preemptable, which means that higher priority processes may suspend lower priority processes when they enter the running state. another reason a process can be preempted is when its time quantum expires.

consider this example: a user is writing an email while copying music from a cd to her computer. the email client is considered an interactive program while the copy job is considered a batch program. each time the user presses a key on her keyboard an interrupt occurs and the scheduler selects the email program for execution. but because users tend to think when writing emails there is plenty of time ( regarding the cpu ) between the key presses to wake up the copy job and let it do its work.

the time a process is allowed to be on a cpu, the quantum, is derived from a so-called “static priority” which can be in the range of 100 to 139 ( with 100 being the highest priority and 139 being the lowest ). the higher the priority the more time the process is granted ( which ranges from 800ms for the highest priority to 5ms for the lowest priority ). in addition to the static priority there is a “dynamic priority” for each process ( again ranging from 100 to 139 ). without going too much into detail again: the dynamic priority is the one the scheduler uses for its decisions. as the name suggests, this priority may change over time ( depending on the average sleep time of a process ). processes with longer sleep times usually get a bonus ( the priority will be increased ) while processes with shorter sleep times will get a penalty ( the priority will be decreased ). the average sleep time is also used by the scheduler to decide if processes are interactive or batch.
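
to make the numbers above reproducible, here is a small python sketch. it assumes the formula of the O(1) scheduler of the 2.6 kernels, where the static priority is 120 plus the nice value and the quantum is scaled by 20 below priority 120 and by 5 from 120 on. the function names are my own.

```python
def static_priority(nice):
    """Map a nice value ( -20 .. 19 ) to a static priority ( 100 .. 139 )."""
    if not -20 <= nice <= 19:
        raise ValueError("nice value must be between -20 and 19")
    return 120 + nice


def base_quantum_ms(prio):
    """Base time quantum in milliseconds for a static priority ( assumed formula )."""
    if not 100 <= prio <= 139:
        raise ValueError("static priority must be between 100 and 139")
    scale = 20 if prio < 120 else 5
    return (140 - prio) * scale


for nice in (-20, 0, 19):
    prio = static_priority(nice)
    print(f"nice {nice:3d} -> static priority {prio} -> {base_quantum_ms(prio)} ms")
```

with these assumptions the highest priority gets 800ms, the default priority ( nice 0 ) 100ms and the lowest priority 5ms, which matches the range given above.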

recall the post about doubly linked lists. the most important data structure used by the scheduler is the runqueue, which in fact is another linked list. this list links together all the process descriptors of the processes which want to run ( there is one runqueue per cpu ). one process can be in one runqueue only, but processes may migrate to other runqueues if the load between the cpus becomes unbalanced.

what to keep in mind: as only one process can run on one cpu at a time the scheduler decides which process to run next and which processes to suspend in case higher priority processes enter the running state. in general interactive processes are favored over batch processes and real-time processes should not be blocked by lower priority processes.

the previous post about SIGSEGV and the ORA-07445 introduced signals and how they are used by the kernel to notify processes about events.

many times I see people using “kill -9” to terminate processes. no question, this works most of the time, but a lot of people are not aware of what they are actually doing when firing this command. in my opinion, kill is a really bad name for this command, because what kill does is not necessarily killing processes, but sending signals to processes ( of which 9, or SIGKILL, is probably the most well known ). a much better name, for example, would be “sig” or “signal”.

the list of signals one can use in regards to kill can be printed with:

kill -l

here you can see what 9 really means: it is SIGKILL. the ( perhaps ) dangerous thing about SIGKILL is that the process will not be allowed to do any cleanup ( for example releasing resources ). as this signal can not be ignored or caught, the process will terminate immediately ( you can compare it to the “shutdown abort” command of the oracle database ).

the way one should terminate processes is to use the SIGTERM (15) signal, the default signal sent by kill. this allows the process to do its cleanup work and to terminate safely ( although this depends on how the process or the application handles the signal ).
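
what “handling the signal” means can be shown with a few lines of python. this is only a sketch: the function names are invented and the “cleanup” just records the name of the signal. the process sends SIGTERM to itself, which is what a plain kill with the pid would do from the shell.

```python
import os
import signal


def demo():
    """Install a SIGTERM handler, send SIGTERM to ourselves, return what it saw."""
    received = []

    def on_sigterm(signum, frame):
        # release resources here; with SIGKILL this code would never run
        received.append(signal.Signals(signum).name)

    previous = signal.signal(signal.SIGTERM, on_sigterm)
    try:
        os.kill(os.getpid(), signal.SIGTERM)  # same as: kill <pid>
    finally:
        signal.signal(signal.SIGTERM, previous)  # restore the old behavior
    return received


if __name__ == "__main__":
    print(demo())
```

trying to install a handler for signal.SIGKILL in the same way fails with an error, because SIGKILL can not be caught.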

one interesting thing you can do by sending a signal to a process with kill is to force, for example, the ssh daemon to re-read its configuration without closing the active ssh sessions. you can try this by sending SIGHUP ( “kill -HUP” followed by the pid ) to the sshd master process.

be aware that this is not necessarily true for other daemons, as it depends on how the program was implemented. the default behavior for SIGHUP ( hang up ) is abnormal termination. originally SIGHUP comes from serial connections ( e.g. modems ) which indeed did a hang up when the user closed the connection.

closing the loop to the previous post about SIGSEGV and the ORA-07445 you can force an ORA-07445 by sending the SIGSEGV signal, for example, to the dbwriter process ( I hope there is no need to say: you should not try this on a production system ):

ps -ef | grep dbw
oracle 4560 1 0 13:18 ? 00:00:00 ora_dbw0_DB112
kill -11 4560

…which will result in the following errors reported in the alertlog:

Exception [type: SIGSEGV, unknown code] [ADDR:0xC41] [PC:0x32E7CD46BA, semtimedop()+10] [exception issued by pid: 3137, uid: 0] [flags: 0x0, count: 1]
Errors in file /oradata/DB112/admin/diag/rdbms/db112/DB112/trace/DB112_dbw0_4560.trc (incident=26003):
ORA-07445: exception encountered: core dump [semtimedop()+10] [SIGSEGV] [ADDR:0xC41] [PC:0x32E7CD46BA] [unknown code] []
Incident details in: /oradata/DB112/admin/diag/rdbms/db112/DB112/incident/incdir_26003/DB112_dbw0_4560_i26003.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Mon Apr 16 13:23:36 2012

to simulate a power failure try sending SIGPWR to a sqlplus session. this will result in:

SQL> Power failure

conclusion: signals are one more important concept when it comes to understanding the operating system. by sending signals processes are notified about events and are given the chance to take the necessary actions ( if an appropriate handler is present ). when using the kill command you do not necessarily kill a process. what you are doing is sending a signal.

as usual, this is just an introduction and far from being complete ….

when working with the oracle database sooner or later you will face the ORA-07445 error reported in a trace file and the alertlog.

there is plenty of documentation about this error on oracle support and the web. in short: this is an unhandled exception in the oracle code ( in contrast the ORA-00600 errors are handled exceptions ). so why do I want to write about it ? because this is another example where you can map database behavior to the operating system. one common type of the ORA-07445 is this one: “type: SIGSEGV”.

what is this about and what does it stand for ?

software contains bugs. this is true for the linux kernel, this is true for the oracle database and this is true for probably all other software. to deal with unexpected behavior and to protect the system there must be some sort of exception handler which catches the exceptions once they occur and takes the necessary steps to recover from them. the lowest level exceptions are raised by the CPU and must be handled by the operating system’s kernel. these exceptions are predefined and the kernel provides an exception handler for each of them.

some of them are:

  • division by zero
  • segment not present
  • stack segment fault
  • invalid opcode

you can, for example, check the intel documentation for a complete list of defined exceptions.

when an exception is raised the corresponding exception handler sends a signal to the process which caused the exception. and this is exactly what the SIGSEGV is: it is a signal. signals, for example ( the following list is not complete ), can be:

  • SIGSEGV: page faults, overflows
  • SIGFPE: divide error
  • SIGBUS: stack segments fault
  • SIGILL: invalid opcode
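
you can watch the kernel turn an exception into a signal without touching a database. the following python sketch ( the function name is made up ) starts a child process which reads from address 0 through ctypes; the cpu raises a page fault, the kernel sends SIGSEGV to the child and the parent sees this in the return code:

```python
import signal
import subprocess
import sys


def crash_child():
    """Start a child that reads from address 0 and return how it ended."""
    child = subprocess.run(
        [sys.executable, "-c", "import ctypes; ctypes.string_at(0)"],
        stderr=subprocess.DEVNULL,
    )
    # a negative return code -N means: terminated by signal N
    return child.returncode


if __name__ == "__main__":
    code = crash_child()
    print("child terminated by", signal.Signals(-code).name)
```

only the child is affected, the parent keeps running. this is the same mechanism that ends up as “type: SIGSEGV” in the oracle trace file.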

most of these exceptions can only occur when the cpu is in user mode, that is, when executing tasks from user programs ( oracle in this case ). there are two ways in which the processor can halt ( or interrupt ) process execution:

  • interrupts, which are asynchronous and typically triggered by I/O devices
  • exceptions, which are synchronous and triggered by the processor when it detects predefined conditions while executing

when the processor halts process execution it switches to the handler routine ( each routine is registered in the interrupt descriptor table, IDT ). once the handler routine has executed its tasks control is given back to the interrupted process.

unfortunately there is not much you can do about it. you can try to find a workaround with oracle support ( e.g. by setting some database parameters or applying a patch ) or check the generated dumps to get some hints on what exactly caused the exception.

a recent search on oracle support returned about 2500 results for the term SIGSEGV. you see, this is not an unusual signal …

as mentioned in the previous post about semaphores there are more things to consider when it comes to interprocess communication. as semaphores are used to protect critical regions, there must be some critical regions to protect and this is the shared memory oracle uses for its communication.

to give an example on how the shared memory addressing works we will take a look at what happens when the database starts up.
for this you’ll need two sessions to a test infrastructure ( one as the database owner, the other as root ).

session one ( oracle ):
connect to sqlplus as sysdba and make sure you shut down the database ( do not exit sqlplus once the database is down ):

sqlplus / as sysdba
shutdown immediate

session two ( root ): discover the PID of the sqlplus session above …

ps -ef | grep sqlp
oracle    3062  3036  0 09:49 pts/1    00:00:00 sqlplus

… check the shared memory segments and trace the sqlplus PID from above:

ipcs -m
------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status      
0x7401003e 1310720    root      600        4          0                       
0x74010014 1998849    root      600        4          0                       
0x00000000 2359298    root      644        80         2                       
0x74010013 1966083    root      600        4          0                       
0x00000000 2392068    root      644        16384      2                       
0x00000000 2424837    root      644        280        2                       
0x00000000 2490374    grid      640        4096       0                       
0x00000000 2523143    grid      640        4096       0                       
0x8e11371c 2555912    grid      640        4096       0
# start the trace
strace -o db_startup.log -fp 3062

it is important to specify the “-f” flag for the strace call. this will tell strace to follow the child processes spawned.

in session one start up the database ( by issuing “startup” at the sqlplus prompt ) …


… and stop the tracing in the root session once the database is up and re-check the shared memory segments.

ipcs -m
------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status      
0x7401003e 1310720    root      600        4          0                       
0x74010014 1998849    root      600        4          0                       
0x00000000 2359298    root      644        80         2                       
0x74010013 1966083    root      600        4          0                       
0x00000000 2392068    root      644        16384      2                       
0x00000000 2424837    root      644        280        2                       
0x00000000 2490374    grid      640        4096       0                       
0x00000000 2523143    grid      640        4096       0                       
0x8e11371c 2555912    grid      640        4096       0                       
0x00000000 3538953    oracle    640        4096       0                       
0x00000000 3571722    oracle    640        4096       0                       
0x3393b3a4 3604491    oracle    640        4096       0

as you can see, three more segments appeared after the database started up.

you probably noticed some trace output on the screen similar to this:

Process 3468 detached
Process 3470 attached (waiting for parent)
Process 3470 resumed (parent 3409 ready)
Process 3471 attached (waiting for parent)
Process 3471 resumed (parent 3470 ready)
Process 3469 detached
Process 3470 detached

this is because of the “-f” flag given to strace.
the complete trace output is now available in the db_startup.log trace file and we are ready to take a look at it.

the first thing that catches the eye are the various references to the “/proc” filesystem. in my trace file there are 1213 calls to it. you can check this with:

grep "/proc/" db_startup.log | wc -l

take a look at the previous post which introduces the “/proc” filesystem for more information. for the scope of this post just notice how much depends on it.

the actual startup of the database is triggered by the following line:

execve("/opt/oracle/product/base/", ["oracleDB112", "(DESCRIPTION=(LOCAL=YES)(ADDRESS"], [/* 22 vars */]) = 0

this is the call to the oracle binary ( execve executes the binary ); the 22 variables omitted from the output are the environment passed to the new process. from now on the oracle instance starts up.

the calls important to the shared memory stuff are the following:

  • brk: changes a data segment’s size
  • mmap, munmap: maps/unmaps files or devices into memory
  • mprotect: sets protection on a region of memory
  • shmget: allocates a shared memory segment
  • shmat, shmdt: performs attach/detach operations on shared memory
  • get_mempolicy: returns the NUMA memory policy of a process
  • semget: gets a semaphore set identifier
  • semctl: performs control operations on a semaphore
  • semop, semtimedop: perform semaphore operations

for each of the above system calls you can check the man-pages for more information.
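
to get a feeling for how often these calls show up, the trace can be summarized with a few lines of python. this is only a sketch: the regular expression assumes the "pid  call(args) = result" layout that strace -f produces, and count_calls is a made-up helper name, not part of any tool.

```python
import re
from collections import Counter

# the system calls from the list above
SHM_CALLS = {"brk", "mmap", "munmap", "mprotect", "shmget", "shmat", "shmdt",
             "get_mempolicy", "semget", "semctl", "semop", "semtimedop"}

# strace -f lines look like: "5365  shmget(IPC_PRIVATE, ...) = 3538953"
LINE = re.compile(r"^\s*(?:\d+\s+)?(\w+)\(")

def count_calls(lines, wanted=SHM_CALLS):
    """Count how often each wanted system call appears in strace output."""
    counts = Counter()
    for line in lines:
        match = LINE.match(line)
        if match and match.group(1) in wanted:
            counts[match.group(1)] += 1
    return counts

# a few lines taken from the trace shown in this post
sample = [
    "5365  shmget(IPC_PRIVATE, 4096, IPC_CREAT|IPC_EXCL|0640) = 3538953",
    "5365  shmget(0x3393b3a4, 4096, IPC_CREAT|IPC_EXCL|0640) = 3604491",
    "5365  semget(IPC_PRIVATE, 1, IPC_CREAT|IPC_EXCL|0600) = 1081346",
    '3404  open("/etc/oracle/cell/network-config/cellinit.ora", O_RDONLY) = -1 ENOENT',
]
print(count_calls(sample))

# against the real file: count_calls(open("db_startup.log"))
```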
as the trace file is rather large and a lot of things are happening i will focus on the minimum ( this is not about re-engineering oracle :) ):

let’s check the identifiers ( shmid ) returned by the ipcs command above:

egrep "3538953|3571722|3604491" db_startup.log
5365  shmget(IPC_PRIVATE, 4096, IPC_CREAT|IPC_EXCL|0640) = 3538953
5365  shmget(IPC_PRIVATE, 4096, IPC_CREAT|IPC_EXCL|0640) = 3571722
5365  shmget(0x3393b3a4, 4096, IPC_CREAT|IPC_EXCL|0640) = 3604491

as you can see the identifiers returned by the shmget calls ( 3604491, 3571722, 3538953 ) correspond to the ones reported by ipcs. you wonder about the size of 4096 bytes? this is because memory_target/memory_max_target is in use by the instance. if the database is configured using sga_target/sga_max_size you would see the actual size. let’s check this:

su - oracle
sqlplus / as sysdba
alter system reset memory_max_target scope=spfile;
alter system reset memory_target scope=spfile;
alter system set sga_max_size=256m scope=spfile;
alter system set sga_target=256m scope=spfile;
alter system set pga_aggregate_target=24m scope=spfile;
startup force;
# re-check the shared memory segments
ipcs -m
------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status      
0x00000000 2359298    root      644        80         2                       
0x00000000 2392068    root      644        16384      2                       
0x00000000 2424837    root      644        280        2                       
0x00000000 2490374    grid      640        4096       0                       
0x00000000 2523143    grid      640        4096       0                       
0x8e11371c 2555912    grid      640        4096       0                       
0x00000000 3801097    oracle    640        8388608    25                      
0x00000000 3833866    oracle    640        260046848  25                      
0x3393b3a4 3866635    oracle    640        2097152    25

the “260046848” is the bulk of the sga: together with the 8388608 byte segment it adds up to 268435456 bytes, which is exactly the 256m configured. the nattch column shows that 25 processes are attached to it. you can double check the 25 attached processes if you want:

ps -ef | grep DB112 | grep -v LISTENER | grep -v grep | wc -l
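
the sizes can be added up quickly, too. a small sketch in python which parses the ipcs rows shown above ( segment_bytes is just an illustrative helper, and what the third, 2m segment is used for is not something the listing tells us ):

```python
def segment_bytes(ipcs_output, owner):
    """Return the sizes ( bytes column ) of all segments owned by `owner`."""
    sizes = []
    for line in ipcs_output.splitlines():
        fields = line.split()
        # data rows: key shmid owner perms bytes nattch [status]
        if len(fields) >= 6 and fields[0].startswith("0x") and fields[2] == owner:
            sizes.append(int(fields[4]))
    return sizes

# rows copied from the ipcs -m output above
ipcs_output = """\
key        shmid      owner      perms      bytes      nattch     status
0x8e11371c 2555912    grid      640        4096       0
0x00000000 3801097    oracle    640        8388608    25
0x00000000 3833866    oracle    640        260046848  25
0x3393b3a4 3866635    oracle    640        2097152    25
"""

sizes = segment_bytes(ipcs_output, "oracle")
mb = 1024 * 1024
print([size // mb for size in sizes])        # size of each segment in mb
print((sizes[0] + sizes[1]) // mb)           # the two segments that make up the sga
```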

let’s return to the memory_target/memory_max_target configuration. as oracle puts together all the memory chunks ( pga and sga ) the management of memory moves to the virtual shared memory filesystem ( tmpfs ). unfortunately this is not visible with the ipcs command.
but you can map your memory_* sizes to the shm filesystem:

ls -la /dev/shm/ | grep -v "+ASM"
total 466100
drwxrwxrwt  2 root   root        2640 Apr 10 13:09 .
drwxr-xr-x 10 root   root        3400 Apr 10 09:44 ..
-rw-r-----  1 oracle asmadmin 4194304 Apr 10 13:27 ora_DB112_3932169_0
-rw-r-----  1 oracle asmadmin 4194304 Apr 10 13:09 ora_DB112_3932169_1
-rw-r-----  1 oracle asmadmin 4194304 Apr 10 13:09 ora_DB112_3964938_0
-rw-r-----  1 oracle asmadmin 4194304 Apr 10 13:20 ora_DB112_3964938_1
-rw-r-----  1 oracle asmadmin 4194304 Apr 10 13:09 ora_DB112_3964938_10
-rw-r-----  1 oracle asmadmin 4194304 Apr 10 13:20 ora_DB112_3964938_11
-rw-r-----  1 oracle asmadmin 4194304 Apr 10 13:10 ora_DB112_3964938_12

note that i have excluded the ASM stuff here. in my case each segment ( or granule ) is 4mb in size ( this depends on the available memory of the system ) and the sum of all the segments should get you near to your memory_* configuration.
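
adding up the files by hand gets boring quickly, so here is a minimal sketch in python. granule_total is a hypothetical helper, and the demo runs against a scratch directory with made-up file names so it does not depend on a running instance; on a real system you would point it at /dev/shm and your own sid.

```python
import os
import tempfile

def granule_total(directory, sid):
    """Return ( count, total_bytes ) of the files named ora_<sid>_* in directory."""
    prefix = f"ora_{sid}_"
    sizes = [os.path.getsize(os.path.join(directory, name))
             for name in os.listdir(directory) if name.startswith(prefix)]
    return len(sizes), sum(sizes)

# self-contained demo in a scratch directory instead of the real /dev/shm
with tempfile.TemporaryDirectory() as scratch:
    for name in ("ora_DB112_1_0", "ora_DB112_1_1", "ora_+ASM_2_0"):
        with open(os.path.join(scratch, name), "wb") as handle:
            handle.truncate(4 * 1024 * 1024)      # a 4mb file, like one granule
    demo_count, demo_total = granule_total(scratch, "DB112")

print(demo_count, "files,", demo_total // (1024 * 1024), "mb")

# on a real system: granule_total("/dev/shm", "DB112")
```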

as ipcs can not tell you much here there are other commands to use. if you want to know which process has a memory granule open:

fuser -v /dev/shm/ora_DB112_4358154_49
                     USER        PID ACCESS COMMAND
                     oracle     6626 ....m oracle
                     oracle     6628 ....m oracle
                     oracle     6630 ....m oracle
                     oracle     6634 ....m oracle
                     oracle     6636 ....m oracle
                     oracle     6638 ....m oracle
                     oracle     6640 ....m oracle
                     oracle     6642 ....m oracle
                     oracle     6644 ....m oracle
                     oracle     6646 ....m oracle
                     oracle     6648 ....m oracle
                     oracle     6650 ....m oracle
                     oracle     6652 ....m oracle
                     oracle     6654 ....m oracle
                     oracle     6656 ....m oracle
                     oracle     6658 ....m oracle
                     oracle     6662 ....m oracle
                     oracle     6669 ....m oracle
                     oracle     6744 ....m oracle
                     oracle     6767 ....m oracle
                     oracle     6769 ....m oracle
                     oracle     6791 ....m oracle
                     oracle     7034 ....m oracle

or the other way around, if you want to know which files are mapped by a specific process:

ps -ef | grep pmon | grep -v "ASM"
oracle    6626     1  0 13:40 ?        00:00:05 ora_pmon_DB112
root      7075  5338  0 14:33 pts/0    00:00:00 grep pmon
# use the pmap command on the PID
pmap 6626
6626:   ora_pmon_DB112
0000000000400000 183436K r-x--  /opt/oracle/product/base/
000000000b922000   1884K rwx--  /opt/oracle/product/base/
000000000baf9000    304K rwx--    [ anon ]
0000000010c81000    660K rwx--    [ anon ]
0000000060000000      4K r-xs-  /dev/shm/ora_DB112_4325385_0
0000000060001000   4092K rwxs-  /dev/shm/ora_DB112_4325385_0
0000000060400000   4096K rwxs-  /dev/shm/ora_DB112_4325385_1
0000000060800000   4096K rwxs-  /dev/shm/ora_DB112_4358154_0
0000000060c00000   4096K rwxs-  /dev/shm/ora_DB112_4358154_1
0000000061000000   4096K rwxs-  /dev/shm/ora_DB112_4358154_2
0000000061400000   4096K rwxs-  /dev/shm/ora_DB112_4358154_3
0000000061800000   4096K rwxs-  /dev/shm/ora_DB112_4358154_4
0000000061c00000   4096K rwxs-  /dev/shm/ora_DB112_4358154_5
0000000062000000   4096K rwxs-  /dev/shm/ora_DB112_4358154_6

if you have trouble starting up your instance with this configuration ( ORA-00845 ) check the size of the virtual filesystem:

df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/hdc1              28G   14G   12G  54% /
tmpfs                 741M  456M  286M  62% /dev/shm

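depending on your configuration ( memory_* or sga_* parameters ) the way that memory is managed changes: with the sga_* parameters oracle uses System V shared memory ( visible with ipcs ), with the memory_* parameters it uses POSIX-style shared memory backed by files in /dev/shm.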
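
the idea behind the file backed variant can be tried out without a database. the following python sketch maps the same file twice with MAP_SHARED, the way two processes would map the same granule file, and shows that a write through one mapping is visible through the other. the scratch file is an assumption for the demo, it only stands in for a file in /dev/shm.

```python
import mmap
import os
import tempfile

SIZE = 4096

# a scratch file stands in for a granule file in /dev/shm
fd, path = tempfile.mkstemp()
os.ftruncate(fd, SIZE)

# two independent shared mappings of the same file, as two processes would have
first = mmap.mmap(fd, SIZE, flags=mmap.MAP_SHARED)
fd2 = os.open(path, os.O_RDWR)
second = mmap.mmap(fd2, SIZE, flags=mmap.MAP_SHARED)

first[0:5] = b"hello"          # written through one mapping ...
seen = bytes(second[0:5])      # ... and read through the other

first.close()
second.close()
os.close(fd)
os.close(fd2)
os.remove(path)
print(seen)
```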

lots and lots of information. not all of it is important to keep in mind. but what you should remember:
there are several processes and memory segments that make up the oracle instance. as several processes are attached to the same memory regions there must be a way to protect them from concurrent access ( think of semaphores ) … and oracle heavily depends on shared memory. if you scroll through the trace file you’ll notice that there are thousands of operations going on when an oracle instance starts up. imagine what is going on if the instance is under heavy workload and lots and lots of things need protection.

ps: for those interested:

there is plenty more interesting stuff which you can find in the db_startup.log trace, for example:

writing the audit files:

grep -i adump db_startup.log  | grep -v ASM
3404  open("/oradata/DB112/admin/adump/DB112_ora_3404_2.aud", O_RDWR|O_CREAT|O_EXCL, 0660) = 10
3404  write(10, "/oradata/DB112/admin/adump/DB112"..., 47) = 47
3444  open("/oradata/DB112/admin/adump/DB112_ora_3444_1.aud", O_RDWR|O_CREAT|O_EXCL, 0660) = -1 EEXIST (File exists)
3444  open("/oradata/DB112/admin/adump/DB112_ora_3444_2.aud", O_RDWR|O_CREAT|O_EXCL, 0660) = 8
3444  write(8, "/oradata/DB112/admin/adump/DB112"..., 47) = 47
3481  open("/oradata/DB112/admin/adump/DB112_ora_3481_1.aud", O_RDWR|O_CREAT|O_EXCL, 0660 
3481  write(8, "/oradata/DB112/admin/adump/DB112"..., 47) = 47

writing the alert.log:

grep -i "alert_DB112.log" db_startup.log
3404  lstat("/oradata/DB112/admin/diag/rdbms/db112/DB112/trace/alert_DB112.log", {st_mode=S_IFREG|0640, st_size=110201, ...}) = 0
3404  open("/oradata/DB112/admin/diag/rdbms/db112/DB112/trace/alert_DB112.log", O_WRONLY|O_CREAT|O_APPEND, 0660) = 5
3404  lstat("/oradata/DB112/admin/diag/rdbms/db112/DB112/trace/alert_DB112.log", {st_mode=S_IFREG|0640, st_size=110260, ...}) = 0
3404  open("/oradata/DB112/admin/diag/rdbms/db112/DB112/trace/alert_DB112.log", O_WRONLY|O_CREAT|O_APPEND, 0660) = 11

reading the oracle message files:

grep msb db_startup.log
db_startup.log:5438  open("/opt/oracle/product/base/", O_RDONLY) = 18
db_startup.log:5438  open("/opt/oracle/product/base/", O_RDONLY) = 18
db_startup.log:5430  open("/opt/oracle/product/base/", O_RDONLY 
db_startup.log:5494  open("/opt/oracle/product/base/", O_RDONLY

getting semaphores:

grep semget db_startup.log 
5365  semget(IPC_PRIVATE, 1, IPC_CREAT|IPC_EXCL|0600) = 1081346
5365  semget(IPC_PRIVATE, 124, IPC_CREAT|IPC_EXCL|0666) = 1114114
5365  semget(IPC_PRIVATE, 124, IPC_CREAT|0660) = 1146882
5365  semget(0x710dfe10, 0, 0)          = -1 ENOENT (No such file or directory)
5365  semget(0x46db3f80, 0, 0)          = -1 ENOENT (No such file or directory)
5365  semget(0x9ae46084, 0, 0)          = -1 ENOENT (No such file or directory)
5365  semget(0xf6dcc368, 0, 0)          = -1 ENOENT (No such file or directory)
5365  semget(0x710dfe10, 124, IPC_CREAT|IPC_EXCL|0640) = 1179650

some exadata stuff:

3404  open("/etc/oracle/cell/network-config/cellinit.ora", O_RDONLY) = -1 ENOENT (No such file or directory)

and … and …

if you followed the oracle installation guide there are some kernel parameters to be configured for the oracle database. one of them ( kernel.sem ), specified by four values, is about semaphores:

  • semmsl: the maximum number of semaphores per semaphore set
  • semmns: the maximum number of semaphores in the entire system
  • semopm: the maximum number of operations per semop call
  • semmni: the maximum number of semaphore sets in the entire system

the question is: what are these semaphores about and what are they for?


a semaphore is a counter associated with a data structure which provides locking and synchronization of critical regions. there is one semaphore ( initialized to 1 ) for each data structure to be protected. the atomic methods “down” and “up” are used to decrease and increase the counter. if the kernel wants access to a protected structure it executes the “down” method and if the result is not negative ( the counter is equal to or greater than zero ) access to the resource is granted. if the counter is negative the process which wishes to access the resource is blocked and added to the semaphore list ( a kind of queue ). as time goes by some process finishes its work and executes the “up” method which allows one process in the semaphore list to proceed.
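
the down/up protocol can be tried out with the semaphore from python’s threading module. take it as an illustration of the concept only: it is neither a kernel nor a system V semaphore, its counter never goes negative ( a caller simply waits at zero ), and the non-blocking calls are used here just to keep the example in a single thread.

```python
import threading

sem = threading.Semaphore(1)            # the counter is initialized to 1

first = sem.acquire(blocking=False)     # "down": 1 -> 0, access is granted
second = sem.acquire(blocking=False)    # "down" again: would have to wait, so it is refused
sem.release()                           # "up": the holder leaves the critical region
third = sem.acquire(blocking=False)     # now the next one gets in
sem.release()

print(first, second, third)
```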

in linux there are two kinds of semaphores:

  • kernel semaphores ( for kernel control paths )
  • system V IPC semaphores ( for user mode processes ), IPC stands for “interprocess communication”

the IPC semaphores are the ones relevant to the oracle database. semaphores are created by the function semget() which returns the semaphore identifier. there are two other functions for creating ipc resources, which are:

  • msgget(): which is for message queues
  • shmget(): which is for shared memory

there must be at least one semaphore for each oracle process ( the processes parameter of the database ). as each session to the database needs to be synchronized with other sessions ( and sessions are memory structures ) oracle must request resources from the operating system to be able to handle concurrency in the sga to which all sessions are connected.

the semaphores queue of pending requests is implemented as a double linked list. you remember? the same concepts over and over again. actually semaphores are sometimes called mutexes ( and there are internal functions like init_MUTEX )… surprised ?

to display the current limits the following command can be used:

ipcs -ls
------ Semaphore Limits --------
max number of arrays = 1024
max semaphores per array = 250
max semaphores system wide = 32000
max ops per semop call = 32
semaphore max value = 32767

or you can directly query the /proc filesystem:

cat /proc/sys/kernel/sem
250 32000 32 1024
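
the four numbers are easy to mix up, so a tiny python sketch that gives them names can help. the order ( semmsl, semmns, semopm, semmni ) is the one used in the list above; parse_sem and SemLimits are names made up for this example.

```python
from collections import namedtuple

SemLimits = namedtuple("SemLimits", "semmsl semmns semopm semmni")

def parse_sem(text):
    """Parse the four values of /proc/sys/kernel/sem into named fields."""
    return SemLimits(*(int(value) for value in text.split()))

limits = parse_sem("250 32000 32 1024")
print(limits)

# on a linux box: parse_sem(open("/proc/sys/kernel/sem").read())
```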

to check the currently allocated semaphores:

ipcs -s
------ Semaphore Arrays --------
key semid owner perms nsems
0x127f81f8 163842 oracle 640 124
0x3d2c0d44 1933315 oracle 640 129
0x3d2c0d45 1966084 oracle 640 129
0x3d2c0d46 1998853 oracle 640 129
0x3d2c0d47 2031622 oracle 640 129
0x3d2c0d48 2064391 oracle 640 129

if you want to see some semaphore operations in action do, for example, a strace on the smon process and wait one or two seconds:

ps -ef | grep smon
oracle 2723 1 0 08:31 ? 00:00:03 ora_smon_dbs300
root 3153 3111 2 09:04 pts/1 00:00:00 grep smon
strace -p 2723
semtimedop(819203, {{17, -1, 0}}, 1, {3, 0}) = -1 EAGAIN (Resource temporarily unavailable)
gettimeofday({1333697562, 861932}, NULL) = 0
semtimedop(819203, {{17, -1, 0}}, 1, {3, 0}) = -1 EAGAIN (Resource temporarily unavailable)
gettimeofday({1333697565, 871767}, NULL) = 0
semtimedop(819203, {{17, -1, 0}}, 1, {3, 0}) = -1 EAGAIN (Resource temporarily unavailable)
gettimeofday({1333697568, 893455}, NULL) = 0
semtimedop(819203, {{17, -1, 0}}, 1, {3, 0}) = -1 EAGAIN (Resource temporarily unavailable)
gettimeofday({1333697571, 905050}, NULL) = 0
semtimedop(819203, {{17, -1, 0}}, 1, {3, 0}) = -1 EAGAIN (Resource temporarily unavailable)
gettimeofday({1333697574, 920094}, NULL) = 0

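here you can clearly see calls to semtimedop which return with -1 ( EAGAIN ): the three second timeout passed as the last argument expired before the semaphore operation could be performed. the gettimeofday timestamps confirm that the calls are about three seconds apart.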
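
the same "wait for a while, then give up" pattern can be reproduced with python’s threading.Semaphore. this is an analogy only, not the system call itself, and the 0.2 seconds are an arbitrary value chosen to keep the demo short.

```python
import threading
import time

sem = threading.Semaphore(0)             # nobody has posted this semaphore yet

start = time.monotonic()
acquired = sem.acquire(timeout=0.2)      # wait at most 0.2 seconds, then give up
waited = time.monotonic() - start

print(acquired, round(waited, 1))
```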

if you followed the series on how to set up a test infrastructure or you have a test infrastructure available to play with, here is a little demonstration:

be sure to save your current kernel semaphore settings:

cat /proc/sys/kernel/sem
250 32000 32 1024

minimize the settings to very low values and try to restart ( or start ) the oracle instance:

echo "2 10 1 2" > /proc/sys/kernel/sem

if you write the values to the /proc/sys/kernel/…-parameter files the values will be in effect immediately. be sure what you are doing if you’re touching these parameters.

startup or restart the oracle instance:

su - oracle
sqlplus / as sysdba
startup force
ORA-27154: post/wait create failed
ORA-27300: OS system dependent operation:semget failed with status: 28
ORA-27301: OS failure message: No space left on device
ORA-27302: failure occurred at: sskgpbitsper

what happened? the database was not able to allocate the resources ( semaphores ) it needs and thus can not start up. the message “No space left on device” suggests an issue with disk space, but it is just the text for error 28 ( ENOSPC ), which semget also returns when the kernel semaphore limits would be exceeded. fyi: the allocation of semaphores ( and shared memory ) only occurs during the “startup nomount” phase. this is the only time oracle requests these resources.

fix it, and try again:

su -
echo "250 32000 32 1024" > /proc/sys/kernel/sem
su - oracle
sqlplus / as sysdba
startup force

conclusion: semaphores are one critical part for interprocess communication and provide a way for locking and synchronization. if the database can not allocate the resources it needs from the operating system it will fail to start up.

the values recommended in the installation guides are minimum values. if your session count to the database grows you may need to adjust the semaphore parameters.

when it comes to the management of processes and data structures there is one essential concept: double linked lists

you can think of this concept as n elements ( memory structures ) linked together, each containing pointers to the next and to the previous element:

double linked lists

from the perspective of the linux kernel these lists help, for example, in tracking all the processes in the system. when it comes to the oracle database there is the same concept. all the caches and pools are based on double linked lists.

one place where you can see that oracle uses the same concept is the buffer headers. although you can not see it in v$bh oracle exposes the information in the underlying x$bh:

SQL> select NXT_REPL,PRV_REPL from x$bh where rownum < 5;
-------- --------
247E550C 247E535C
23BE334C 23BE319C
23BF38DC 23BF372C
22FE694C 22FE679C

these are some pointers to the next and previous elements mentioned above. you can even check the relations:

SQL> select NXT_REPL,PRV_REPL from x$bh where NXT_REPL = '23BF38DC' or PRV_REPL = '23BF38DC';
-------- --------
23BF38DC 23BF372C
23BF3A8C 23BF38DC

for managing these lists there must be some atomic ( notice the vocabulary, it’s the same as the A in ACID ) operations implemented:

  • initializing the list
  • inserting and deleting elements
  • walking through the list to find an element
  • checking whether the list is empty
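
the operations listed above can be sketched in a few lines of python. this is a teaching example and not how the kernel or oracle implement their lists: there is no locking here, so nothing is atomic, and the addresses from the x$bh output are reused only as labels.

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.prev = None
        self.next = None

class DoublyLinkedList:
    def __init__(self):                  # initialize the list
        self.head = None
        self.tail = None

    def is_empty(self):                  # check whether the list is empty
        return self.head is None

    def append(self, value):             # insert an element at the tail
        node = Node(value)
        if self.is_empty():
            self.head = self.tail = node
        else:
            node.prev = self.tail
            self.tail.next = node
            self.tail = node
        return node

    def find(self, value):               # walk the list to find an element
        node = self.head
        while node is not None and node.value != value:
            node = node.next
        return node

    def remove(self, node):              # delete an element by relinking its neighbours
        if node.prev is None:
            self.head = node.next
        else:
            node.prev.next = node.next
        if node.next is None:
            self.tail = node.prev
        else:
            node.next.prev = node.prev
        node.prev = node.next = None

buffers = DoublyLinkedList()
for address in ("23BF372C", "23BF38DC", "23BF3A8C"):
    buffers.append(address)

middle = buffers.find("23BF38DC")
neighbours = (middle.prev.value, middle.next.value)
buffers.remove(middle)
print(neighbours, buffers.head.next.value)
```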

the list of processes in linux is called the process list. this list links together all the processes in the system, more exactly: it links together all the process descriptors. if the kernel wants to know which processes are ready to run, it scans the list for all processes in state TASK_RUNNING. there are several other states a process can be in:

  • TASK_RUNNING: the process waits to be executed or currently is executing
  • TASK_INTERRUPTIBLE: the process sleeps until some event occurs
  • TASK_UNINTERRUPTIBLE: the process sleeps and will not wake up on a signal
  • TASK_STOPPED: the process is stopped
  • TASK_TRACED: the process is being traced

when one process creates one or more other processes there are one or more parent/child relationships. these relationships are present in the process descriptors. the init process ( which is pid 1 ) is the master ( or the anchor ) of all other processes. all these relations are managed by linked lists.
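
you can follow these parent links from user space, too. the python sketch below reads the PPid line of /proc/<pid>/status and walks upwards from the current process. it assumes a linux system with /proc mounted; parent_of and ancestry are names made up for this example. inside a container the chain may end before pid 1.

```python
import os

def parent_of(pid):
    """Return the parent pid recorded in /proc/<pid>/status ( the PPid line )."""
    with open(f"/proc/{pid}/status") as status:
        for line in status:
            if line.startswith("PPid:"):
                return int(line.split()[1])
    raise ValueError(f"no PPid line for pid {pid}")

def ancestry(pid):
    """Follow the parent links from pid upwards until there is no parent left."""
    chain = []
    while pid > 0:
        chain.append(pid)
        pid = parent_of(pid)
    return chain

chain = ancestry(os.getpid())
print(" -> ".join(str(pid) for pid in chain))
```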

for the kernel to quickly find an entry in one of the lists another concept is introduced: hashing. hashing data is an efficient way to locate an element in a list. for example the number 85 might hash to the 10th entry of a list ( so after hashing the kernel can jump directly to this entry instead of scanning the whole list for the value in question ). this is another link to the oracle database as oracle is using hashing extensively, too ( for example to quickly locate sql-statements in the shared pool oracle hashes the text of the statement ).
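
here is a minimal sketch of the idea in python: the text of a statement is hashed to a bucket number and only that bucket is searched. crc32 and the 16 buckets are arbitrary choices for the demo; the hash functions used by the kernel and by oracle are different ones.

```python
import zlib

BUCKETS = 16

def bucket_for(text, buckets=BUCKETS):
    """Map a string to a bucket number with a deterministic hash ( crc32 )."""
    return zlib.crc32(text.encode("utf-8")) % buckets

class HashTable:
    def __init__(self, buckets=BUCKETS):
        self.buckets = [[] for _ in range(buckets)]

    def add(self, text):
        chain = self.buckets[bucket_for(text, len(self.buckets))]
        if text not in chain:
            chain.append(text)

    def contains(self, text):
        # only one bucket is searched, not the whole table
        return text in self.buckets[bucket_for(text, len(self.buckets))]

cache = HashTable()
cache.add("select * from dual")
print(cache.contains("select * from dual"), cache.contains("select 1 from dual"))
```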

you probably heard of locks in the oracle world. next link between the os and the database. when elements of lists are modified there is a need to protect them from concurrent access. imagine what happens if two ( or more ) processes try to modify the same data at the same time. this is where locks come into play: locks provide a mechanism for synchronization. in the linux kernel’s wait queues, for example, there are exclusive processes and nonexclusive processes. the latter are always woken up by the kernel if some specific events occur while the exclusive processes are woken up selectively ( for example if they want to access a resource that only one process can be granted access to at a time ). again, you see the same vocabulary here as in the oracle database world: there are exclusive locks, shared locks, etc.

by the way: keeping the duration of the locks as short as possible without risking data inconsistency is one key to performance. because if there are locks there is a very good chance that others have to wait until the locks disappear ( and waiting is wasted time in terms of performance ). this is why mutexes appeared in the oracle database: they provide a faster way of protecting data than the traditional latches ( which are a kind of lock in the oracle database ).

conclusion: if you understand how the operating system handles resources it is not a big deal to understand some basic workings of the database. much is about double linked lists and protecting data. even the vocabulary is the same very often. you see the same terms over and over again ( queues, locks, spins, lists, waits …. ).

if you want to go into more detail on how oracle handles lists, check out james morle’s book scaling oracle8i which is available for download now. don’t worry about the 8i, the basics are still the same.

happy listing …

this post continues the introduction to linux processes and shows how the listener handles connection requests to the database.

let’s check the listener’s pid:

ps -ef | grep tns
grid 2646 1 0 10:47 ? 00:00:03 /opt/oracle/product/base/ LISTENER_DB112 -inherit

as the listener needs to handle connections there must be some sorts of open files for the listener process:

[root@oracleplayground ~]# ls -la /proc/2646/fd/
total 0
dr-x------ 2 grid oinstall 0 Apr 4 10:47 .
dr-xr-xr-x 6 grid oinstall 0 Apr 4 10:47 ..
lrwx------ 1 grid oinstall 64 Apr 4 15:29 0 -> /dev/null
lrwx------ 1 grid oinstall 64 Apr 4 15:29 1 -> /dev/null
lrwx------ 1 grid oinstall 64 Apr 4 15:29 10 -> socket:[7875]
lrwx------ 1 grid oinstall 64 Apr 4 15:29 11 -> socket:[7877]
lrwx------ 1 grid oinstall 64 Apr 4 15:29 12 -> socket:[7949]
lrwx------ 1 grid oinstall 64 Apr 4 15:29 13 -> socket:[7950]
lrwx------ 1 grid oinstall 64 Apr 4 15:29 15 -> socket:[12719]
lrwx------ 1 grid oinstall 64 Apr 4 15:29 2 -> /dev/null
lr-x------ 1 grid oinstall 64 Apr 4 15:29 3 -> /opt/oracle/product/base/
lr-x------ 1 grid oinstall 64 Apr 4 15:29 4 -> /proc/2646/fd
lr-x------ 1 grid oinstall 64 Apr 4 15:29 5 -> /opt/oracle/product/base/
lr-x------ 1 grid oinstall 64 Apr 4 15:29 6 -> pipe:[7822]
lr-x------ 1 grid oinstall 64 Apr 4 15:29 7 -> /opt/oracle/product/base/
lrwx------ 1 grid oinstall 64 Apr 4 15:29 8 -> socket:[7873]
l-wx------ 1 grid oinstall 64 Apr 4 15:29 9 -> pipe:[7823]
[root@oracleplayground ~]# 

this tells us that the listener has 15 open file descriptors, of which 8, 10-13 and 15 are sockets and 6 and 9 are pipes.
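
counting by eye is error prone, so here is a small python sketch that groups the descriptors of a process by the kind of target their links in /proc/<pid>/fd point to. it assumes linux with /proc mounted; classify_fds is a made-up helper and the demo inspects the python process itself instead of the listener.

```python
import os
import socket

def classify_fds(pid="self"):
    """Group the open file descriptors of a process by what they point to."""
    groups = {"socket": [], "pipe": [], "other": []}
    fd_dir = f"/proc/{pid}/fd"
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            continue        # the descriptor vanished while we were looking
        kind = target.split(":", 1)[0]      # "socket:[7875]" -> "socket"
        groups[kind if kind in ("socket", "pipe") else "other"].append(int(name))
    return groups

# demo: open one pipe and one socket, then look at our own descriptors
read_end, write_end = os.pipe()
sock = socket.socket()
sock_fd = sock.fileno()
found = classify_fds()
sock.close()
os.close(read_end)
os.close(write_end)
print({kind: len(fds) for kind, fds in found.items()})

# for the listener ( as root or the owning user ): classify_fds(2646)
```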

let’s request a connection to the database through the listener and trace the listener process in parallel:

Session 1 ( as root ):

strace -p 2646

Session 2 ( as oracle ):

sqlplus a/a@DB112

the strace output ( i have numbered the lines ) should look similar to this:

1: poll([{fd=8, events=POLLIN|POLLRDNORM}, {fd=11, events=POLLIN|POLLRDNORM}, {fd=12, events=POLLIN|POLLRDNORM}, {fd=13, events=POLLIN|POLLRDNORM}, {fd=15, events=POLLIN|POLLRDNORM}, {fd=-1}, {fd=-1}], 7, -1) = 1 ([{fd=13, revents=POLLIN|POLLRDNORM}])
2: times({tms_utime=76, tms_stime=325, tms_cutime=0, tms_cstime=1}) = 431191568
3: getsockname(13, {sa_family=AF_INET, sin_port=htons(1521), sin_addr=inet_addr("")}, [11087335753955409936]) = 0
4: getpeername(13, 0x7fffc6b5d1a8, [11087335753955409936]) = -1 ENOTCONN (Transport endpoint is not connected)
5: accept(13, {sa_family=AF_INET, sin_port=htons(23139), sin_addr=inet_addr("")}, [68719476752]) = 14
6: getsockname(14, {sa_family=AF_INET, sin_port=htons(1521), sin_addr=inet_addr("")}, [68719476752]) = 0
7: fcntl(14, F_SETFL, O_RDONLY|O_NONBLOCK) = 0
8: getsockopt(14, SOL_SOCKET, SO_SNDBUF, [-4128159149999471140], [4]) = 0
9: getsockopt(14, SOL_SOCKET, SO_RCVBUF, [-4128159149999434336], [4]) = 0
10: setsockopt(14, SOL_TCP, TCP_NODELAY, [1], 4) = 0
11: fcntl(14, F_SETFD, FD_CLOEXEC) = 0
12: times({tms_utime=76, tms_stime=325, tms_cutime=0, tms_cstime=1}) = 431191568
14: times({tms_utime=76, tms_stime=325, tms_cutime=0, tms_cstime=1}) = 431191568
15: times({tms_utime=76, tms_stime=325, tms_cutime=0, tms_cstime=1}) = 431191568
16: poll([{fd=8, events=POLLIN|POLLRDNORM}, {fd=11, events=POLLIN|POLLRDNORM}, {fd=12, events=POLLIN|POLLRDNORM}, {fd=13, events=POLLIN|POLLRDNORM}, {fd=15, events=POLLIN|POLLRDNORM}, {fd=14, events=POLLIN|POLLRDNORM}, {fd=-1}], 7, 60000) = 1 ([{fd=14, revents=POLLIN|POLLRDNORM}])
17: read(14, "\335\1\1:\1,\fA \377\377\177\10\1\243:\10"..., 8208) = 221
18: fcntl(14, F_GETFL) = 0x802 (flags O_RDWR|O_NONBLOCK)
19: fcntl(14, F_SETFL, O_RDWR) = 0
20: times({tms_utime=76, tms_stime=325, tms_cutime=0, tms_cstime=1}) = 431191569
21: fcntl(14, F_SETFD, 0) = 0
22: pipe([16, 17]) = 0
23: pipe([18, 19]) = 0
24: clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x2ac09be5bb80) = 5894
25: wait4(5894, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 5894
26: close(16) = 0
27: close(19) = 0
28: fcntl(18, F_SETFD, FD_CLOEXEC) = 0
29: fcntl(17, F_SETFD, FD_CLOEXEC) = 0
30: fcntl(14, F_SETFD, FD_CLOEXEC) = 0
31: poll([{fd=8, events=POLLIN|POLLRDNORM}, {fd=11, events=POLLIN|POLLRDNORM}, {fd=12, events=POLLIN|POLLRDNORM}, {fd=13, events=POLLIN|POLLRDNORM}, {fd=15, events=POLLIN|POLLRDNORM}, {fd=18, events=POLLIN|POLLRDNORM}, {fd=17, events=0}], 7, -1) = 1 ([{fd=18, revents=POLLIN|POLLRDNORM}])
32: read(18, "NTP0 5895\n", 64) = 10
33: write(17, ";", 4) = 4
34: write(17, "(ADDRESS=(PROTOCOL=tcp)(DEV=14)("..., 59) = 59
35: write(17, "\1\4", 8) = 8
36: read(18, "", 4) = 4
37: read(18, "*1", 4) = 4
38: write(14, "\10\v", 8) = 8
39: close(17) = 0
40: close(18) = 0
41: close(14) = 0
42: lseek(7, 19968, SEEK_SET) = 19968
43: read(7, "\f005\4P006\4j007\4\206008\4\240009\4\335"..., 512) = 512
44: poll([{fd=8, events=POLLIN|POLLRDNORM}, {fd=11, events=POLLIN|POLLRDNORM}, {fd=12, events=POLLIN|POLLRDNORM}, {fd=13, events=POLLIN|POLLRDNORM}, {fd=15, events=POLLIN|POLLRDNORM}, {fd=-1}, {fd=-1}], 7, -1) = 1 ([{fd=15, revents=POLLIN|POLLRDNORM}])
45: Process 2646 detached

a lot of cryptic output, isn’t it? let’s take a closer look at what’s happening:

on line one you can see a call to “poll”. poll waits for some events on the file descriptors. as connections are files, you may say it waits for some sort of connections.

on line two you can see a call to times, which returns several process times.

on line three there is a call to getsockname which returns the socket name for this address: {sa_family=AF_INET, sin_port=htons(1521), sin_addr=inet_addr(“”)}, which is the listener on port 1521 ( sa_family: this is the address family, AF_INET is the one used for IP, sin_port: is the port, sin_addr: is the address, in this case localhost ).

the call on line four ( getpeername ) does what its name says: it gets the name of the connected peer. it returns with ENOTCONN ( the socket is not connected ). next, on line five, a connection to the socket is accepted. notice that a new file descriptor is created ( 14 ) and passed to the getsockname call on line six. getsockname returns with code 0, which is success. now there is a connected endpoint which is our request to the listener.

the call to fcntl on line 7 modifies the file descriptor: it sets the file status flags ( F_SETFL ) to read only ( O_RDONLY ) and non-blocking ( O_NONBLOCK ), so calls on this descriptor return immediately instead of waiting.
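
what O_NONBLOCK changes is easy to see with an empty pipe. the python sketch below sets the flag with the same fcntl operations that appear in the trace and then tries to read; instead of waiting for data the read fails right away ( python reports EAGAIN as BlockingIOError ). the pipe is just a stand-in for the listener’s socket.

```python
import fcntl
import os

read_end, write_end = os.pipe()          # an empty pipe: nothing to read yet

flags = fcntl.fcntl(read_end, fcntl.F_GETFL)                  # read the status flags
fcntl.fcntl(read_end, fcntl.F_SETFL, flags | os.O_NONBLOCK)   # add O_NONBLOCK

try:
    os.read(read_end, 1)                 # would wait forever on a blocking descriptor
    outcome = "data"
except BlockingIOError:
    outcome = "EAGAIN"                   # returns immediately instead

os.close(read_end)
os.close(write_end)
print(outcome)
```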

on lines 8 and 9 the sizes of the socket’s send and receive buffers are read and on line 10 the TCP option TCP_NODELAY ( which disables Nagle’s algorithm ) is set.

next, on line 11, the socket is modified to close when a call to an exec function is performed. from now on I will ignore the calls to times, as I have described it above.

on line 13 rt_sigaction changes the action taken by the process on receipt of a signal ( this line did not make it into the listing above ) and line 17 reads from the file associated with the file descriptor ( ignore lines 14, 15 and 16 ). line 18 again reads the file status flags and line 19 sets them to plain read/write ( O_RDWR ), which drops O_NONBLOCK. line 21 clears the file descriptor flags, including the FD_CLOEXEC set on line 11, so the socket will survive an exec.

lines 22 and 23 each create a pair of file descriptors pointing to a pipe inode ( the first is for reading, the second for writing ). the clone call on line 24 does interesting stuff: it creates a new process with pid 5894 ( the child_stack=0 indicates that a process is created, not a thread ) and line 25 waits for the new process to change its state, in other words it waits for the process to exit.

if we now check whether the process is there, you will notice that no process with this PID exists ( line 25 already reported that it exited, so it was most likely just an intermediate process that created the real one ):

ps -ef | grep 5894
root 5984 5611 0 15:58 pts/3 00:00:00 grep 5894

but what exists, is PID+1 ( which you can see on line 32 ):

ps -ef | grep 5895
oracle 5895 1 0 15:44 ? 00:00:00 oracleDB112 (LOCAL=NO)
root 5997 5611 0 16:00 pts/3 00:00:00 grep 5895

… which is our connection to the database. if you check this process you will see that the socket 14 is now available in the newly created process:

ls -la /proc/5895/fd
total 0
dr-x------ 2 root root 0 Apr 4 15:44 .
dr-xr-xr-x 6 oracle asmadmin 0 Apr 4 15:44 ..
lr-x------ 1 root root 64 Apr 4 16:00 0 -> /dev/null
l-wx------ 1 root root 64 Apr 4 16:00 1 -> /dev/null
lrwx------ 1 root root 64 Apr 4 16:00 14 -> socket:[102504]
l-wx------ 1 root root 64 Apr 4 16:00 2 -> /dev/null
lr-x------ 1 root root 64 Apr 4 16:00 3 -> /dev/null
lr-x------ 1 root root 64 Apr 4 16:00 4 -> /dev/null
lr-x------ 1 root root 64 Apr 4 16:00 5 -> /opt/oracle/product/base/
lr-x------ 1 root root 64 Apr 4 16:00 6 -> /proc/5895/fd
lr-x------ 1 root root 64 Apr 4 16:00 7 -> /dev/zero

the remaining lines close some files ( including the listener’s file descriptor 14 ) and write some data. i will ignore the rest of the output as it should be clear now how the listener hands off the connections to the database: it listens for incoming requests on the defined port and creates a new process which is the database connection. that’s all the listener does. once the connection is established there is no more work to do for the listener and it loses control of the newly created process.
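
the core of this hand-off, a child process inheriting an already connected descriptor, can be sketched in python. this is heavily simplified and rests on assumptions: a socketpair stands in for the client and the accepted connection, there is only a single fork, and no oracle binary is started, while the real listener does more than that, as the trace shows.

```python
import os
import socket

# a connected socket pair stands in for the client and the accepted connection
client, accepted = socket.socketpair()

pid = os.fork()
if pid == 0:
    # child: plays the server process and talks on the inherited descriptor
    client.close()
    accepted.sendall(b"server process %d" % os.getpid())
    accepted.close()
    os._exit(0)

# parent: plays the listener, which closes its copy and is out of the picture
accepted.close()
reply = client.recv(64)
client.close()
os.waitpid(pid, 0)
print(reply.decode())
```

the client keeps talking to the child although the parent has closed its copy of the descriptor, which is the same effect as the close(14) on line 41.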

just one more thing of interest: the lseek and read calls to file descriptor 7 ( lines 42 and 43 ) are positioning and reading the file containing the tns messages. you can check this with:

strings /opt/oracle/product/base/

these are the messages the listener returns.

happy listening …

as mentioned in earlier posts as a dba you need to know how the operating system works. this post is an introduction to processes on linux.

the definition of process is: a process is an instance of a program in execution.
to manage a process the linux kernel must know a lot of things about the process, e.g. which files the process is allowed to handle, if it is running on CPU or blocked, the address space of the process etc. all this information is present in the so called process descriptor. you can think of the process descriptor as a structure containing all the information the kernel needs to know about the process ( internally the structure is called: task_struct ). one piece of information stored in the process descriptor is the process id which is used to identify the process.

let’s take a look at the processes that make up the oracle database:

ps -ef | grep $ORACLE_SID | egrep -v "DESCRIPTION|grep|tnslsnr" 
oracle    2944     1  0 08:30 ?        00:00:03 ora_pmon_DB112
oracle    2946     1  0 08:30 ?        00:00:06 ora_psp0_DB112
oracle    2948     1  2 08:30 ?        00:01:11 ora_vktm_DB112
oracle    2952     1  0 08:30 ?        00:00:02 ora_gen0_DB112
oracle    2954     1  0 08:30 ?        00:00:03 ora_diag_DB112
oracle    2956     1  0 08:30 ?        00:00:02 ora_dbrm_DB112
oracle    2958     1  0 08:30 ?        00:00:06 ora_dia0_DB112
oracle    2961     1  0 08:30 ?        00:00:02 ora_mman_DB112
oracle    2963     1  0 08:30 ?        00:00:03 ora_dbw0_DB112
oracle    2965     1  0 08:30 ?        00:00:03 ora_lgwr_DB112
oracle    2967     1  0 08:30 ?        00:00:06 ora_ckpt_DB112
oracle    2969     1  0 08:30 ?        00:00:01 ora_smon_DB112
oracle    2971     1  0 08:30 ?        00:00:00 ora_reco_DB112
oracle    2973     1  0 08:30 ?        00:00:02 ora_rbal_DB112
oracle    2975     1  0 08:30 ?        00:00:01 ora_asmb_DB112
oracle    2977     1  0 08:30 ?        00:00:05 ora_mmon_DB112
oracle    2979     1  0 08:30 ?        00:00:09 ora_mmnl_DB112
oracle    2987     1  0 08:30 ?        00:00:04 ora_mark_DB112
oracle    3013     1  0 08:30 ?        00:00:00 ora_qmnc_DB112
oracle    3054     1  0 08:30 ?        00:00:01 ora_q000_DB112
oracle    3056     1  0 08:30 ?        00:00:00 ora_q001_DB112
oracle    3182     1  0 08:35 ?        00:00:02 ora_smco_DB112
oracle    3359     1  0 09:05 ?        00:00:00 ora_w000_DB112

notice that the egrep filter excluded the listener process and all the current connections to the database.
if you want to check the local connections to the database from the os, you can do something like this:

ps -ef | grep $ORACLE_SID | grep "LOCAL=YES"
oracle    2933     1  0 10:48 ?        00:00:01 oracleDB112 (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
oracle    2969     1  0 10:48 ?        00:00:00 oracleDB112 (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
oracle    2979     1  0 10:48 ?        00:00:00 oracleDB112 (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
oracle    3088  3087  0 10:50 ?        00:00:00 oracleDB112 (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
oracle    3096     1  0 10:50 ?        00:00:00 oracleDB112 (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))

checking the processes from inside the database would be as simple as this ( for the background processes ):

SQL> select pname from v$process where pname is not null;

with the above arguments ( -ef ) supplied to the ps command, the columns displayed are:

  • the os-user the process runs under
  • the process id
  • the parent process id
  • processor utilization
  • start time of the process
  • the terminal the process was started on ( if any )
  • the cumulative CPU time
  • the command

but where does the ps command get the information to display from ? in fact you can get all of the information displayed above without using the ps command. all you need to do is to check the pseudo filesystem /proc ( it is called a pseudo filesystem because it is a virtual filesystem that maps to the kernel structures ).

if you do a “ls” on the proc filesystem you’ll see a lot of directories and files. for this post we will concentrate on the numbered directories which map to process ids.
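to make this a bit more concrete, here is a minimal sketch that builds a ps-like line from the /proc files alone. the function name proc_ps is made up for this example, and the sketch only prints the pid, parent pid, state and command, not everything ps -ef shows:

```shell
#!/bin/sh
# proc_ps is a hypothetical helper, not part of any package
# usage: proc_ps PID
proc_ps() {
    pid=$1
    # /proc/PID/stat looks like: pid (comm) state ppid pgrp ...
    stat=$(cat "/proc/$pid/stat") || return 1
    # comm may contain blanks, so cut everything up to the last ") "
    rest=${stat##*) }
    # split the remaining fields: $1 is the state, $2 the parent pid
    set -- $rest
    state=$1
    ppid=$2
    # cmdline is NUL separated, translate the NULs to blanks
    cmd=$(tr '\0' ' ' < "/proc/$pid/cmdline")
    # kernel threads have an empty cmdline, fall back to the comm field
    if [ -z "$cmd" ]; then
        cmd=${stat#*(}
        cmd="[${cmd%)*}]"
    fi
    printf '%s %s %s %s\n' "$pid" "$ppid" "$state" "$cmd"
}

# example: describe the current shell
proc_ps $$
```

replace $$ with the process id of one of your background processes to get the same line for, say, smon.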

let’s take smon as an example, which is the oracle system monitor ( you will need to adjust the process-id for your environment ):

ls -la /proc/2969/
dr-xr-xr-x   6 oracle asmadmin 0 Apr  3 13:40 .
dr-xr-xr-x 154 root   root     0 Apr  3  2012 ..
dr-xr-xr-x   2 oracle asmadmin 0 Apr  3 14:07 attr
-r--------   1 root   root     0 Apr  3 14:07 auxv
-r--r--r--   1 root   root     0 Apr  3 13:45 cmdline
-rw-r--r--   1 root   root     0 Apr  3 14:07 coredump_filter
-r--r--r--   1 root   root     0 Apr  3 14:07 cpuset
lrwxrwxrwx   1 root   root     0 Apr  3 14:07 cwd -> /opt/oracle/product/base/
-r--------   1 root   root     0 Apr  3 14:07 environ
lrwxrwxrwx   1 root   root     0 Apr  3 14:07 exe -> /opt/oracle/product/base/
dr-x------   2 root   root     0 Apr  3 13:40 fd
dr-x------   2 root   root     0 Apr  3 14:07 fdinfo
-r--------   1 root   root     0 Apr  3 14:07 io
-r--r--r--   1 root   root     0 Apr  3 14:07 limits
-rw-r--r--   1 root   root     0 Apr  3 14:07 loginuid
-r--r--r--   1 root   root     0 Apr  3 13:40 maps
-rw-------   1 root   root     0 Apr  3 14:07 mem
-r--r--r--   1 root   root     0 Apr  3 14:07 mounts
-r--------   1 root   root     0 Apr  3 14:07 mountstats
-r--r--r--   1 root   root     0 Apr  3 14:07 numa_maps
-rw-r--r--   1 root   root     0 Apr  3 14:07 oom_adj
-r--r--r--   1 root   root     0 Apr  3 14:07 oom_score
lrwxrwxrwx   1 root   root     0 Apr  3 14:07 root -> /
-r--r--r--   1 root   root     0 Apr  3 14:07 schedstat
-r--r--r--   1 root   root     0 Apr  3 14:07 smaps
-r--r--r--   1 root   root     0 Apr  3 13:40 stat
-r--r--r--   1 root   root     0 Apr  3 14:07 statm
-r--r--r--   1 root   root     0 Apr  3 13:45 status
dr-xr-xr-x   3 oracle asmadmin 0 Apr  3 14:07 task
-r--r--r--   1 root   root     0 Apr  3 14:07 wchan

what do we see here? lots and lots of information about the smon process. for a detailed description of what all the files and directories are about, you can go to the man-pages:

man proc

for example, if we take a look at the statm file of the process:

cat /proc/2969/statm
126385 16013 14717 45859 0 994 0

… and check the man pages for the meaning of the numbers, things are getting clearer:

      Provides information about memory status in pages.  The columns are:
               size       total program size
               resident   resident set size
               share      shared pages
               text       text (code)
               lib        library
               data       data/stack
               dt         dirty pages (unused in Linux 2.6)
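because the numbers are in pages, they are easier to read once converted. a small sketch, assuming the page size reported by getconf PAGESIZE applies to the process ( statm_kb is a made-up name ):

```shell
#!/bin/sh
# statm_kb is a hypothetical helper: print the statm columns in kB
# usage: statm_kb PID
statm_kb() {
    pid=$1
    # page size in bytes, typically 4096 on x86 linux
    pagesize=$(getconf PAGESIZE)
    # the seven columns as described in man proc
    read -r size resident share text lib data dt < "/proc/$pid/statm" || return 1
    echo "size:     $(( size * pagesize / 1024 )) kB"
    echo "resident: $(( resident * pagesize / 1024 )) kB"
    echo "share:    $(( share * pagesize / 1024 )) kB"
    echo "text:     $(( text * pagesize / 1024 )) kB"
    echo "data:     $(( data * pagesize / 1024 )) kB"
}

# example: the current shell
statm_kb $$
```

with 4096 byte pages ( an assumption, check it on your system ) the 126385 pages shown above for smon would be 505540 kB in total, of which 16013 pages or 64052 kB are resident.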

wanting to know the environment of the process? just take a look at the environ file:

cat /proc/2969/environ 
__CLSAGFW_TYPE_NAME=ora.listener.typeORA_CRS_HOME=/opt/oracle/product/crs/,37ORACLE_SPAWNED_PROCESS=1SKGP_SPAWN_DIAG_PRE_FORK_TS=1333453218SKGP_SPAWN_DIAG_POST_FORK_TS=1333453218SKGP_HIDDEN_ARGS=0SKGP_SPAWN_DIAG_PRE_EXEC_TS=1333453218
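the variables run into each other because the entries in the environ file are separated by NUL bytes, not by newlines. a small sketch to get one variable per line ( proc_env is a made-up name, and you need to be the owner of the process or root to read the file ):

```shell
#!/bin/sh
# proc_env is a hypothetical helper: print the environment of a process,
# one variable per line
# usage: proc_env PID
proc_env() {
    # translate the NUL separators into newlines
    tr '\0' '\n' < "/proc/$1/environ"
}

# example: the first few variables of the current shell
proc_env $$ | head -5
```

note that the file shows the environment the process was started with, so variables exported later on do not show up there.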

… which files were opened by the process ?:

ls -la fd/
total 0
dr-x------ 2 root   root      0 Apr  3 13:40 .
dr-xr-xr-x 6 oracle asmadmin  0 Apr  3 13:40 ..
lr-x------ 1 root   root     64 Apr  3 16:23 0 -> /dev/null
l-wx------ 1 root   root     64 Apr  3 16:23 1 -> /dev/null
lr-x------ 1 root   root     64 Apr  3 16:23 10 -> /dev/null
lr-x------ 1 root   root     64 Apr  3 16:23 11 -> /dev/null
lr-x------ 1 root   root     64 Apr  3 16:23 12 -> /dev/null
lrwx------ 1 root   root     64 Apr  3 16:23 13 -> /opt/oracle/product/base/
lr-x------ 1 root   root     64 Apr  3 16:23 14 -> /dev/null
lr-x------ 1 root   root     64 Apr  3 16:23 15 -> /dev/null
lr-x------ 1 root   root     64 Apr  3 16:23 16 -> /dev/zero
lr-x------ 1 root   root     64 Apr  3 16:23 17 -> /dev/zero
lrwx------ 1 root   root     64 Apr  3 16:23 18 -> /opt/oracle/product/base/
lr-x------ 1 root   root     64 Apr  3 16:23 19 -> /opt/oracle/product/base/
l-wx------ 1 root   root     64 Apr  3 16:23 2 -> /dev/null
lr-x------ 1 root   root     64 Apr  3 16:23 20 -> /proc/2875/fd
lr-x------ 1 root   root     64 Apr  3 16:23 21 -> /opt/oracle/product/crs/
lr-x------ 1 root   root     64 Apr  3 16:23 22 -> /dev/zero
lrwx------ 1 root   root     64 Apr  3 16:23 23 -> /opt/oracle/product/base/
lrwx------ 1 root   root     64 Apr  3 16:23 24 -> /opt/oracle/product/base/
lr-x------ 1 root   root     64 Apr  3 16:23 25 -> /opt/oracle/product/base/
lrwx------ 1 root   root     64 Apr  3 16:23 256 -> /dev/sda1
lrwx------ 1 root   root     64 Apr  3 16:23 3 -> /opt/oracle/product/crs/
l-wx------ 1 root   root     64 Apr  3 16:23 4 -> /opt/oracle/product/crs/
lr-x------ 1 root   root     64 Apr  3 16:23 5 -> /dev/null
lrwx------ 1 root   root     64 Apr  3 16:23 6 -> socket:[7791]
lrwx------ 1 root   root     64 Apr  3 16:23 7 -> socket:[7792]
lrwx------ 1 root   root     64 Apr  3 16:23 8 -> socket:[7793]
lrwx------ 1 root   root     64 Apr  3 16:23 9 -> socket:[7794]
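if you only need the descriptor numbers and their targets, the symbolic links can be resolved in a loop. a minimal sketch ( proc_fds is a made-up name, and as the permissions above show you need to be the owner of the process or root ):

```shell
#!/bin/sh
# proc_fds is a hypothetical helper: list the open file descriptors
# of a process together with their targets
# usage: proc_fds PID
proc_fds() {
    for fd in /proc/"$1"/fd/*; do
        # sockets and pipes point to names that do not exist as files,
        # so test for the link itself as well
        [ -e "$fd" ] || [ -L "$fd" ] || continue
        printf '%s -> %s\n' "${fd##*/}" "$(readlink "$fd")"
    done
}

# example: the current shell
proc_fds $$
```

piping the output through something like grep -c socket gives a quick count of the sockets a process holds open.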

conclusion: it’s really worth reading the man pages and understanding the /proc/[PID] structures. this can be a very good starting point if you have trouble with one of the processes running on your system.

and last but not least: maybe you don’t believe that the ps command is reading the /proc/[PID] structures to display its information. you can always trace the command and check what’s happening behind the scenes:

strace -o strace.log ps -ef

this will write the strace output to a file named strace.log. grep for your smon process and check which files were read:

grep 2969 strace.log
stat("/proc/2969", {st_mode=S_IFDIR|0555, st_size=0, ...}) = 0
open("/proc/2969/stat", O_RDONLY)       = 6
read(6, "2969 (oracle) S 1 2921 2921 0 -1"..., 1023) = 191
open("/proc/2969/status", O_RDONLY)     = 6
open("/proc/2969/cmdline", O_RDONLY)    = 6
write(1, "oracle    2969     1  0 10:48 ? "..., 63) = 63

here we go: ps reads a subset of the same files listed above ( stat, status and cmdline ).


happy processing …