as mentioned in the previous post about semaphores, there are more things to consider when it comes to interprocess communication. semaphores are used to protect critical regions, so there must be some critical regions to protect: this is the shared memory oracle uses for its communication.
to give an example of how the shared memory addressing works we will take a look at what happens when the database starts up.
for this you’ll need two sessions to a test infrastructure ( one as the database owner, the other as root ).
session one ( oracle ):
connect to sqlplus as sysdba and make sure you shut down the database ( do not exit sqlplus once the database is down ):
sqlplus / as sysdba
shutdown immediate
session two ( root ):
discover the PID of the sqlplus session above …
ps -ef | grep sqlp
oracle 3062 3036 0 09:49 pts/1 00:00:00 sqlplus
… check the shared memory segments and trace the sqlplus PID from above:
ipcs -m
------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status
0x7401003e 1310720    root       600        4          0
0x74010014 1998849    root       600        4          0
0x00000000 2359298    root       644        80         2
0x74010013 1966083    root       600        4          0
0x00000000 2392068    root       644        16384      2
0x00000000 2424837    root       644        280        2
0x00000000 2490374    grid       640        4096       0
0x00000000 2523143    grid       640        4096       0
0x8e11371c 2555912    grid       640        4096       0
# start the trace
strace -o db_startup.log -fp 3062
it is important to specify the “-f” flag for the strace call. this will tell strace to follow the child processes spawned.
in session one start up the database…
startup
… and stop the tracing in the root session once the database is up and re-check the shared memory segments.
ipcs -m
------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status
0x7401003e 1310720    root       600        4          0
0x74010014 1998849    root       600        4          0
0x00000000 2359298    root       644        80         2
0x74010013 1966083    root       600        4          0
0x00000000 2392068    root       644        16384      2
0x00000000 2424837    root       644        280        2
0x00000000 2490374    grid       640        4096       0
0x00000000 2523143    grid       640        4096       0
0x8e11371c 2555912    grid       640        4096       0
0x00000000 3538953    oracle     640        4096       0
0x00000000 3571722    oracle     640        4096       0
0x3393b3a4 3604491    oracle     640        4096       0
as you can see, three more segments appeared after the database started up.
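instead of comparing the two listings by eye, the new segments can also be picked out with a few lines of awk. this is only a sketch: it assumes you saved the output of "ipcs -m" to two files before and after the startup ( the file names ipcs_before.txt and ipcs_after.txt and the function name new_segments are made up for this example ).

```shell
# print the segments that exist in the second ipcs listing but not in the first
# $1: "ipcs -m" output saved before startup (hypothetical file name)
# $2: "ipcs -m" output saved after startup  (hypothetical file name)
new_segments() {
    awk '
        $2 !~ /^[0-9]+$/ { next }         # skip headers and blank lines
        NR == FNR { seen[$2] = 1; next }  # first file: remember the existing shmids
        !($2 in seen)                     # second file: print lines with unknown shmids
    ' "$1" "$2"
}

# usage:
#   ipcs -m > ipcs_before.txt    (before "startup")
#   ipcs -m > ipcs_after.txt     (after "startup")
#   new_segments ipcs_before.txt ipcs_after.txt
```

the comparison is done on the shmid column because the key is 0x00000000 for every segment created with IPC_PRIVATE and therefore not unique.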
you probably noticed some trace output on the screen similar to this:
Process 3468 detached
Process 3470 attached (waiting for parent)
Process 3470 resumed (parent 3409 ready)
Process 3471 attached (waiting for parent)
Process 3471 resumed (parent 3470 ready)
Process 3469 detached
Process 3470 detached
this is because of the “-f” flag given to strace.
the complete trace output is now available in the db_startup.log trace file and we are ready to take a look at it.
the first thing that catches the eye is the large number of references to the “/proc” filesystem. in my trace file there are 1213 calls to it. you can check this with:
grep "/proc/" db_startup.log | wc -l
take a look at the previous post which introduces the “/proc” filesystem for more information. for the scope of this post just notice how much depends on it.
the actual startup of the database is triggered by the following line:
execve("/opt/oracle/product/base/11.2.0.3/bin/oracle", ["oracleDB112", "(DESCRIPTION=(LOCAL=YES)(ADDRESS"], [/* 22 vars */]) = 0
this is the call to the oracle binary ( execve executes the binary ). strace abbreviates the 22 environment variables passed to it as “/* 22 vars */”. from now on the oracle instance starts up.
the calls important to the shared memory stuff are the following:
- brk: changes a data segment’s size
- mmap, munmap: maps/unmaps files or devices into memory
- mprotect: sets protection on a region of memory
- shmget: allocates a shared memory segment
- shmat, shmdt: performs attach/detach operations on shared memory
- get_mempolicy: returns NUMA memory policies for a process
- semget: gets a semaphore identifier
- semctl: performs control operations on a semaphore
- semop, semtimedop: perform semaphore operations
for each of the above system calls you can check the man-pages for more information.
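to get a feeling for how often each of these calls shows up in the trace, something like the following could be used. it is a rough sketch which assumes the log was written with "strace -f -o" as above, so every line starts with the PID followed by the call ( the function name count_ipc_calls is made up for this example ).

```shell
# count the shared memory / semaphore related calls in an "strace -f -o" log
# lines are expected to look like:
#   5365 shmget(IPC_PRIVATE, 4096, IPC_CREAT|IPC_EXCL|0640) = 3538953
count_ipc_calls() {
    awk '
        {
            # the second field starts with the call name followed by "("
            name = $2
            sub(/\(.*/, "", name)
            if (name ~ /^(brk|mmap|munmap|mprotect|shmget|shmat|shmdt|get_mempolicy|semget|semctl|semop|semtimedop)$/)
                count[name]++
        }
        END { for (n in count) print count[n], n }
    ' "$1" | sort -rn
}

# usage:
#   count_ipc_calls db_startup.log
```

calls that strace splits into an "unfinished" and a "resumed" line are only counted once, because the "resumed" line does not start with the call name.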
as the trace file is rather large and a lot of things are happening i will focus on the minimum ( this is not about re-engineering oracle :) ):
let’s check the identifiers ( shmid ) reported by the ipcs command above:
egrep "3538953|3571722|3604491" db_startup.log
...
5365 shmget(IPC_PRIVATE, 4096, IPC_CREAT|IPC_EXCL|0640) = 3538953
5365 shmget(IPC_PRIVATE, 4096, IPC_CREAT|IPC_EXCL|0640) = 3571722
5365 shmget(0x3393b3a4, 4096, IPC_CREAT|IPC_EXCL|0640) = 3604491
...
as you can see the identifiers returned by the shmget call ( 3604491,3571722,3538953 ) correspond to the ones reported by ipcs. you wonder about the size of 4096 bytes ? this is because memory_target/memory_max_target is in use by the instance. if the database is configured using sga_target/sga_max_size you would see the actual size. let’s check this:
su - oracle
sqlplus / as sysdba
alter system reset memory_max_target scope=spfile;
alter system reset memory_target scope=spfile;
alter system set sga_max_size=256m scope=spfile;
alter system set sga_target=256m scope=spfile;
alter system set pga_aggregate_target=24m scope=spfile;
startup force;
exit;
# re-check the shared memory segments
ipcs -m
------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status
0x00000000 2359298    root       644        80         2
0x00000000 2392068    root       644        16384      2
0x00000000 2424837    root       644        280        2
0x00000000 2490374    grid       640        4096       0
0x00000000 2523143    grid       640        4096       0
0x8e11371c 2555912    grid       640        4096       0
0x00000000 3801097    oracle     640        8388608    25
0x00000000 3833866    oracle     640        260046848  25
0x3393b3a4 3866635    oracle     640        2097152    25
the “260046848” segment holds the bulk of the 256m sga ( together with the “8388608” segment it adds up to exactly 268435456 bytes, i.e. 256m ) and the nattch column shows that 25 processes are attached to it. you can double check the 25 attached processes if you want:
ps -ef | grep DB112 | grep -v LISTENER | grep -v grep | wc -l
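if you do not want to add up the segment sizes by hand, awk can do it for you. a small sketch, assuming the column layout of "ipcs -m" shown above ( owner in the third column, bytes in the fifth ); the function name sum_shm_bytes is made up for this example.

```shell
# sum the "bytes" column of all shared memory segments owned by one user
# reads "ipcs -m" output from stdin, $1 is the owner to filter on
sum_shm_bytes() {
    awk -v owner="$1" '
        $3 == owner { sum += $5 }   # column 3: owner, column 5: bytes
        END { printf "%d\n", sum }
    '
}

# usage:
#   ipcs -m | sum_shm_bytes oracle
```

for the three oracle segments in the listing above this gives 270532608 bytes ( 258m ).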
let’s return to the memory_target/memory_max_target configuration. as oracle puts together all the memory chunks ( pga and sga ), the management of memory moves to the virtual shared memory filesystem ( tmpfs ). unfortunately this is not visible with the ipcs command.
but you can map your memory_* sizes to the shm filesystem:
ls -la /dev/shm/ | grep -v "+ASM"
total 466100
drwxrwxrwt  2 root   root        2640 Apr 10 13:09 .
drwxr-xr-x 10 root   root        3400 Apr 10 09:44 ..
-rw-r-----  1 oracle asmadmin 4194304 Apr 10 13:27 ora_DB112_3932169_0
-rw-r-----  1 oracle asmadmin 4194304 Apr 10 13:09 ora_DB112_3932169_1
-rw-r-----  1 oracle asmadmin 4194304 Apr 10 13:09 ora_DB112_3964938_0
-rw-r-----  1 oracle asmadmin 4194304 Apr 10 13:20 ora_DB112_3964938_1
-rw-r-----  1 oracle asmadmin 4194304 Apr 10 13:09 ora_DB112_3964938_10
-rw-r-----  1 oracle asmadmin 4194304 Apr 10 13:20 ora_DB112_3964938_11
-rw-r-----  1 oracle asmadmin 4194304 Apr 10 13:10 ora_DB112_3964938_12
note that i have excluded the ASM stuff here. in my case each segment ( or granule ) is 4mb in size ( this depends on the available memory of the system ) and the sum of all the segments should get you near to your memory_* configuration.
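adding up the granules can be scripted, too. the following is a sketch which assumes the files follow the ora_SID_* naming seen above and that the fifth column of "ls -l" is the file size in bytes ( the function name granule_total is made up for this example ).

```shell
# count the granule files of one instance and sum up their sizes
# $1: directory to look in (usually /dev/shm), $2: instance name
granule_total() {
    dir=$1
    sid=$2
    # the file name pattern only matches this instance, so ASM files are skipped
    ls -l "$dir"/ora_"$sid"_* 2>/dev/null |
        awk '{ n++; sum += $5 } END { printf "%d files, %d bytes\n", n, sum }'
}

# usage:
#   granule_total /dev/shm DB112
```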
as ipcs can not tell you much here there are other commands to use. if you want to know which process has a memory granule open:
fuser -v /dev/shm/ora_DB112_4358154_49
                     USER        PID ACCESS COMMAND
/dev/shm/ora_DB112_4358154_49:
                     oracle     6626 ....m oracle
                     oracle     6628 ....m oracle
                     oracle     6630 ....m oracle
                     oracle     6634 ....m oracle
                     oracle     6636 ....m oracle
                     oracle     6638 ....m oracle
                     oracle     6640 ....m oracle
                     oracle     6642 ....m oracle
                     oracle     6644 ....m oracle
                     oracle     6646 ....m oracle
                     oracle     6648 ....m oracle
                     oracle     6650 ....m oracle
                     oracle     6652 ....m oracle
                     oracle     6654 ....m oracle
                     oracle     6656 ....m oracle
                     oracle     6658 ....m oracle
                     oracle     6662 ....m oracle
                     oracle     6669 ....m oracle
                     oracle     6744 ....m oracle
                     oracle     6767 ....m oracle
                     oracle     6769 ....m oracle
                     oracle     6791 ....m oracle
                     oracle     7034 ....m oracle
or the other way around, if you want to know which files are opened by a specific process:
ps -ef | grep pmon | grep -v "ASM"
oracle 6626 1 0 13:40 ? 00:00:05 ora_pmon_DB112
root 7075 5338 0 14:33 pts/0 00:00:00 grep pmon
# use the pmap command on the PID
pmap 6626
6626: ora_pmon_DB112
0000000000400000 183436K r-x-- /opt/oracle/product/base/11.2.0.3/bin/oracle
000000000b922000   1884K rwx-- /opt/oracle/product/base/11.2.0.3/bin/oracle
000000000baf9000    304K rwx-- [ anon ]
0000000010c81000    660K rwx-- [ anon ]
0000000060000000      4K r-xs- /dev/shm/ora_DB112_4325385_0
0000000060001000   4092K rwxs- /dev/shm/ora_DB112_4325385_0
0000000060400000   4096K rwxs- /dev/shm/ora_DB112_4325385_1
0000000060800000   4096K rwxs- /dev/shm/ora_DB112_4358154_0
0000000060c00000   4096K rwxs- /dev/shm/ora_DB112_4358154_1
0000000061000000   4096K rwxs- /dev/shm/ora_DB112_4358154_2
0000000061400000   4096K rwxs- /dev/shm/ora_DB112_4358154_3
0000000061800000   4096K rwxs- /dev/shm/ora_DB112_4358154_4
0000000061c00000   4096K rwxs- /dev/shm/ora_DB112_4358154_5
0000000062000000   4096K rwxs- /dev/shm/ora_DB112_4358154_6
...
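the pmap listing gets long quickly, so it can help to let awk sum up only the /dev/shm mappings of a process. a sketch, assuming the pmap output format shown above ( size with a trailing K in the second column, file name in the last one ); the function name shm_mapped_kb is made up for this example.

```shell
# sum the sizes (in KB) of all /dev/shm mappings found in pmap output on stdin
shm_mapped_kb() {
    awk '
        $NF ~ /^\/dev\/shm\// {
            kb = $2
            sub(/K$/, "", kb)   # strip the trailing "K" from the size column
            sum += kb
        }
        END { printf "%d\n", sum }
    '
}

# usage:
#   pmap 6626 | shm_mapped_kb
```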
if you have trouble starting up your instance with this configuration ( ORA-00845 ) check the size of the virtual filesystem:
df -h
Filesystem  Size  Used  Avail  Use%  Mounted on
/dev/hdc1    28G   14G    12G   54%  /
tmpfs       741M  456M   286M   62%  /dev/shm
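the check can be wrapped into a small function. this is only a sketch and rests on the assumption that the tmpfs has to be at least as large as your memory_max_target; the function name shm_big_enough and the value 512 in the usage line are made up for this example.

```shell
# compare the size of a mounted filesystem with the memory you want to configure
# $1: mount point (usually /dev/shm), $2: required size in mb
shm_big_enough() {
    mount_point=$1
    needed_mb=$2
    # -P forces one line per filesystem, -k reports sizes in 1k blocks
    size_kb=$(df -Pk "$mount_point" | awk 'NR == 2 { print $2 }')
    size_mb=$((size_kb / 1024))
    if [ "$size_mb" -ge "$needed_mb" ]; then
        echo "ok: $mount_point has ${size_mb}m, ${needed_mb}m needed"
    else
        echo "too small: $mount_point has ${size_mb}m, ${needed_mb}m needed"
        return 1
    fi
}

# usage:
#   shm_big_enough /dev/shm 512
```

with the 741M tmpfs from the df output above, asking for 512m would report "ok" while asking for 1024m would report "too small".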
depending on your configuration ( memory_* or sga_* parameters ) the way that memory is managed changes ( from System V to POSIX, to be exact ).
lots and lots of information. not all of it is important to keep in mind. but what you should remember:
there are several processes and memory segments that make up the oracle instance. as several processes are attached to the same memory regions there must be a way to protect them from concurrent access ( think of semaphores ) … and oracle heavily depends on shared memory. if you scroll through the trace file you’ll notice that there are thousands of operations going on when an oracle instance starts up. imagine what is going on if the instance is under heavy workload and lots and lots of things need protection.
ps: for those interested:
there is plenty more interesting stuff which you can find in the db_startup.log trace, for example:
writing the audit files:
grep -i adump db_startup.log | grep -v ASM
3404 open("/oradata/DB112/admin/adump/DB112_ora_3404_2.aud", O_RDWR|O_CREAT|O_EXCL, 0660) = 10
3404 write(10, "/oradata/DB112/admin/adump/DB112"..., 47) = 47
3444 open("/oradata/DB112/admin/adump/DB112_ora_3444_1.aud", O_RDWR|O_CREAT|O_EXCL, 0660) = -1 EEXIST (File exists)
3444 open("/oradata/DB112/admin/adump/DB112_ora_3444_2.aud", O_RDWR|O_CREAT|O_EXCL, 0660) = 8
3444 write(8, "/oradata/DB112/admin/adump/DB112"..., 47) = 47
3481 open("/oradata/DB112/admin/adump/DB112_ora_3481_1.aud", O_RDWR|O_CREAT|O_EXCL, 0660
3481 write(8, "/oradata/DB112/admin/adump/DB112"..., 47) = 47
writing the alert.log:
grep -i "alert_DB112.log" db_startup.log
3404 lstat("/oradata/DB112/admin/diag/rdbms/db112/DB112/trace/alert_DB112.log", {st_mode=S_IFREG|0640, st_size=110201, ...}) = 0
3404 open("/oradata/DB112/admin/diag/rdbms/db112/DB112/trace/alert_DB112.log", O_WRONLY|O_CREAT|O_APPEND, 0660) = 5
3404 lstat("/oradata/DB112/admin/diag/rdbms/db112/DB112/trace/alert_DB112.log", {st_mode=S_IFREG|0640, st_size=110260, ...}) = 0
3404 open("/oradata/DB112/admin/diag/rdbms/db112/DB112/trace/alert_DB112.log", O_WRONLY|O_CREAT|O_APPEND, 0660) = 11
reading the oracle message files:
grep msb db_startup.log
db_startup.log:5438 open("/opt/oracle/product/base/11.2.0.3/oracore/mesg/lrmus.msb", O_RDONLY) = 18
db_startup.log:5438 open("/opt/oracle/product/base/11.2.0.3/oracore/mesg/lrmus.msb", O_RDONLY) = 18
db_startup.log:5430 open("/opt/oracle/product/base/11.2.0.3/rdbms/mesg/oraus.msb", O_RDONLY
db_startup.log:5494 open("/opt/oracle/product/base/11.2.0.3/rdbms/mesg/oraus.msb", O_RDONLY
getting semaphores:
grep semget db_startup.log
5365 semget(IPC_PRIVATE, 1, IPC_CREAT|IPC_EXCL|0600) = 1081346
5365 semget(IPC_PRIVATE, 124, IPC_CREAT|IPC_EXCL|0666) = 1114114
5365 semget(IPC_PRIVATE, 124, IPC_CREAT|0660) = 1146882
5365 semget(0x710dfe10, 0, 0) = -1 ENOENT (No such file or directory)
5365 semget(0x46db3f80, 0, 0) = -1 ENOENT (No such file or directory)
5365 semget(0x9ae46084, 0, 0) = -1 ENOENT (No such file or directory)
5365 semget(0xf6dcc368, 0, 0) = -1 ENOENT (No such file or directory)
5365 semget(0x710dfe10, 124, IPC_CREAT|IPC_EXCL|0640) = 1179650
some exadata stuff:
3404 open("/etc/oracle/cell/network-config/cellinit.ora", O_RDONLY) = -1 ENOENT (No such file or directory)
and … and …