as mentioned in earlier posts, as a dba you need to know how the operating system works. this post is an introduction to processes on linux.
the definition of a process is: a process is an instance of a program in execution.
to manage a process, the linux kernel must know a lot of things about it, e.g. which files the process is allowed to handle, whether it is running on a CPU or blocked, its address space, etc. all this information is kept in the so-called process descriptor. you can think of the process descriptor as a structure containing everything the kernel needs to know about the process ( internally the structure is called task_struct ). one of the pieces of information stored in the process descriptor is the process id ( pid ), which is used to identify the process.
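the kernel exports part of the process descriptor through the /proc filesystem, which is covered in more detail further down. a minimal sketch, assuming bash and a mounted /proc ( show_descriptor_fields is a hypothetical name ): it prints the Name, State, Pid and PPid lines of /proc/[pid]/status, where State tells you whether the process is running (R) or sleeping/blocked (S, D):

```shell
#!/usr/bin/env bash
# minimal sketch: print a few process descriptor fields that the kernel
# exports through /proc/[pid]/status (show_descriptor_fields is a hypothetical name)
show_descriptor_fields() {
    local pid=${1:-$$}   # default: the current shell
    # Name  = command name
    # State = R (running), S (sleeping), D (uninterruptible sleep), ...
    # Pid   = process id, PPid = parent process id
    grep -E '^(Name|State|Pid|PPid):' "/proc/$pid/status"
}

show_descriptor_fields "${1:-$$}"
```

pass any pid as the first argument, e.g. the pid of smon, to see its state and parent.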
let’s take a look at the processes that make up the oracle database:
ps -ef | grep $ORACLE_SID | egrep -v "DESCRIPTION|grep|tnslsnr"
oracle    2944     1  0 08:30 ?        00:00:03 ora_pmon_DB112
oracle    2946     1  0 08:30 ?        00:00:06 ora_psp0_DB112
oracle    2948     1  2 08:30 ?        00:01:11 ora_vktm_DB112
oracle    2952     1  0 08:30 ?        00:00:02 ora_gen0_DB112
oracle    2954     1  0 08:30 ?        00:00:03 ora_diag_DB112
oracle    2956     1  0 08:30 ?        00:00:02 ora_dbrm_DB112
oracle    2958     1  0 08:30 ?        00:00:06 ora_dia0_DB112
oracle    2961     1  0 08:30 ?        00:00:02 ora_mman_DB112
oracle    2963     1  0 08:30 ?        00:00:03 ora_dbw0_DB112
oracle    2965     1  0 08:30 ?        00:00:03 ora_lgwr_DB112
oracle    2967     1  0 08:30 ?        00:00:06 ora_ckpt_DB112
oracle    2969     1  0 08:30 ?        00:00:01 ora_smon_DB112
oracle    2971     1  0 08:30 ?        00:00:00 ora_reco_DB112
oracle    2973     1  0 08:30 ?        00:00:02 ora_rbal_DB112
oracle    2975     1  0 08:30 ?        00:00:01 ora_asmb_DB112
oracle    2977     1  0 08:30 ?        00:00:05 ora_mmon_DB112
oracle    2979     1  0 08:30 ?        00:00:09 ora_mmnl_DB112
oracle    2987     1  0 08:30 ?        00:00:04 ora_mark_DB112
oracle    3013     1  0 08:30 ?        00:00:00 ora_qmnc_DB112
oracle    3054     1  0 08:30 ?        00:00:01 ora_q000_DB112
oracle    3056     1  0 08:30 ?        00:00:00 ora_q001_DB112
oracle    3182     1  0 08:35 ?        00:00:02 ora_smco_DB112
oracle    3359     1  0 09:05 ?        00:00:00 ora_w000_DB112
notice that the egrep -v filter excludes the listener process ( tnslsnr ), all current connections to the database ( their command lines contain DESCRIPTION ) and the grep command itself.
if you want to check the local connections to the database from the os, you can do something like this:
ps -ef | grep $ORACLE_SID | grep "LOCAL=YES"
oracle    2933     1  0 10:48 ?        00:00:01 oracleDB112 (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
oracle    2969     1  0 10:48 ?        00:00:00 oracleDB112 (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
oracle    2979     1  0 10:48 ?        00:00:00 oracleDB112 (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
oracle    3088  3087  0 10:50 ?        00:00:00 oracleDB112 (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
oracle    3096     1  0 10:50 ?        00:00:00 oracleDB112 (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
checking the background processes from inside the database is as simple as this:
SQL> select pname from v$process where pname is not null;

PNAME
-----
PMON
PSP0
VKTM
GEN0
DIAG
DBRM
DIA0
MMAN
DBW0
LGWR
CKPT
SMON
RECO
RBAL
ASMB
MMON
MMNL
MARK
SMCO
W000
QMNC
Q000
Q001
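with the -ef arguments shown above, ps displays the following columns:
- the os user the process runs under ( UID )
- the process id ( PID )
- the parent process id ( PPID )
- the processor utilization ( C ), i.e. the integer percentage of CPU usage over the lifetime of the process
- the start time of the process ( STIME )
- the terminal the process was started on, if any ( TTY ); a "?" means no controlling terminal
- the cumulative CPU time ( TIME )
- the command ( CMD )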
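the same columns can also be requested one by one with ps -o, which is handy when you only need some of them. a minimal sketch, assuming the procps version of ps shipped with most linux distributions ( ps_ef_columns is a hypothetical name ):

```shell
#!/usr/bin/env bash
# minimal sketch: select the "ps -ef" columns explicitly with -o
# user = UID, c = cpu utilization, stime = start time, tty = terminal,
# time = cumulative cpu time, cmd = command with its arguments
ps_ef_columns() {
    local pid=${1:-$$}   # default: the current shell
    ps -o user,pid,ppid,c,stime,tty,time,cmd -p "$pid"
}

ps_ef_columns "${1:-$$}"
```

dropping specifiers from the -o list is an easy way to build a narrower report, e.g. only pid, ppid and cmd.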
but where does the ps command get the information it displays? in fact you can get all of the information shown above without using ps at all. all you need to do is look at the pseudo filesystem /proc ( it is called a pseudo filesystem because it is a virtual filesystem: its files are not stored on disk but generated on the fly by the kernel from its internal structures ).
if you run “ls” on the /proc filesystem you’ll see a lot of directories and files. for this post we will concentrate on the numbered directories, each of which corresponds to a process id.
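walking these numbered directories is essentially how a tool like ps finds every process on the system. a minimal sketch, assuming bash and a readable /proc ( list_procs is a hypothetical name ), that prints each pid together with the Name field from its status file. keep in mind that for the oracle background processes this field is simply oracle ( compare the "(oracle)" in the strace output at the end of this post ); the ora_smon_DB112 style names shown by ps -ef come from the command line instead:

```shell
#!/usr/bin/env bash
# minimal sketch: enumerate processes by walking the numbered /proc directories
list_procs() {
    local dir pid name
    for dir in /proc/[0-9]*/; do
        pid=${dir#/proc/}   # "/proc/2969/" -> "2969/"
        pid=${pid%/}        # "2969/"       -> "2969"
        # a process can exit while we loop; skip it if its status file is gone
        name=$(sed -n 's/^Name:[[:space:]]*//p' "${dir}status" 2>/dev/null) || continue
        printf '%s %s\n' "$pid" "$name"
    done
}

list_procs
```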
let’s take smon, the oracle system monitor, as an example ( you will need to adjust the process id for your environment ):
ls -la /proc/2969/
dr-xr-xr-x   6 oracle asmadmin 0 Apr  3 13:40 .
dr-xr-xr-x 154 root   root     0 Apr  3  2012 ..
dr-xr-xr-x   2 oracle asmadmin 0 Apr  3 14:07 attr
-r--------   1 root   root     0 Apr  3 14:07 auxv
-r--r--r--   1 root   root     0 Apr  3 13:45 cmdline
-rw-r--r--   1 root   root     0 Apr  3 14:07 coredump_filter
-r--r--r--   1 root   root     0 Apr  3 14:07 cpuset
lrwxrwxrwx   1 root   root     0 Apr  3 14:07 cwd -> /opt/oracle/product/base/11.2.0.3/dbs
-r--------   1 root   root     0 Apr  3 14:07 environ
lrwxrwxrwx   1 root   root     0 Apr  3 14:07 exe -> /opt/oracle/product/base/11.2.0.3/bin/oracle
dr-x------   2 root   root     0 Apr  3 13:40 fd
dr-x------   2 root   root     0 Apr  3 14:07 fdinfo
-r--------   1 root   root     0 Apr  3 14:07 io
-r--r--r--   1 root   root     0 Apr  3 14:07 limits
-rw-r--r--   1 root   root     0 Apr  3 14:07 loginuid
-r--r--r--   1 root   root     0 Apr  3 13:40 maps
-rw-------   1 root   root     0 Apr  3 14:07 mem
-r--r--r--   1 root   root     0 Apr  3 14:07 mounts
-r--------   1 root   root     0 Apr  3 14:07 mountstats
-r--r--r--   1 root   root     0 Apr  3 14:07 numa_maps
-rw-r--r--   1 root   root     0 Apr  3 14:07 oom_adj
-r--r--r--   1 root   root     0 Apr  3 14:07 oom_score
lrwxrwxrwx   1 root   root     0 Apr  3 14:07 root -> /
-r--r--r--   1 root   root     0 Apr  3 14:07 schedstat
-r--r--r--   1 root   root     0 Apr  3 14:07 smaps
-r--r--r--   1 root   root     0 Apr  3 13:40 stat
-r--r--r--   1 root   root     0 Apr  3 14:07 statm
-r--r--r--   1 root   root     0 Apr  3 13:45 status
dr-xr-xr-x   3 oracle asmadmin 0 Apr  3 14:07 task
-r--r--r--   1 root   root     0 Apr  3 14:07 wchan
what do we see here? lots and lots of information about the smon process. for a detailed description of all these files and directories, see the man page:
man proc
for example, if we take a look at the statm file of the process:
cat /proc/2969/statm
126385 16013 14717 45859 0 994 0
… and check the man page for the meaning of the numbers, things become clearer:
/proc/[number]/statm
       Provides information about memory status in pages. The columns are:
           size       total program size
           resident   resident set size
           share      shared pages
           text       text (code)
           lib        library
           data       data/stack
           dt         dirty pages (unused in Linux 2.6)
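since statm counts pages, multiplying by the page size turns the numbers into kilobytes. a minimal sketch, assuming bash and getconf ( statm_kb is a hypothetical name ); it leaves out lib and dt, which are always 0 on linux 2.6 and later:

```shell
#!/usr/bin/env bash
# minimal sketch: show the /proc/[pid]/statm columns in kB instead of pages
statm_kb() {
    local pid=${1:-$$}   # default: the current shell
    local page_kb size resident share text lib data dt
    page_kb=$(( $(getconf PAGESIZE) / 1024 ))   # 4 on most x86_64 systems
    read -r size resident share text lib data dt < "/proc/$pid/statm" || return 1
    # lib and dt are read but not printed: both are always 0 on linux 2.6+
    printf '%-8s %10d kB\n' \
        size     $(( size * page_kb )) \
        resident $(( resident * page_kb )) \
        share    $(( share * page_kb )) \
        text     $(( text * page_kb )) \
        data     $(( data * page_kb ))
}

statm_kb "${1:-$$}"
```

assuming a 4 kB page size, the resident value of 16013 pages shown above for smon corresponds to 16013 * 4 = 64052 kB, roughly 62.5 MB.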
want to know the environment of the process? just take a look at the environ file:
cat /proc/2969/environ
__CLSAGFW_TYPE_NAME=ora.listener.typeORA_CRS_HOME=/opt/oracle/product/crs/11.2.0.3SELINUX_INIT=YESCONSOLE=/dev/consoleTERM=linuxSHELL=/bin/bash__CRSD_CONNECT_STR=(ADDRESS=(PROTOCOL=IPC)(KEY=OHASD_IPC_SOCKET_11))NLS_LANG=AMERICAN_AMERICA.AL32UTF8CRF_HOME=/opt/oracle/product/crs/11.2.0.3GIPCD_PASSTHROUGH=false__CRSD_AGENT_NAME=/opt/oracle/product/crs/11.2.0.3/bin/oraagent_grid__CRSD_MSG_FRAME_VERSION=2USER=gridINIT_VERSION=sysvinit-2.86__CLSAGENT_INCARNATION=2ORASYM=/opt/oracle/product/crs/11.2.0.3/bin/oraagent.binPATH=RUNLEVEL=3runlevel=3PWD=/ENV_FILE=/opt/oracle/product/crs/11.2.0.3/crs/install/s_crsconfig_oracleplayground_env.txtLANG=en_US.UTF-8TZ=Europe/Zurich__IS_HASD_AGENT=TRUEPREVLEVEL=Nprevious=N__CLSAGENT_LOG_NAME=ora.listener.type_gridHOME=/home/gridSHLVL=3__CLSAGENT_LOGDIR_NAME=ohasdLD_ASSUME_KERNEL=__CLSAGENT_USER_NAME=gridLOGNAME=gridORACLE_HOME=/opt/oracle/product/base/11.2.0.3ORACLE_SID=DB112ORA_NET2_DESC=34,37ORACLE_SPAWNED_PROCESS=1SKGP_SPAWN_DIAG_PRE_FORK_TS=1333453218SKGP_SPAWN_DIAG_POST_FORK_TS=1333453218SKGP_HIDDEN_ARGS=0SKGP_SPAWN_DIAG_PRE_EXEC_TS=1333453218
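the variables in environ are separated by null bytes rather than newlines, which is why cat prints them as one long run. a minimal sketch ( show_environ is a hypothetical name ) that prints one variable per line; reading the environ of another user's process usually requires root:

```shell
#!/usr/bin/env bash
# minimal sketch: print the environment of a process, one variable per line
show_environ() {
    local pid=${1:-$$}   # default: the current shell
    # entries in /proc/[pid]/environ are terminated by null bytes;
    # this is the environment the process was started with, later changes
    # the process makes to its own environment are not reflected here
    tr '\0' '\n' < "/proc/$pid/environ"
}

show_environ "${1:-$$}"
```

for example, show_environ 2969 | grep '^ORACLE_' would list ORACLE_HOME, ORACLE_SID and ORACLE_SPAWNED_PROCESS for the environment shown above.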
… and which files does the process have open? just look at the fd directory:
ls -la fd/
total 0
dr-x------ 2 root   root      0 Apr  3 13:40 .
dr-xr-xr-x 6 oracle asmadmin  0 Apr  3 13:40 ..
lr-x------ 1 root   root     64 Apr  3 16:23 0 -> /dev/null
l-wx------ 1 root   root     64 Apr  3 16:23 1 -> /dev/null
lr-x------ 1 root   root     64 Apr  3 16:23 10 -> /dev/null
lr-x------ 1 root   root     64 Apr  3 16:23 11 -> /dev/null
lr-x------ 1 root   root     64 Apr  3 16:23 12 -> /dev/null
lrwx------ 1 root   root     64 Apr  3 16:23 13 -> /opt/oracle/product/base/11.2.0.3/dbs/hc_DB112.dat
lr-x------ 1 root   root     64 Apr  3 16:23 14 -> /dev/null
lr-x------ 1 root   root     64 Apr  3 16:23 15 -> /dev/null
lr-x------ 1 root   root     64 Apr  3 16:23 16 -> /dev/zero
lr-x------ 1 root   root     64 Apr  3 16:23 17 -> /dev/zero
lrwx------ 1 root   root     64 Apr  3 16:23 18 -> /opt/oracle/product/base/11.2.0.3/dbs/hc_DB112.dat
lr-x------ 1 root   root     64 Apr  3 16:23 19 -> /opt/oracle/product/base/11.2.0.3/rdbms/mesg/oraus.msb
l-wx------ 1 root   root     64 Apr  3 16:23 2 -> /dev/null
lr-x------ 1 root   root     64 Apr  3 16:23 20 -> /proc/2875/fd
lr-x------ 1 root   root     64 Apr  3 16:23 21 -> /opt/oracle/product/crs/11.2.0.3/dbs/hc_+ASM.dat
lr-x------ 1 root   root     64 Apr  3 16:23 22 -> /dev/zero
lrwx------ 1 root   root     64 Apr  3 16:23 23 -> /opt/oracle/product/base/11.2.0.3/dbs/hc_DB112.dat
lrwx------ 1 root   root     64 Apr  3 16:23 24 -> /opt/oracle/product/base/11.2.0.3/dbs/lkDB112
lr-x------ 1 root   root     64 Apr  3 16:23 25 -> /opt/oracle/product/base/11.2.0.3/rdbms/mesg/oraus.msb
lrwx------ 1 root   root     64 Apr  3 16:23 256 -> /dev/sda1
lrwx------ 1 root   root     64 Apr  3 16:23 3 -> /opt/oracle/product/crs/11.2.0.3/log/oracleplayground/agent/ohasd/oraagent_grid/oraagent_gridOUT.log
l-wx------ 1 root   root     64 Apr  3 16:23 4 -> /opt/oracle/product/crs/11.2.0.3/log/oracleplayground/agent/ohasd/oraagent_grid/oraagent_grid.l01
lr-x------ 1 root   root     64 Apr  3 16:23 5 -> /dev/null
lrwx------ 1 root   root     64 Apr  3 16:23 6 -> socket:[7791]
lrwx------ 1 root   root     64 Apr  3 16:23 7 -> socket:[7792]
lrwx------ 1 root   root     64 Apr  3 16:23 8 -> socket:[7793]
lrwx------ 1 root   root     64 Apr  3 16:23 9 -> socket:[7794]
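each entry in the fd directory is a symbolic link named after the file descriptor number, so readlink shows what it points to: a regular file, a device, or socket:[inode] for a socket. a minimal sketch ( list_fds is a hypothetical name ); given the dr-x------ root root permissions shown above, you will probably need to run it as root for the oracle processes:

```shell
#!/usr/bin/env bash
# minimal sketch: print "fd -> target" for every open file descriptor of a process
list_fds() {
    local pid=${1:-$$} fd   # default: the current shell
    for fd in /proc/"$pid"/fd/*; do
        # skips the unexpanded glob (no permission, process gone) and fds
        # closed in the meantime; socket:[...] targets are still symlinks
        [ -L "$fd" ] || continue
        printf '%s -> %s\n' "${fd##*/}" "$(readlink "$fd")"
    done
}

list_fds "${1:-$$}"
```

piping the output through grep -c 'socket:' gives a quick count of the sockets a process holds.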
conclusion: it’s really worth reading the man pages and understanding the /proc/[PID] structures. they are a very good starting point when you have trouble with one of the processes running on your system.
and last but not least: maybe you don’t believe that the ps command reads the /proc/[PID] structures to display its information. you can always trace the command and check what’s happening behind the scenes:
strace -o strace.log ps -ef
this will write the strace output to a file named strace.log. grep for your smon process and check which files were read:
grep 2969 strace.log
stat("/proc/2969", {st_mode=S_IFDIR|0555, st_size=0, ...}) = 0
open("/proc/2969/stat", O_RDONLY) = 6
read(6, "2969 (oracle) S 1 2921 2921 0 -1"..., 1023) = 191
open("/proc/2969/status", O_RDONLY) = 6
open("/proc/2969/cmdline", O_RDONLY) = 6
write(1, "oracle 2969 1 0 10:48 ? "..., 63) = 63
here we go: ps reads a subset of the very files listed above:
/proc/2969/stat
/proc/2969/status
/proc/2969/cmdline
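to make this concrete, here is a minimal sketch of a "mini ps" for a single process that reads just stat and cmdline ( mini_ps is a hypothetical name, and it prints only a few of the fields ps -ef shows ). the command name in stat is enclosed in parentheses and may itself contain spaces, so the remaining fields are split after the last ")". for smon, the read call in the strace output above shows state S and parent pid 1:

```shell
#!/usr/bin/env bash
# minimal sketch: read /proc/[pid]/stat and /proc/[pid]/cmdline like ps does
mini_ps() {
    local pid=${1:-$$}   # default: the current shell
    local line comm rest cmd
    local -a f
    read -r line < "/proc/$pid/stat" || return 1
    # stat looks like: 2969 (oracle) S 1 2921 2921 0 -1 ...
    comm=${line#*"("}            # text after the first "("
    comm=${comm%")"*}            # ... up to the last ")"
    rest=${line##*") "}          # fields after the last ") "
    read -r -a f <<< "$rest"     # f[0]=state f[1]=ppid f[2]=pgrp f[3]=session
    # cmdline holds the arguments separated by null bytes; it is empty for
    # kernel threads, where ps shows the command name in brackets instead
    cmd=$(tr '\0' ' ' < "/proc/$pid/cmdline")
    cmd=${cmd% }
    [ -n "$cmd" ] || cmd="[$comm]"
    printf 'pid=%s ppid=%s state=%s comm=%s cmd=%s\n' \
        "${line%% *}" "${f[1]}" "${f[0]}" "$comm" "$cmd"
}

mini_ps "${1:-$$}"
```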
happy processing …