Archives For resources

consider reading this before jumping on this train:

Oracle’s In-Memory Database: The True Cost Of Licensing

if you are on a distribution based on redhat 6 there are some interesting tools which can help in fine tuning the system for different workloads. e.g. if you’d like to put a database on your server there are various settings you might want to adjust ( kernel, disks, network … ). if you use the system as a workstation other settings might make more sense ( power saving settings, for example ).

as this blog mainly is about databases i’ll focus there, obviously. first of all you’ll need the software:

yum install tuned

as tuned is a service you’ll need to enable and start it.

service tuned start
chkconfig tuned on
chkconfig --list | grep tuned

let’s see what happened. the main configuration file for tuned is located in /etc:


if you take a look at the file you will find a main section and various plugin sections ( e.g. DiskTuning or CPUTuning ).

next, several default tuning profiles have been created:

ls -l /etc/tune-profiles/

each of these directories contains the same set of configuration files ( among them ktune.sysconfig, sysctl.ktune and tuned.conf ) which specify the various settings that will be applied once the profile becomes active.

you can list the available profiles with the tuned-adm command, too:

tuned-adm list
Available profiles:
- server-powersave
- laptop-ac-powersave
- latency-performance
- default
- desktop-powersave
- enterprise-storage
- virtual-guest
- virtual-host
- spindown-disk
- laptop-battery-powersave
- throughput-performance
Current active profile: default

… which additionally tells us that the default profile is the one which is active at the moment.

another way to check the active profile is:

tuned-adm active

if you want to create a new profile just copy an existing one and adjust the settings you want to:

cp -pr /etc/tune-profiles/enterprise-storage/ /etc/tune-profiles/my_profile
tuned-adm list | grep my_profile
- my_profile

for databases you’ll probably need maximum throughput, so let’s activate the throughput-performance profile:

tuned-adm profile throughput-performance
Stopping tuned:                                            [  OK  ]
Switching to profile 'throughput-performance'
Applying ktune sysctl settings:
/etc/ktune.d/tunedadm.conf:                                [  OK  ]
Calling '/etc/ktune.d/ start':                  [  OK  ]
Applying sysctl settings from /etc/sysctl.conf
Applying deadline elevator: sda                            [  OK  ]
Starting tuned:                                            [  OK  ]

according to the documentation and the output from above this should change the io scheduler to deadline ( which is recommended for databases ). is this true ?

cat /sys/block/sda/queue/scheduler 
noop anticipatory [deadline] cfq

seems to work. does this survive a reboot?

tuned-adm active
cat /sys/block/sda/queue/scheduler
noop anticipatory [deadline] cfq

very good. no need to adjust this in the bootloader anymore.
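
if there is more than one disk in the system it might be handy to check all of them at once. a minimal sketch, assuming the bracket notation shown above marks the active scheduler ( active_scheduler is a name made up for this example, it is not part of tuned ):

```shell
#!/bin/bash
# print the scheduler name enclosed in [ ] from a line such as
# "noop anticipatory [deadline] cfq"
# ( active_scheduler is a hypothetical helper, not part of tuned )
active_scheduler() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)\].*/\1/p'
}

# walk over all block devices which expose a scheduler file
for f in /sys/block/*/queue/scheduler; do
    [ -r "$f" ] || continue
    dev=${f#/sys/block/}
    dev=${dev%%/*}
    echo "$dev: $(active_scheduler "$(cat "$f")")"
done
```

devices without a scheduler file are skipped, so this should be safe to run on any linux box.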

if you want to check which kernel settings have been adjusted by activating this profile just have a look at the configuration files:

cat /etc/tune-profiles/throughput-performance/sysctl.ktune

include any kernel setting you need in there and you’re fine.
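
a small sketch of how such a setting could be added to a copied profile without editing the file by hand. it assumes the file uses plain "key = value" lines as sysctl.conf does. set_sysctl_entry is a hypothetical helper and the value used in the example is only an illustration, not a recommendation:

```shell
#!/bin/bash
# set_sysctl_entry FILE KEY VALUE
# replaces an existing "KEY = ..." line in FILE or appends a new one
# ( hypothetical helper for this example, not shipped with tuned )
set_sysctl_entry() {
    local file=$1 key=$2 value=$3 key_re
    # escape the dots so that they are matched literally
    key_re=$(printf '%s' "$key" | sed 's/\./\\./g')
    if grep -q "^[[:space:]]*${key_re}[[:space:]]*=" "$file"; then
        sed -i "s|^[[:space:]]*${key_re}[[:space:]]*=.*|${key} = ${value}|" "$file"
    else
        printf '%s = %s\n' "$key" "$value" >> "$file"
    fi
}

# example ( my_profile is the copied profile from above ):
# set_sysctl_entry /etc/tune-profiles/my_profile/sysctl.ktune vm.swappiness 10
# tuned-adm profile my_profile
```

working on a copy keeps the shipped profiles untouched, so you can always switch back.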

as profiles may be switched on the fly, activating different profiles at different times of the day might make sense, too.
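
a sketch of how switching by time of day could look like. the time windows and the mapping to profiles are assumptions for this example only, and profile_for_hour is a made up name:

```shell
#!/bin/bash
# profile_for_hour HOUR
# maps an hour ( 0-23 ) to one of the profiles listed by "tuned-adm list".
# the time windows are assumptions for this example only.
profile_for_hour() {
    local hour=${1#0}     # strip a leading zero ( "08" becomes "8" )
    hour=${hour:-0}       # "0" was stripped to an empty string
    if [ "$hour" -ge 7 ] && [ "$hour" -lt 20 ]; then
        echo latency-performance      # daytime: interactive workload
    else
        echo throughput-performance   # night: batch jobs and backups
    fi
}

# called once per hour ( from cron, for example ) this could look like:
# tuned-adm profile "$(profile_for_hour "$(date +%H)")"
```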

if you are on linux/solaris and sar is configured and running on your system there is a nice utility called kSar which can be used to create graphs of the various statistics sar gathered. this can be very handy if you are looking for peaks and want to have a quick overview of what happened on your system.

installing kSar is just a matter of unzipping the provided package and either executing the provided script or using java directly to execute the jar file:

java -jar kSar.jar

this will start ksar and you may load the sar files to have a look at the statistics.

another option is to generate a pdf:

java -jar kSar.jar -input '/var/log/sa/sarXX' -outputPDF today.pdf


and even faster: create a bash function and an alias in your .bashrc:

ksarfunc() {
    java -jar PATH_TO/kSar.jar -input "$1" -outputPDF today.pdf
}
alias ksar='ksarfunc'

… and you will be able to quickly generate a pdf for a specific sar file:

ksar /path/to/sar/file
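
as the function above always writes to today.pdf, a second run overwrites the first. a sketch which creates one pdf per sar file instead. it uses only the kSar options shown above. pdf_name and ksar_all are names made up for this example and PATH_TO is the same placeholder as before:

```shell
#!/bin/bash
# pdf_name SARFILE
# derives the name of the pdf from the name of the sar file,
# e.g. /var/log/sa/sar15 becomes sar15.pdf
pdf_name() {
    echo "$(basename "$1").pdf"
}

# ksar_all DIR
# creates one pdf per sar text file found in DIR
# ( PATH_TO is the same placeholder as in the function above )
ksar_all() {
    local f
    for f in "$1"/sar[0-9][0-9]; do
        [ -r "$f" ] || continue
        java -jar PATH_TO/kSar.jar -input "$f" -outputPDF "$(pdf_name "$f")"
    done
}

# example:
# ksar_all /var/log/sa
```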

a much more comprehensive tutorial for sar and ksar can be found here.

just a little hint that there is another option than top, which is htop. pre-compiled packages are available for most distributions.

check htop’s sourceforge page for a tiny comparison between htop and top.

another ouch with GI on solaris 10 sparc 64bit: if one of your cluster nodes restarts and you can not find any evident reason for it apart from some of these entries in the logs:

[cssd(1084)]CRS-1612:Network communication with node1 node (1) missing for 50% of timeout interval. Removal of this node from cluster in 14.258 seconds
[cssd(1084)]CRS-1625:Node node1, number 1, was manually shut down
[cssd(1084)]CRS-1601:CSSD Reconfiguration complete. Active nodes are node2 .
[ctssd(1117)]CRS-2407:The new Cluster Time Synchronization Service reference node is host node2.
[crsd(1522)]CRS-5504:Node down event reported for node 'node1'.
[cssd(1084)]CRS-1601:CSSD Reconfiguration complete. Active nodes are node1 node2 .

… and:

[ CSSD][20](:CSSNM00018:)clssnmvDiskCheck: Aborting, 0 of 1 configured voting disks available, need 1
[ CSSD][20]###################################
[ CSSD][20]clssscExit: CSSD aborting from thread clssnmvDiskPingMonitorThread
[ CSSD][20]###################################
[ CSSD][20](:CSSSC00012:)clssscExit: A fatal error occurred and the CSS daemon is terminating abnormally
[ CSSD][20]

you probably hit bug 13869978. this seems only to happen if you are on external redundancy for the cluster diskgroup and therefore only one voting disk was created.

two solutions are available:

  • migrate the voting disk to an asm mirrored diskgroup ( normal or high redundancy )
  • or apply PSU4 on top of

there seems to be the same issue on linux.
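
to quickly check whether a css log contains the signature from above, something like the following might help. this is only a sketch which matches the exact wording of the clssnmvDiskCheck message quoted above. voting_disk_status is a made up name and the path to the log file is a placeholder:

```shell
#!/bin/bash
# voting_disk_status
# reads log lines from stdin and prints "available configured"
# for every clssnmvDiskCheck abort message found
voting_disk_status() {
    sed -n 's/.*clssnmvDiskCheck: Aborting, \([0-9][0-9]*\) of \([0-9][0-9]*\) configured voting disks available.*/\1 \2/p'
}

# example ( the path to the log file is a placeholder ):
# voting_disk_status < /path/to/ocssd.log
```

a second number of 1 means that only one voting disk is configured, which is the situation described above.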

if you want to manage vips in the grid infrastructure which are not on the default network and you get this: “CRS-2534: Resource type ‘ora.cluster_vip_net2.type’ is not registered” don’t panic, it is easy to fix. basically you need to create the “ora.cluster_vip_net2.type”-type before adding the vip with appvipcfg:

./srvctl add network -k 2 -S x.x.x.0/
./crsctl start resource
./crsctl add type ora.cluster_vip_net2.type -basetype ora.cluster_vip.type
./appvipcfg create -network=2 -ip=x.x.x.x -vipname=myvip1 -user=root
./crsctl start resource myvip1 -n server1
./appvipcfg create -network=2 -ip=x.x.x.x -vipname=myvip2 -user=root
./crsctl start resource myvip2 -n server2
./crsctl stat res -t
./crsctl modify res 'myvip1' -attr "HOSTING_MEMBERS=server1 server2"
./crsctl modify res 'myvip2' -attr "HOSTING_MEMBERS=server1 server2"

not sure, but I think this is a bug as appvipcfg should manage this.

the above is valid for on Solaris SPARC 64bit

linux ( as well as most of the unixes ) provides the ability to integrate many different file systems at the same time. to name a few of them:

  • ext2, ext3, ext4
  • ocfs, ocfs2
  • reiserfs
  • vxfs
  • btrfs
  • dos, ntfs

although each of them provides different features and was developed with different purposes in mind, the tools to work with them stay the same:

  • cp
  • mv
  • cd

the layer which makes this possible is called the virtual filesystem ( vfs ). this layer provides a common interface for the filesystems which are plugged into the operating system. I already introduced one special kind of filesystem, the proc filesystem. the proc filesystem does not handle any files on disk or on the network, but nevertheless it is a filesystem. in addition to the above mentioned filesystems, which all are disk based, filesystems may also handle files on the network, such as nfs or cifs.
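
to get an idea of how many different filesystem types are plugged into a running system at the same time, the following sketch lists the types of everything that is currently mounted. it relies on the third field of /proc/mounts being the filesystem type. mounted_fs_types is a name made up for this example:

```shell
#!/bin/bash
# mounted_fs_types [MOUNTS_FILE]
# prints every filesystem type which is currently mounted, one per line.
# the third field of each line in /proc/mounts is the filesystem type.
mounted_fs_types() {
    awk '{ print $3 }' "${1:-/proc/mounts}" | sort -u
}

mounted_fs_types
```

on a typical system the list will contain disk based types next to pseudo filesystems such as proc and sysfs.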

no matter what kind of filesystem you are working with: when interacting with the filesystem by using the commands of choice you are routed through the virtual filesystem:

the virtual file system

to make this possible there needs to be a standard all file system implementations must comply with, and this standard is called the common file model. the key components this model consists of are:

  • the superblock which stores information about a mounted filesystem ( … that is stored in memory as a doubly linked list )
  • inodes which store information about a specific file ( … that are stored in memory as a doubly linked list )
  • the file object which stores information of the underlying files
  • dentries, which represent the links to build the directory structure ( … that are stored in memory as a doubly linked list )

to speed up operations on the file systems some of the information which is normally stored on disk is cached. if you recall the post about slabs, you can find an entry like the following in the /proc/slabinfo file if you have a mounted ext4 filesystem on your system:

cat /proc/slabinfo | grep ext4 | grep cache
ext4_inode_cache   34397  34408    920   17    4 : tunables    0    0    0 : slabdata   2024   2024      0
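
to translate such a line into something more readable: the third column is the number of allocated objects and the fourth column is the size of one object in bytes, so multiplying both gives a rough idea of the memory the cache takes ( ignoring the overhead per slab ). a minimal sketch, slab_bytes is a name made up for this example:

```shell
#!/bin/bash
# slab_bytes
# reads lines in /proc/slabinfo format from stdin and prints, per cache,
# the name and num_objs * objsize ( fields 3 and 4 ), which is the memory
# taken by the allocated objects, ignoring per-slab overhead
slab_bytes() {
    awk '$1 !~ /^#/ && $1 != "slabinfo" && NF >= 4 { printf "%s %d\n", $1, $3 * $4 }'
}

# example ( reading /proc/slabinfo usually requires root ):
# grep ext4_inode_cache /proc/slabinfo | slab_bytes
```

for the line above this results in 34408 * 920 = 31655360 bytes, that is around 30 mb for the cached ext4 inodes.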

so what does the kernel need to do if, for example, a request for listing the contents of a directory comes in and the directory resides on an ext4 filesystem? because the filesystem is mounted the kernel knows that the filesystem for the specific request is of type ext4. the request issued by ls will then be translated ( pointed ) to the ext4 specific implementation of that operation. this is the same for all commands interacting with filesystems: there is a pointer for each operation that links to the specific implementation of the operation in question:

directory listing
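
the idea of one common entry point which points to a filesystem specific implementation can be modelled in a few lines of shell. this is a toy model only: in the kernel this is done in c with structures of function pointers, not with function names, and all names below are made up for this example:

```shell
#!/bin/bash
# one "implementation" per filesystem type; the bodies only print
# what a real implementation would do
ext4_readdir() { echo "ext4: reading directory entries of $1 from disk"; }
nfs_readdir()  { echo "nfs: asking the server for the entries of $1"; }
proc_readdir() { echo "proc: generating the entries of $1 from kernel data"; }

# vfs_readdir FSTYPE PATH
# the common entry point: picks the implementation by filesystem type
vfs_readdir() {
    local impl="${1}_readdir"
    if command -v "$impl" >/dev/null 2>&1; then
        "$impl" "$2"
    else
        echo "no readdir operation registered for $1" >&2
        return 1
    fi
}

vfs_readdir ext4 /home
vfs_readdir proc /proc
```

the caller only ever talks to vfs_readdir. adding support for another filesystem means adding one more implementation, the caller does not change.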

as the superblock is stored in memory and therefore may become dirty, that is, no longer synchronized with the superblock on disk, there is the same issue that oracle must handle with its buffer pools: periodically check the dirty flag and write the changes to disk. the same is true for inodes ( while in memory ), which contain all the information that makes up a file. closing the loop to oracle again: to speed up searching the inodes linux maintains a hash table for fast access ( remember how oracle uses hashes to identify sql statements in the shared_pool ).

when there are files, there are processes which want to work with files. once a file is opened a new file object will be created. as these are frequent operations file objects are allocated through a slab cache.

the file objects themselves are visible to the user through the /proc filesystem per process:

ls -la /proc/*/fd/
total 0
dr-x------ 2 root root  0 2012-05-18 14:03 .
dr-xr-xr-x 8 root root  0 2012-05-18 06:40 ..
lrwx------ 1 root root 64 2012-05-18 14:03 0 -> /dev/null
lrwx------ 1 root root 64 2012-05-18 14:03 1 -> /dev/null
lr-x------ 1 root root 64 2012-05-18 14:03 10 -> anon_inode:inotify
lrwx------ 1 root root 64 2012-05-18 14:03 2 -> /dev/null
lrwx------ 1 root root 64 2012-05-18 14:03 3 -> anon_inode:[eventfd]
lrwx------ 1 root root 64 2012-05-18 14:03 4 -> /dev/null
lrwx------ 1 root root 64 2012-05-18 14:03 5 -> anon_inode:[signalfd]
lrwx------ 1 root root 64 2012-05-18 14:03 6 -> socket:[7507]
lrwx------ 1 root root 64 2012-05-18 14:03 7 -> anon_inode:[eventfd]
lrwx------ 1 root root 64 2012-05-18 14:03 8 -> anon_inode:[eventfd]
lrwx------ 1 root root 64 2012-05-18 14:03 9 -> socket:[11878]

usually numbers 0 – 2 refer to the standard input, standard output and standard error of the corresponding process.
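
if you are interested in one process only, a few lines of shell are enough to print its descriptors together with the files they point to. a minimal sketch which looks at the current shell, list_fds is a name made up for this example:

```shell
#!/bin/bash
# list_fds PID
# prints "descriptor -> target" for every open file of the given process
list_fds() {
    local fd
    for fd in /proc/"$1"/fd/*; do
        [ -e "$fd" ] || [ -L "$fd" ] || continue
        echo "${fd##*/} -> $(readlink "$fd")"
    done
}

# $$ is the process id of the current shell
list_fds $$
```

looking at processes of other users requires root, as the permissions in the listing above show.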

last but not least there are the dentries. as with the file objects, dentries are allocated from a slab cache, the dentry cache in this case:

cat /proc/slabinfo | grep dentry
dentry             60121  61299    192   21    1 : tunables    0    0    0 : slabdata   2919   2919      0

directories are files, too, but special in that directories may contain other files or directories. once a directory is read into memory it is transformed into a dentry object. as this operation is expensive there is the dentry cache mentioned above. thus the operations for building the dentry objects can be minimized.
another link to oracle wording: the unused dentry doubly linked list uses a least recently used ( lru ) algorithm to track the usage of the entries. when the kernel needs to shrink the cache the objects at the tail of the list will be removed. as with the inodes there is a hash table for the dentries and a lock protecting the lists ( dcache_spin_lock in this case ).

this should give you enough hints to go further if you are interested …