why not take a look at postgresql ? (6): monitoring and troubleshooting basics

July 20, 2012 — Leave a comment

once the database server is up and running, high availability might be configured, maintenance scripts are running and backup is in place normal operation starts. what’s still missing is: the procedures and techniques to monitor the database server either for being able to take the necessary actions in case you’ll reach some limits or users are complaining because performance drops.

oracle provides awr/ash reports based on various statistics to help the dba track down any issues and for proper capacity planning. what does postgres provide ? the third post already introduced the analyze command which is used for collecting various statistics about the tables in the database. so, basically, postgres collects statistics as oracle does. the questions is how you can control it, and how you can use the various statistics to build reports that can help you in identifying things of interest?

what, and how much, oracle is collecting is controlled by the parameter statistics_level. in postgres there are several parameters you can specify to control the gathering behaviour. the three most important are:

  • track_counts: enables collection of database wide statistics ( on by default )
  • track_function: enables collection of usage of user defined functions ( off be default )
  • track_activities: enables collection of statistics of each command currently executed by any server process/session ( on by default )

the database uses a separate directory for storing the statistics in temporary files and to provide the statistics to other processes:

psql -U sysdba -d postgres -c "show stats_temp_directory;" stats_temp_directory 
----------------------
 pg_stat_tmp
(1 row)

of course you may and probably should change this to a fast filesystem ( the documentation recommends a RAM based filesystem ). once the server shuts down these statistics will get copied to the global directory to make them permanent. this procedure is slightly different from the oracle approach where the statistics are stored in the data dictionary ( that is the system/sysaux tablespace ).

as oracle does, postgres provides a wide range of views to query the statistics of various areas of the database. the complete list can by found in the documentation.

what impresses me from an oracle point of view are the various functions which can be used to quickly check various statistics. for example, to query the number of transactions comitted in the database you can simple do this:

SELECT pg_stat_get_db_xact_commit(d.oid)
  FROM pg_database d
 WHERE d.datname = 'postgres';

… which will report the commited transactions of the database postgres. in combination which psql’s ability to define aliases this can be a very powerful feature. to give you an example this will define a new alias in psql’s control file:

echo "\\set mypgstats 'SELECT pg_stat_get_db_xact_commit(d.oid) FROM pg_database d WHERE d.datname = \'postgres\';'" >> ~/.psqlrc

when you restart psql you can reference the alias:

sysdba@[local]:5432/postgres*# :mypgstats 
 pg_stat_get_db_xact_commit 
----------------------------
                       2965
(1 row)

… and what is even better ( at least on linux ) : tab completion works, too. even for the aliases. a feature i’d definitely like to see in oracle.

the documentation list a nice example to query the process ids with the commands they are executing:

SELECT pg_stat_get_backend_pid(s.backendid) AS procpid,
       pg_stat_get_backend_activity(s.backendid) AS current_query
    FROM (SELECT pg_stat_get_backend_idset() AS backendid) AS s;
 procpid |                           current_query                           
---------+-------------------------------------------------------------------
    4107 | 
    4581 | SELECT pg_stat_get_backend_pid(s.backendid) AS procpid,          +
         |        pg_stat_get_backend_activity(s.backendid) AS current_query+
         |     FROM (SELECT pg_stat_get_backend_idset() AS backendid) AS s;
(2 rows)

very easy…impressed.

as mentioned above, according to the documentation these statistics will be stored permanently once the database server stops. I wondered if the counters will get re-setted or are ever increasing:

[postgres@postgres ~]$ psql -U sysdba -d postgres -c "SELECT pg_stat_get_db_xact_commit(d.oid) FROM pg_database d WHERE d.datname = 'postgres';"
 pg_stat_get_db_xact_commit 
----------------------------
                       2992
(1 row)

[postgres@postgres ~]$ pgstop
waiting for server to shut down..... done
server stopped
[postgres@postgres ~]$ pgstart
server starting
[postgres@postgres ~]$ psql -U sysdba -d postgres -c "SELECT pg_stat_get_db_xact_commit(d.oid) FROM pg_database d WHERE d.datname = 'postgres';"
 pg_stat_get_db_xact_commit 
----------------------------
                       2994

the values are ever increasing. so, for getting meaningful values you’ll need to store some of the values including the time and compare it to future values of the same statistics. to reset the counters pg_stat_reset is used:

SELECT pg_stat_reset();

I didn’t figure out if there is anything comparable to the oracle awr snapshots of which one may generate reports ( any hints on that are welcome ).

beside the views and functions which provide the statistics there is one more nice system table which reports on all the locks in the database:

sysdba@[local]:5432/postgres# select * from pg_locks;
  locktype  | database | relation |   page   |  tuple   | virtualxid | transactionid | classid  |  objid   | objsubid | virtualtransaction | pid  |      mode       | granted 
------------+----------+----------+----------+----------+------------+---------------+----------+----------+----------+--------------------+------+-----------------+---------
 relation   |    12780 |    11000 | <> | <> | <>   |      <> | <> | <> | <> | 3/21               | 4639 | AccessShareLock | t
 virtualxid | <> | <> | <> | <> | 3/21       |      <> | <> | <> | <> | 3/21               | 4639 | ExclusiveLock   | t
(2 rows)

… a nice and simple overview which can help in identifying causes of issues.

in linux as well as most of the unixes you may use top to query the current processes running on the system and their resource consumptions. there is a seperate project on the postgres website called ptop which provides a similar functionality for the postgres database sessions. sample output:

last pid:  1245;  load avg:  1.60,  1.29,  0.52;       up 0+00:02:25                   08:18:52
2 processes: 1 running, 1 sleeping
CPU states:  1.0% user,  0.0% nice,  1.8% system, 93.3% idle,  4.0% iowait
Memory: 229M used, 768M free, 52M buffers, 79M cached
Swap: 2000M free
  PID USERNAME PRI NICE  SIZE   RES STATE   TIME   WCPU    CPU COMMAND
 1246 lcspostg  20    0  461M 5356K run     0:00  0.02%  0.20% postgres: sysdba dbs200 [local] 
 1245 lcspostg  20    0  461M 4104K sleep   0:00  0.02%  0.20% postgres: sysdba postgres [local

ptop might not be available in your postgres distribution but it can easily be compiled from the contrib section of the source distribution. some vendors provide packaged versions as well.

for daily operations there is a nice plugin for nagios which saves you a lot of work and automates much of the daily tasks.

another extension one might consider is pg_stat_statements. it is available in the contrib directory of the source code distribution, too. once compiled and installed it provides an overview of the statements executed in the system:

select * from pg_stat_statements;
userid | dbid  |                                                                                                              query                                                                    
                                          | calls | total_time | rows | shared_blks_hit | shared_blks_read | shared_blks_written | local_blks_hit | local_blks_read | local_blks_written | temp_blks_rea
d | temp_blks_written 
--------+-------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
------------------------------------------+-------+------------+------+-----------------+------------------+---------------------+----------------+-----------------+--------------------+--------------
--+-------------------
     10 | 12780 | SELECT a.attname,                                                                                                                                                                     
                                         +|     1 |   0.029022 |   13 |              34 |                4 |                   0 |              0 |               0 |                  0 |              
0 |                 0

more to come about the parser, rewriter and planner

No Comments

Be the first to start the conversation!

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.