optimizer basics (4)

June 19, 2012 — Leave a comment

by now we know that we can tell the optimizer to write its decisions to a trace file. this file can tell us what access plans for the sql in question were considered and what statistics were used to come up with the final costs.

as introduced in the first post one parameter which influences the correctness of the statistics is the estimate_percent parameter of the the gather_*_stats procedures. wouldn’t it be great if we could compare statistics which were gathered with different values for the parameters to help us to decide which will be the best strategy for our database and application ?

recall the table containing the house- and phonenumbers which was created in the second post. we already noticed that the num_rows statistic reported different values once the estimate_percent changed. so let’s see if we can report the changes for different sets of parameters for the gather_*_stats procedures.

the first thing we’ll need is a table where we can export some sets of statistics to:

BEGIN
dbms_stats.create_stat_table ( ownname => USER
, stattab => 'MY_STATS'
, tblspace => 'USERS'
);
END;
/

as oracle recommends using the default value for the estimate_percent ( which is DBMS_STATS.AUTO_SAMPLE_SIZE ) this shall be our first set of statistics:

BEGIN
dbms_stats.gather_table_stats ( ownname => USER
, tabname => 'T1' );
END;
/

we already noticed in the first post that the values will not be that correct:

SELECT num_rows
FROM user_tab_statistics
WHERE table_name = 'T1';
NUM_ROWS
----------
9966

the actual number of rows is still 10’000:

SELECT count(*)
FROM t1;
COUNT(*)
----------
10000

before playing around with the estimate_parameter save the current statistics to the statistic table created above:

BEGIN
dbms_stats.export_table_stats ( ownname => USER
, tabname => 'T1'
, statid => 'DEFAULT_ESTIMATE_PERCENT'
, stattab => 'MY_STATS'
);
END;
/

now there is stored set of statistics we may use to compare with other sets. as we know a value of 100 for the estimate_percent parameter should produce better statistics let’s re-generate with the increased value:

BEGIN
dbms_stats.gather_table_stats ( ownname => USER
, tabname => 'T1'
, estimate_percent => 100 );
END;
/

… and quickly check the result for the num_rows statistic:

SELECT num_rows
FROM user_tab_statistics
WHERE table_name = 'T1';
NUM_ROWS
----------
10000

looks correct. again, let’s save the statistics:

BEGIN
dbms_stats.export_table_stats ( ownname => USER
, tabname => 'T1'
, statid => 'ESTIMATE_PERCENT_100'
, stattab => 'MY_STATS'
);
END;
/

to see what values changed for the different sets we can now compare the sets of statistics:

SET LONG 20000
SET LINES 164
SET PAGES 999
SELECT *
FROM TABLE ( dbms_stats.diff_table_stats_in_stattab ( USER
, 'T1'
, 'MY_STATS'
, 'MY_STATS'
, NULL
, 'DEFAULT_ESTIMATE_PERCENT'
, 'ESTIMATE_PERCENT_100'
)
);

this will produce a report similar to this one:

REPORT MAXDIFFPCT
-------------------------------------------------------------------------------- ----------
###############################################################################
STATISTICS DIFFERENCE REPORT FOR:
.................................
TABLE : T1
OWNER : OW
SOURCE A : User statistics table MY_STATS
: Statid : DEFAULT_ESTIMATE_PERCENT
: Owner : OW
SOURCE B : User statistics table MY_STATS
: Statid : ESTIMATE_PERCENT_100
: Owner : OW
PCTTHRESHOLD :
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
TABLE / (SUB)PARTITION STATISTICS DIFFERENCE:
.............................................
OBJECTNAME TYP SRC ROWS BLOCKS ROWLEN SAMPSIZE
...............................................................................
T1 T A 9966 20 8 4764
B 10000 20 8 10000
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
COLUMN STATISTICS DIFFERENCE:
.............................
COLUMN_NAME SRC NDV DENSITY HIST NULLS LEN MIN MAX SAMPSIZ
...............................................................................
HOUSENUMBER A 9941 .000100593 NO 0 4 C104 C302 4752
B 10000 .0001 NO 0 4 C102 C302 10000
PHONENUMBER A 5503 .000181719 NO 0 4 C104 C2646 4852
B 6376 .000156838 NO 0 4 C102 C2646 10000
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
INDEX / (SUB)PARTITION STATISTICS DIFFERENCE:
.............................................
OBJECTNAME TYP SRC ROWS LEAFBLK DISTKEY LF/KY DB/KY CLF LVL SAMPSIZ
...............................................................................
INDEX: I_HOUSENUMBER
....................
I_HOUSENUMBER I A 10000 21 10000 1 1 18 1 10000
B 10000 21 10000 1 1 18 1 10000
INDEX: I_PHONENUMBER
....................
I_PHONENUMBER I A 10000 21 6376 1 1 9411 1 10000
B 10000 21 6376 1 1 9411 1 10000
###############################################################################

take a look at the report and you can easily see how the statistics changed when increasing the estimate_percent parameter ( of course you can change any other parameter, too ).

this is a great feature if you want to play around with different settings for the gather_*_procedures and want to check which settings are the best for your case. and always remember that you can set different parameters down to the table level. for a quick description check this post.

No Comments

Be the first to start the conversation!

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.