by now we know that we can tell the optimizer to write its decisions to a trace file. this file can tell us what access plans for the sql in question were considered and what statistics were used to come up with the final costs.

as introduced in the first post one parameter which influences the correctness of the statistics is the estimate_percent parameter of the the gather_*_stats procedures. wouldn’t it be great if we could compare statistics which were gathered with different values for the parameters to help us to decide which will be the best strategy for our database and application ?

recall the table containing the house- and phonenumbers which was created in the second post. we already noticed that the num_rows statistic reported different values once the estimate_percent changed. so let’s see if we can report the changes for different sets of parameters for the gather_*_stats procedures.

the first thing we’ll need is a table where we can export some sets of statistics to:

BEGIN

dbms_stats.create_stat_table ( ownname => USER

, stattab => 'MY_STATS'

, tblspace => 'USERS'

);

END;

/

as oracle recommends using the default value for the estimate_percent ( which is DBMS_STATS.AUTO_SAMPLE_SIZE ) this shall be our first set of statistics:

BEGIN

dbms_stats.gather_table_stats ( ownname => USER

, tabname => 'T1' );

END;

/

we already noticed in the first post that the values will not be that correct:

SELECT num_rows

FROM user_tab_statistics

WHERE table_name = 'T1';

NUM_ROWS

----------

9966

the actual number of rows is still 10’000:

SELECT count(*)

FROM t1;

COUNT(*)

----------

10000

before playing around with the estimate_parameter save the current statistics to the statistic table created above:

BEGIN

dbms_stats.export_table_stats ( ownname => USER

, tabname => 'T1'

, statid => 'DEFAULT_ESTIMATE_PERCENT'

, stattab => 'MY_STATS'

);

END;

/

now there is stored set of statistics we may use to compare with other sets. as we know a value of 100 for the estimate_percent parameter should produce better statistics let’s re-generate with the increased value:

BEGIN

dbms_stats.gather_table_stats ( ownname => USER

, tabname => 'T1'

, estimate_percent => 100 );

END;

/

… and quickly check the result for the num_rows statistic:

SELECT num_rows

FROM user_tab_statistics

WHERE table_name = 'T1';

NUM_ROWS

----------

10000

looks correct. again, let’s save the statistics:

BEGIN

dbms_stats.export_table_stats ( ownname => USER

, tabname => 'T1'

, statid => 'ESTIMATE_PERCENT_100'

, stattab => 'MY_STATS'

);

END;

/

to see what values changed for the different sets we can now compare the sets of statistics:

SET LONG 20000

SET LINES 164

SET PAGES 999

SELECT *

FROM TABLE ( dbms_stats.diff_table_stats_in_stattab ( USER

, 'T1'

, 'MY_STATS'

, 'MY_STATS'

, NULL

, 'DEFAULT_ESTIMATE_PERCENT'

, 'ESTIMATE_PERCENT_100'

)

);

this will produce a report similar to this one:

REPORT MAXDIFFPCT

-------------------------------------------------------------------------------- ----------

###############################################################################

STATISTICS DIFFERENCE REPORT FOR:

.................................

TABLE : T1

OWNER : OW

SOURCE A : User statistics table MY_STATS

: Statid : DEFAULT_ESTIMATE_PERCENT

: Owner : OW

SOURCE B : User statistics table MY_STATS

: Statid : ESTIMATE_PERCENT_100

: Owner : OW

PCTTHRESHOLD :

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

TABLE / (SUB)PARTITION STATISTICS DIFFERENCE:

.............................................

OBJECTNAME TYP SRC ROWS BLOCKS ROWLEN SAMPSIZE

...............................................................................

T1 T A 9966 20 8 4764

B 10000 20 8 10000

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

COLUMN STATISTICS DIFFERENCE:

.............................

COLUMN_NAME SRC NDV DENSITY HIST NULLS LEN MIN MAX SAMPSIZ

...............................................................................

HOUSENUMBER A 9941 .000100593 NO 0 4 C104 C302 4752

B 10000 .0001 NO 0 4 C102 C302 10000

PHONENUMBER A 5503 .000181719 NO 0 4 C104 C2646 4852

B 6376 .000156838 NO 0 4 C102 C2646 10000

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

INDEX / (SUB)PARTITION STATISTICS DIFFERENCE:

.............................................

OBJECTNAME TYP SRC ROWS LEAFBLK DISTKEY LF/KY DB/KY CLF LVL SAMPSIZ

...............................................................................

INDEX: I_HOUSENUMBER

....................

I_HOUSENUMBER I A 10000 21 10000 1 1 18 1 10000

B 10000 21 10000 1 1 18 1 10000

INDEX: I_PHONENUMBER

....................

I_PHONENUMBER I A 10000 21 6376 1 1 9411 1 10000

B 10000 21 6376 1 1 9411 1 10000

###############################################################################

take a look at the report and you can easily see how the statistics changed when increasing the estimate_percent parameter ( of course you can change any other parameter, too ).

this is a great feature if you want to play around with different settings for the gather_*_procedures and want to check which settings are the best for your case. and always remember that you can set different parameters down to the table level. for a quick description check this post.