Histograms helps the optimizer determine the selectivity of predicates in the queries. Selectivity estimates are used to decide whether the optimizer should use an index and the order in which to join tables. * How does ORACLE create and store histograms info.? ------------------------------------------------- SQL> select c2, count(1) from test1 group by c2; C2 COUNT(1) ---------- ---------- 0 1 1 16 2 4 7 1 9 1 -- Buckets fewer than # of distinct values, ENDPOINT_NUMBER refers to bucket #. -- Look at ENDPOINT_NUMBER -- (0-1) is bucket 1, (1-2) is bucket 2, which is not shown in user_histograms. -- This means that bucket (0-1) contains same data as bucket (1-2). SQL> analyze table test1 compute statistics for columns c2 size 4; Table analyzed. -- Height-balanced histogram created. Each bucket contains roughly same # of members -- Row (1, 1) is merged with row (2,1) -- SQL> select lpad(TABLE_NAME,10), lpad(COLUMN_NAME, 5), 2 endpoint_number, endpoint_value 3 from user_histograms where table_name='TEST1'; LPAD(TABLE LPAD( ENDPOINT_NUMBER ENDPOINT_VALUE ---------- ----- --------------- -------------- TEST1 C2 0 0 TEST1 C2 2 1 TEST1 C2 3 2 TEST1 C2 4 9 -- Buckets same as than # of distinct values, ENDPOINT_NUMBER refers to accumulated # of rows SQL> analyze table test1 compute statistics for columns c2 size 5; Table analyzed. -- Value-based histogram, or frequency histogram, created. Loook at ENDPOINT_VALUE -- First bucket, whose ENDPOINT_VALUE is 0, contains only 1 row. SQL> select lpad(TABLE_NAME,10), lpad(COLUMN_NAME, 5), 2 endpoint_number, endpoint_value 3 from user_histograms where table_name='TEST1'; LPAD(TABLE LPAD( ENDPOINT_NUMBER ENDPOINT_VALUE ---------- ----- --------------- -------------- TEST1 C2 1 0 TEST1 C2 17 1 TEST1 C2 21 2 TEST1 C2 22 7 TEST1 C2 23 9 -- Buckets larger than # of distinct values, ENDPOINT_NUMBER refers to accumulated # of rows -- Same as above. SQL> analyze table test1 compute statistics for columns c2 size 6; Table analyzed. SQL> select lpad(TABLE_NAME,10), lpad(COLUMN_NAME, 5), 2 endpoint_number, endpoint_value 3 from user_histograms where table_name='TEST1'; LPAD(TABLE LPAD( ENDPOINT_NUMBER ENDPOINT_VALUE ---------- ----- --------------- -------------- TEST1 C2 1 0 TEST1 C2 17 1 TEST1 C2 21 2 TEST1 C2 22 7 TEST1 C2 23 9 * What is selectivity? How do you calculate the selectivity? 1. Value-based histogram The selectivity of (c2=1) is (17-1)/23=16/23=0.6956 You say selectivity of c2=1 is high, or selectivity of c2=1 is poor, or c2=1 is not selective, Pretty confusing with the terms. 2. Height-balanced histogram Estimation is needed. Se Metalink 68992.1 * How does ORACLE estimate selectivity with and without histograms? 1. With value-based histogram Selectivity can be precisely determined as described above, for both equality and range predicate. 2. Height-balanced histogram The concept of popular and non-pipular values. 3. Without histogram, with table analyzed, the optimizer has info. about the followings in the dba_tab_col_statistics view SQL> select distinct c2 from t2; C2 ---------- 0 1 2 7 9 SQL> select NUM_DISTINCT,LOW_VALUE,HIGH_VALUE,NUM_NULLS,SAMPLE_SIZE,DENSITY from dba_tab_col_statist ics where table_name='T2' and owner='JYANG' and column_name='C2'; NUM_DISTINCT LOW_VALUE ------------ ---------------------------------------------------------------- HIGH_VALUE NUM_NULLS ---------------------------------------------------------------- ---------- SAMPLE_SIZE DENSITY ----------- ---------- 5 80 C10A 0 23 .2 SQL> Optimizer will assume uniform distribution of data among the '5' distinct values. So the for the selectivity of A. Equality predicate: 1/5 of 0.2, this is true regardless of the value used in the predicate - even if the value does not exist in the table. Because ORACLE does not know whether the data exist. So, C2=8 will get 0.2 selectivity. Also note that high_value, low_value have been hashed, it seems. B. Range predicate: * What happens when bind variables are used? In the above example, this calculation would not have been possible The selectivity of (c2=1) is (17-1)/23=16/23=0.6956 When it becomes (c1=:b1) So the optimizer can only assume uniform distrbution of the data and use average selectivity or default selectivity. Note that the "density' in dba_tab_col_statistics is the average selectivity for the column. For queries with range predicates using bind variables, we have no way of calculating the selectivity, so we use a hardcoded default value of 5% This is true irrespective of histograms as CBO does not know the value of the bind variable. Selectivity for bind variables with 'like' predicates defaults to 25% ------------------ So using literals without histograms is pretty much the same as using bind varables then? No, not really. Use literals, ORACLE can estimate the selectivity by assuming uniform distribution of data, for example, for range predicate c2>1 Is it (9-1)/10=80% (possible values are 0..9), or 1/(# of unique values)=1/5=20% ? Without knowing the value of the bind variable, one can not make such estimation. * Do statements using bind variables always use the same execution plan?