oracle sql优化-选择率
sql优化中的单表选择率(selectivity)
最近在看一些优化的书,所以把看书后的自己一些理解写下来,请大家批评指教。
数据库版本10.2.0.1
OS:windowsxp
以下没有考虑直方图的影响!!!!
在SQL的执行的过程中,某个操作所产生的预计行数,在选择初始连接顺序和最优索引选择方面起着重要作用。
所以很好的理解优化器如何估计计划中的每一不产生的行数将是非常有意义的,
现在拿书上的一个例子为大家解释什么是选择率
某个会议需要召集1200名听众,如何确定其中有多少人的生日是十二月呢?
如果大家自己估计的话?会有多少人呢?学过概率的人估计知道如何去算,今天咱们就用CBO来分析下,次处选择1200人只是为了好计算。
一年有12个月—已知条件
出生的人可能分布在12个月中—假设
出生在任何一个月份中的人占整个听众的1/12—-月的选择率,选择率的概念就此引出,书上是这样说的经过特定测试数据集中预期的行比率估计来计算基数(Cardinality),这
个行比率就是一个数值就是选择率
请求一个特定的月份(比如此处找12月份的)——–谓词(where 后的条件)
所请求的月份确实存在—-边界检查
听众总人数1200人——基本基数(在以后文章我会介绍什么是基数)
由以上我们的出出生12月份的人数应该为1200*1/12=100—-计算基数
基数和选择率什么关系呢?
他们之间的关系是:基数=输入的行数*选择率
从以上描述我们抽象为SQL
假设我们有一个audience表,表中有一个叫month_no的列,其中为1-12表示一年月的1-12月份,那我们以上的需要变为SQL语句即为:
select count(*) from audience where month_no=12;
我们用以下语句生成我们所需要的表:
execute dbms_random.seed(0);
drop table audience;
begin
begin execute immediate ‘purge recyclebin’;
exception when others then null;
end;
begin execute immediate ‘begin dbms_stats.delete_system_stats; end;’;
exception when others then null;
end;
begin execute immediate ‘alter session set “_optimizer_cost_model”=io’;
exception when others then null;
end;
end;
/
以上设置环境
create table audience as
select
trunc(dbms_random.value(1,13)) month_no
from
all_objects
where
rownum <= 1200
;
生成 audience表
统计表的信息
begin
dbms_stats.gather_table_stats(
user,
‘audience’,
cascade => true,
estimate_percent => null,
method_opt => ‘for all columns size 1′
);
end;
/
我们从数据库中查询下表audience相关信息:
SQL> select
2 column_name,
3 num_distinct,
4 num_nulls,
5 density,
6 value_to_number(low_value) low,
7 value_to_number(high_value) high
8 from
9 user_tab_columns
10 where table_name = ‘AUDIENCE’
11 and column_name = ‘MONTH_NO’
12 ;
COLUMN_NAME NUM_DISTINCT NUM_NULLS DENSITY LOW HIGH
—————————— ———— ———- ———- ———- ———-
MONTH_NO 12 0 .083333333 1 12
select
column_name, endpoint_number, endpoint_value
from
user_tab_histograms
where
table_name = ‘AUDIENCE’
order by
column_name, endpoint_number
;
COLUMN_NAME ENDPOINT_NUMBER ENDPOINT_VALUE
————— ————–
MONTH_NO 0 1
MONTH_NO 1 12
SQL> select num_rows from user_tables where table_name=’AUDIENCE’;
NUM_ROWS
———-
1200
通过查看user_tab_col_statistics 中的数字和检查视图user_tab_histograms可以发现如下详细信息:
1 user_tab_col_statistics.num_distinct等于12
2 user_tab_columns显示最大值为12,最小为1,因此可以假定这些值是均匀分布的
3 user_tab_col_statistics.density的值为1/12,每一个月对应整个数据的1/12
4 12位于user_tab_col_statistics 的low_value和high_value之间
5 user_tab_col_statistics.num_nulls,人不可能没有出生时间,此处为0符合
6 user_tables.num_rows为1200
7 1200的1/12为100,即得到答案
SQL> select count(*)
2 from audience
3 where month_no = 12
4 ;
COUNT(*)
———-
104
执行计划
———————————————————-
Plan hash value: 3337892515
—————————————————————
| Id | Operation | Name | Rows | Bytes | Cost |
—————————————————————
| 0 | SELECT STATEMENT | | 1 | 3 | 2 |
| 1 | SORT AGGREGATE | | 1 | 3 | |
|* 2 | TABLE ACCESS FULL| AUDIENCE | 100 | 300 | 2 |
—————————————————————
Predicate Information (identified by operation id):
—————————————————
2 - filter(”MONTH_NO”=12)
Note
—–
- cpu costing is off (consider enabling it)
统计信息
———————————————————-
1 recursive calls
0 db block gets
4 consistent gets
0 physical reads
0 redo size
409 bytes sent via SQL*Net to client
385 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
通过执行计划可以看出,rows=基数=100,实际生在12月的人数为104,和预计100接近。
我们在观察数据字典的信息时,发现
SQL> select column_name,num_distinct,density
2 from user_tab_col_statistics
3 where table_name=’AUDIENCE’;
COLUMN_NAME NUM_DISTINCT DENSITY
——————– ———— ———-
MONTH_NO 12 .083333333
SQL> select 1/12 from dual;
1/12
———-
.083333333
看起来oracle 同样的信息存储两次,为什么呢? num_distinct(完全不同的非空值)density(忽略了包含空值的行)
其实呢?上面只是个例子,也有不相同的时候,就是在某列上创建了直方图,density就不在等于1/num_distinct
在10G
如果存在直方图:那么优化器将使用density列值:既基数cardinality=num_rows*density(选择率)selectivity
如果不存在直方图:那么优化器将使用num_distinct既基数cardinality=num_rows/num_distinct num_distinct倒数即为选择率selectivity
这个大家可以自己严整下
下面继续讨论几种不同的情况:
空值:
假设听众中10%的人不记得自己的生日是在哪一个月,那么又有多少人认为自己是生在12月呢?
在原来的条件中
1 user_tab_col_statistics.num_distinct等于12
2 user_tab_columns显示最大值为12,最小为1,因此可以假定这些值是均匀分布的
3 user_tab_col_statistics.density的值为1/12,每一个月对应整个数据的1/12
4 12位于user_tab_col_statistics 的low_value和high_value之间
5 user_tab_col_statistics.num_nulls,人不可能没有出生时间,此处为0符合
6 user_tables.num_rows为1200
只有第5项数据发生变化了,不在为0。
对于人来说思考过程如下:
如果有100个人出生在12月(以上的结论)且又有10%的人无法记住自己的生日,同时假设这些人是均匀分布的,因此在这100个人中有100(1-10%)=90人会记住自己的生日:
对于优化器的思考过程如下:
基本选择率不变:还是1/12 为什么不变呢,因为统计信息的时候就已经忽略了空值
num_nulls=1200%10=120
num_rows=1200
调整后的选择率为:基本选择率*(num_rows-num_nulls)/num_rows=(1/12)*(1200-120)/1200=0.075
调整后的基数(也就是预期返回的行数)=调整后的几率*num_rows=0.075*1200=90,即预期有90个人会记住自己出生在12月
实际SQL:
测试环境如下:
execute dbms_random.seed(0);
drop table audience;
begin
begin execute immediate ‘purge recyclebin’;
exception when others then null;
end;
begin execute immediate ‘begin dbms_stats.delete_system_stats; end;’;
exception when others then null;
end;
begin execute immediate ‘alter session set “_optimizer_cost_model”=io’;
exception when others then null;
end;
end;
/
create table audience as
select
rownum id,
trunc(dbms_random.value(1,13)) month_no
from
all_objects
where
rownum <= 1200
;
update
audience
set month_no = null
where mod(id,10) = 0
;
commit;
begin
dbms_stats.gather_table_stats(
user,
‘audience’,
cascade => true,
estimate_percent => null,
method_opt => ‘for all columns size 1′
);
end;
/
SQL> select
2 column_name,
3 num_distinct,
4 num_nulls,
5 density,
6 value_to_number(low_value) low,
7 value_to_number(high_value) high
8 from
9 user_tab_columns
10 where table_name = ‘AUDIENCE’
11 ;
COLUMN_NAME NUM_DISTINCT NUM_NULLS DENSITY LOW HIGH
——————– ———— ———- ———- ———- ———-
ID 1200 0 .000833333 1 1200
MONTH_NO 12 120 .083333333 1 12
select
column_name, endpoint_number, endpoint_value
from
user_tab_histograms
where
table_name = ‘AUDIENCE’
order by
column_name, endpoint_number
;
COLUMN_NAME,ENDPOINT_NUMBER,ENDPOINT_VALUE
ID,0,1
ID,1,1200
MONTH_NO,0,1
MONTH_NO,1,12
看执行计划和结果:
SQL> select count(*)
2 from audience
3 where month_no = 12
4 ;
COUNT(*)
———-
96
执行计划
———————————————————-
Plan hash value: 3337892515
—————————————————————
| Id | Operation | Name | Rows | Bytes | Cost |
—————————————————————
| 0 | SELECT STATEMENT | | 1 | 3 | 2 |
| 1 | SORT AGGREGATE | | 1 | 3 | |
|* 2 | TABLE ACCESS FULL| AUDIENCE | 90 | 270 | 2 |
—————————————————————
Predicate Information (identified by operation id):
—————————————————
2 - filter(”MONTH_NO”=12)
Note
—–
- cpu costing is off (consider enabling it)
统计信息
———————————————————-
1 recursive calls
0 db block gets
4 consistent gets
0 physical reads
0 redo size
408 bytes sent via SQL*Net to client
385 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
执行计划中的CARD=ROWS=基数=90和我们分析的一样,实际为96
继续讨论:使用列表时,选择率的计算
以上我们知道了如何处理column=constant的情况,比如包含列表的查询、包含空值列表的查询、包含两列的查询、指定范围的查询和棒定变量的查询。
首先从最简单的情况in-list开始
比如我们随便统计3个月(6,7,8)出生的人数
SQL select count(*) from audience where month_no in (6,7,8)
首先环境的搭建:
drop table audience;
begin
begin execute immediate ‘purge recyclebin’;
exception when others then null;
end;
begin execute immediate ‘begin dbms_stats.delete_system_stats; end;’;
exception when others then null;
end;
begin execute immediate ‘alter session set “_optimizer_cost_model”=io’;
exception when others then null;
end;
end;
/
create table audience as
select
trunc(dbms_random.value(1,13)) month_no
from
all_objects
where
rownum <= 1200
;
begin
dbms_stats.gather_table_stats(
user,
‘audience’,
cascade=>true,
estimate_percent => null,
method_opt =>’for all columns size 1′
);
end;
/
测试用例:
set autotrace traceonly explain
spool in_list
select count(*) from audience where month_no in (1,2);
select count(*) from audience where month_no in (1,2,3);
select count(*) from audience where month_no in (1,2,3,4);
select count(*) from audience where month_no in (1,2,3,4,5);
select count(*) from audience where month_no in (1,2,3,4,5,6);
select count(*) from audience where month_no in (1,2,3,4,5,6,7);
select count(*) from audience where month_no in (1,2,3,4,5,6,7,8);
select count(*) from audience where month_no in (1,2,3,4,5,6,7,8,9);
select count(*) from audience where month_no in (1,2,3,4,5,6,7,8,9,10);
select count(*) from audience where month_no in (1,2,3,4,5,6,7,8,9,10,11);
select count(*) from audience where month_no in (1,2,3,4,5,6,7,8,9,10,11,12);
select count(*) from audience where month_no in (1,2,3,4,5,6,7,8,9,10,11,12,13);
select count(*) from audience where month_no in (1,2,3,4,5,6,7,8,9,10,11,12,13,14);
select count(*) from audience where month_no in (
1, 2, 3, 4, 5, 6, 7, 8, 9,10,
11,12,13,14,15,16,17,18,19,20,
21,22,23,24,25,26,27,28,29,30
);
SQL> select count(*) from audience where month_no in (1,2);
执行计划
———————————————————-
Plan hash value: 3337892515
—————————————————————
| Id | Operation | Name | Rows | Bytes | Cost |
—————————————————————
| 0 | SELECT STATEMENT | | 1 | 3 | 2 |
| 1 | SORT AGGREGATE | | 1 | 3 | |
|* 2 | TABLE ACCESS FULL| AUDIENCE | 200 | 600 | 2 |
—————————————————————
Predicate Information (identified by operation id):
—————————————————
2 - filter(”MONTH_NO”=1 OR “MONTH_NO”=2)
Note
—–
- cpu costing is off (consider enabling it)
统计信息
———————————————————-
1 recursive calls
0 db block gets
4 consistent gets
0 physical reads
0 redo size
409 bytes sent via SQL*Net to client
385 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
SQL> select count(*) from audience where month_no in (1,2,3);
执行计划
———————————————————-
Plan hash value: 3337892515
—————————————————————
| Id | Operation | Name | Rows | Bytes | Cost |
—————————————————————
| 0 | SELECT STATEMENT | | 1 | 3 | 2 |
| 1 | SORT AGGREGATE | | 1 | 3 | |
|* 2 | TABLE ACCESS FULL| AUDIENCE | 300 | 900 | 2 |
—————————————————————
Predicate Information (identified by operation id):
—————————————————
2 - filter(”MONTH_NO”=1 OR “MONTH_NO”=2 OR “MONTH_NO”=3)
Note
—–
- cpu costing is off (consider enabling it)
统计信息
———————————————————-
1 recursive calls
0 db block gets
4 consistent gets
0 physical reads
0 redo size
409 bytes sent via SQL*Net to client
385 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
————————————–
SQL> select count(*) from audience where month_no in (1,2,3,4,5,6,7,8,9,10,11,12);
执行计划
———————————————————-
Plan hash value: 3337892515
—————————————————————
| Id | Operation | Name | Rows | Bytes | Cost |
—————————————————————
| 0 | SELECT STATEMENT | | 1 | 3 | 2 |
| 1 | SORT AGGREGATE | | 1 | 3 | |
|* 2 | TABLE ACCESS FULL| AUDIENCE | 1200 | 3600 | 2 |
—————————————————————
Predicate Information (identified by operation id):
—————————————————
2 - filter(”MONTH_NO”=1 OR “MONTH_NO”=2 OR “MONTH_NO”=3 OR
”MONTH_NO”=4 OR “MONTH_NO”=5 OR “MONTH_NO”=6 OR “MONTH_NO”=7 OR
”MONTH_NO”=8 OR “MONTH_NO”=9 OR “MONTH_NO”=10 OR “MONTH_NO”=11 OR
”MONTH_NO”=12)
Note
—–
- cpu costing is off (consider enabling it)
统计信息
———————————————————-
1 recursive calls
0 db block gets
4 consistent gets
0 physical reads
0 redo size
408 bytes sent via SQL*Net to client
385 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
SQL> select count(*) from audience where month_no in (1,2,3,4,5,6,7,8,9,10,11,12,13,14);
执行计划
———————————————————-
Plan hash value: 3337892515
—————————————————————
| Id | Operation | Name | Rows | Bytes | Cost |
—————————————————————
| 0 | SELECT STATEMENT | | 1 | 3 | 2 |
| 1 | SORT AGGREGATE | | 1 | 3 | |
|* 2 | TABLE ACCESS FULL| AUDIENCE | 1200 | 3600 | 2 |
—————————————————————
Predicate Information (identified by operation id):
—————————————————
2 - filter(”MONTH_NO”=1 OR “MONTH_NO”=2 OR “MONTH_NO”=3 OR
”MONTH_NO”=4 OR “MONTH_NO”=5 OR “MONTH_NO”=6 OR “MONTH_NO”=7 OR
”MONTH_NO”=8 OR “MONTH_NO”=9 OR “MONTH_NO”=10 OR “MONTH_NO”=11 OR
”MONTH_NO”=12 OR “MONTH_NO”=13 OR “MONTH_NO”=14)
Note
—–
- cpu costing is off (consider enabling it)
统计信息
———————————————————-
1 recursive calls
0 db block gets
4 consistent gets
0 physical reads
0 redo size
408 bytes sent via SQL*Net to client
385 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
SQL> select count(*) from audience where month_no in (
2 1, 2, 3, 4, 5, 6, 7, 8, 9,10,
3 11,12,13,14,15,16,17,18,19,20,
4 21,22,23,24,25,26,27,28,29,30
5 );
执行计划
———————————————————-
Plan hash value: 3337892515
—————————————————————
| Id | Operation | Name | Rows | Bytes | Cost |
—————————————————————
| 0 | SELECT STATEMENT | | 1 | 3 | 2 |
| 1 | SORT AGGREGATE | | 1 | 3 | |
|* 2 | TABLE ACCESS FULL| AUDIENCE | 1200 | 3600 | 2 |
—————————————————————
Predicate Information (identified by operation id):
—————————————————
2 - filter(”MONTH_NO”=1 OR “MONTH_NO”=2 OR “MONTH_NO”=3 OR
”MONTH_NO”=4 OR “MONTH_NO”=5 OR “MONTH_NO”=6 OR “MONTH_NO”=7 OR
”MONTH_NO”=8 OR “MONTH_NO”=9 OR “MONTH_NO”=10 OR “MONTH_NO”=11 OR
”MONTH_NO”=12 OR “MONTH_NO”=13 OR “MONTH_NO”=14 OR “MONTH_NO”=15 OR
”MONTH_NO”=16 OR “MONTH_NO”=17 OR “MONTH_NO”=18 OR “MONTH_NO”=19 OR
”MONTH_NO”=20 OR “MONTH_NO”=21 OR “MONTH_NO”=22 OR “MONTH_NO”=23 OR
”MONTH_NO”=24 OR “MONTH_NO”=25 OR “MONTH_NO”=26 OR “MONTH_NO”=27 OR
”MONTH_NO”=28 OR “MONTH_NO”=29 OR “MONTH_NO”=30)
Note
—–
- cpu costing is off (consider enabling it)
统计信息
———————————————————-
1 recursive calls
0 db block gets
4 consistent gets
0 physical reads
0 redo size
408 bytes sent via SQL*Net to client
385 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
中间几个查询就不一一做了,
会的出如下一个结论:
列表大小 基数–12 基数1000
1 100 12
2 200 24
3 300 36
4 400 48
5
6
7
8
9 900 108
10 1000
11
12 1200 144
13 1200 156
14 1200 168
30 1200 360
当有12个值的时候两个结果的区别比有1000个值时的区别更为明显,在10G发现在列条目没有超过不同值的数目前:基数=N*列表中的条数目
有1000个不同值的脚本如下:有兴趣的可以测试下:
drop table audience;
begin
begin execute immediate ‘purge recyclebin’;
exception when others then null;
end;
begin execute immediate ‘begin dbms_stats.delete_system_stats; end;’;
exception when others then null;
end;
begin execute immediate ‘alter session set “_optimizer_cost_model”=io’;
exception when others then null;
end;
end;
/
/*
drop table generator;
create table generator as
select
rownum id
from all_objects
where rownum <= 1000
;
*/
create table audience as
with generator as (
select –+ materialize
rownum id
from all_objects
where rownum <= 5000
)
select
trunc(dbms_random.value(1,1001)) month_no
from
generator v1,
generator v2
where
rownum <= 12000
;
begin
dbms_stats.gather_table_stats(
user,
‘audience’,
cascade=>true,
estimate_percent => null,
method_opt =>’for all columns size 1′
);
end;
/
set autotrace traceonly explain
spool in_list_02
select count(*) from audience where month_no in (1,2);
select count(*) from audience where month_no in (1,2,3);
select count(*) from audience where month_no in (1,2,3,4);
select count(*) from audience where month_no in (1,2,3,4,5);
select count(*) from audience where month_no in (1,2,3,4,5,6);
select count(*) from audience where month_no in (1,2,3,4,5,6,7);
select count(*) from audience where month_no in (1,2,3,4,5,6,7,8);
select count(*) from audience where month_no in (1,2,3,4,5,6,7,8,9);
select count(*) from audience where month_no in (1,2,3,4,5,6,7,8,9,10);
select count(*) from audience where month_no in (1,2,3,4,5,6,7,8,9,10,11);
select count(*) from audience where month_no in (1,2,3,4,5,6,7,8,9,10,11,12);
select count(*) from audience where month_no in (1,2,3,4,5,6,7,8,9,10,11,12,13);
select count(*) from audience where month_no in (1,2,3,4,5,6,7,8,9,10,11,12,13,14);
select count(*) from audience where month_no in (
1, 2, 3, 4, 5, 6, 7, 8, 9,10,
11,12,13,14,15,16,17,18,19,20,
21,22,23,24,25,26,27,28,29,30
);
set autotrace off
spool off
接下来我们看以12个不同值的表为例,我们看一些奇怪现象:
where month_no = 25 —– 超过high_value的情况
where month_no in (4, 4) —-重复的值
where month_no in (3, 25) —输入输出 的混合集
where month_no in (3, 25, 26) —同上
where month_no in (3, 25, 25, 26)–同上,但包含重复值
where month_no in (3, 25, null) —优化器能否检测出NULL
where month_no in (:b1, :b2, :b3) —包含还是不包含绑定变量
搭建环境脚本:
execute dbms_random.seed(0);
drop table audience;
begin
begin execute immediate ‘purge recyclebin’;
exception when others then null;
end;
begin execute immediate ‘begin dbms_stats.delete_system_stats; end;’;
exception when others then null;
end;
begin execute immediate ‘alter session set “_optimizer_cost_model”=io’;
exception when others then null;
end;
end;
/
create table audience as
select
trunc(dbms_random.value(1,13)) month_no
from
all_objects
where
rownum <= 1200
;
begin
dbms_stats.gather_table_stats(
user,
‘audience’,
cascade => true,
estimate_percent => null,
method_opt => ‘for all columns size 1′
);
end;
/
rem
rem A little function to make is possible to call
rem the conversion routines in dbms_stats from an
rem SQL statement
rem
create or replace function value_to_number(i_raw in raw)
return number deterministic as
m_n number(6);
begin
dbms_stats.convert_raw_value(i_raw,m_n);
return m_n;
end;
.
/
variable b1 number;
variable b2 number;
variable b3 number;
set autotrace traceonly explain
spool oddities
select count(*) from audience
where month_no = 25
;
select count(*) from audience
where month_no in (4, 4)
;
select count(*) from audience
where month_no in (3, 25)
;
select count(*) from audience
where month_no in (3, 25, 26)
;
select count(*) from audience
where month_no in (3, 25, 25, 26)
;
select count(*) from audience
where month_no in (3, 25, null)
;
select count(*) from audience
where month_no in (:b1, :b2, :b3)
;
set autotrace off
spool off
结论是:
谓词 基数
where month_no = 25 —– 超过high_value的情况 好 1,应该是10.2中改进的,10.1中还没,幸好测试了下
where month_no in (4, 4) —-重复的值 好 100
where month_no in (3, 25) —输入输出 的混合集 好, 100
where month_no in (3, 25, 26) —同上 好 101
where month_no in (3, 25, 25, 26)–同上,但包含重复值 好, 101
where month_no in (3, 25, null) —优化器能否检测出NULL 好 200 应该是10.2中改进的,10.1中还没,幸好测试了下
where month_no in (:b1, :b2, :b3) —包含还是不包含绑定变量 不好,但一致 300
幸好,挨个测试了下,10。2中改进了许多
双谓词讨论:
比如说如下 查询
select count(*) from audienct
where month_no>8 or month_no<=8
优化器将会给出多大基数呢?
我们知道month_no >8基数=436 month_no<=8 基数为864
有如下三种猜测:
1
基数=1300,谓词明显没有重叠区域,因此将对应的基数相加
2基数=1200,联合谓词必须操作表中所有的行,而表有1200行,
3 基数=986
SQL> select count(*)
2 from audience
3 where month_no > 8
4 or month_no <= 8
5 ;
执行计划
———————————————————-
Plan hash value: 3337892515
—————————————————————
| Id | Operation | Name | Rows | Bytes | Cost |
—————————————————————
| 0 | SELECT STATEMENT | | 1 | 3 | 2 |
| 1 | SORT AGGREGATE | | 1 | 3 | |
|* 2 | TABLE ACCESS FULL| AUDIENCE | 986 | 2958 | 2 |
—————————————————————
Predicate Information (identified by operation id):
—————————————————
2 - filter(”MONTH_NO”>8 OR “MONTH_NO”<=8)
Note
—–
- cpu costing is off (consider enabling it)
统计信息
———————————————————-
173 recursive calls
0 db block gets
21 consistent gets
0 physical reads
0 redo size
408 bytes sent via SQL*Net to client
385 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
4 sorts (memory)
0 sorts (disk)
1 rows processed
大家可以试试将8换成1-12中的某一个,算下基数
所以三种通用的三种联合谓词基本公式:
1 谓词1 and 谓词2的选择率=谓词1*谓词2的选择率
2 谓词1 or 谓词2的选择率=谓词1选择率+(谓词2的选择率-(谓词1and谓词2的选择率)) 减掉的部分是两次会重叠的部分
3 not 谓词的选择率=1-谓词1的选择率
继续讨论:区间谓词
讨论区间谓词时,我们就要讨论很多情况了,有边界,无边界,开区间,闭区间等以上4种的组合
测试用例:
- 谓词 基数:card=rows 10GR2 基数9i/10GR1
- month_no>8 436 无边界,开区间 436
- month_no>=8 536 无边界,闭区间 536
- month_no<8 764 无边界,开区间 764
- month_no<=8 864 无边界,闭区间 864
- month_no between 6 and 9 527 有边界,闭区间,闭区间 527
- month_no >=6 and month_no<=9 527 有边界,闭区间,闭区间 7
- month_no >=6 and month_no<9 427 有边界,闭区间,开区间 427
- month_no >6 and month_no<=9 427 有边界,开区间,闭区间 427
- month_no >6 and month_no<9 327 有边界,开区间,开区间 327
- month_no>:b1 60 无边界,开区间 60
- month_no>=:b1 60 无边界,闭区间 60
- month_no<:b1 60 无边界,开区间 60
- month_no<=b1 60 无边界,闭区间 60
- month_no between :b1 and :b2 3 有边界,闭区间,闭区间 3
- month_no >=:b1 and month_no<=:b2 3 有边界,闭区间,闭区间 3
- month_no >=:b1 and month_no<:b2 3 有边界,闭区间,开区间 3
- month_no >:b1 and month_no<=:b2 3 有边界,开区间,闭区间 3
- month_no >:b1 and month_no<:b2 3 有边界,开区间,开区间 3
- month_no >12 100 无边界,开区间 100
- month_no between 25 and 30 1 有边界,闭区间,闭区间 100
总结:从上边测试结果可以看出9I、10GR1和10GR2结果基本一样,就测试6和20差别大,在计算基数时,9I10G都使用四舍五入
结论1:当使用字面值作为边界时,开区间(>和<)和闭区间(>=和<=) 之间结果差异小于或者等于100
结论2:当区间超过列的正常值时,在10GR1 9I时都为100,在10GR2时,对于超过正常值的,有边界的基数为1
结论3:有边界的区间返回的值看起来是严格固定的,几乎不用处理不确定性
结论4:示例10-13(不具有边界区间的绑定变量):优化器只是简单的将选择率置为5%(1200行(不包含空值) 基数=ROWS=1200*5%=60
结论5:示例14-18(具有边界区间的绑定变量):优化器只是简单的将选择率置为2.5%(1200行(不包含空值) 基数=ROWS=1200*2.5%=3
结论6:示例19-20(区间记录超过列值高低值时)在10GR2中,对于19,优化器检测到超过已知区间后给出一个选择率,基数依据column=constant情况进行修正
对于20在10GR2中已经能够很好的处理了。
对于常值的谓词,近似的优化器算法如下:
选择率=“所需空间”除以“总可用空间”
总区间大小=user_tab_col_statistics.high_value - user_tab_col_statistics.low_value
优化器对数目娇小的不同值所执行的任何操作都将存在内置的关键性错误,就象30-40之间有11个数,可是40-30=10
对于示例1:month_no>8
选择率=(high_value-limit)/(high_value-low_value)=(12-8)/(12-1)=4/11
基数=1200*4/11=436.3636,上面说过在9I以上执行的四舍五入即为436,和优化器给出的一致
示例2:month_no>=8
因为包含8,所以算法必须调整:
选择率=(high_value-limit)/(high_value-low_value)+1/num_distinct=4/11+1/12
基数=1200*(4/11+1/12)=536.3636即为536
示例3和4
示例3选择率=(limit-low_value)/(high_value-low_value)=(8-1)/(12-1)=7/11
示例4选择率=(limit-low_value)/(high_value-low_value)+1/num_distinct=(8-1)/(12-1)+1/12
示例5和6,两个谓词实际是等价的
示例5和6选择率=(9-6)(12-1)+1/12+1/12
示例7 8 9
7选择率=(9-6)/(12-1)+1/12
8选择率=(9-6)/(12-1)+1/12
9选择率=(9-6)/(12-1)
测试环境搭建的脚本:
execute dbms_random.seed(0);
drop table audience;
begin
begin execute immediate ‘purge recyclebin’;
exception when others then null;
end;
begin execute immediate ‘begin dbms_stats.delete_system_stats; end;’;
exception when others then null;
end;
begin execute immediate ‘alter session set “_optimizer_cost_model”=io’;
exception when others then null;
end;
end;
/
create table audience as
select
trunc(dbms_random.value(1,13)) month_no
from
all_objects
where
rownum <= 1200
;
begin
dbms_stats.gather_table_stats(
user,
‘audience’,
cascade => true,
estimate_percent => null,
method_opt => ‘for all columns size 1′
);
end;
/
variable b1 number;
variable b2 number;
spool ranges
set autotrace traceonly explain
select count(*)
from audience
where month_no > 8
;
select count(*)
from audience
where month_no >= 8
;
select count(*)
from audience
where month_no < 8
;
select count(*)
from audience
where month_no <= 8
;
select count(*)
from audience
where month_no between 6 and 9
;
select count(*)
from audience
where month_no >= 6 and month_no <= 9
;
select count(*)
from audience
where month_no >= 6 and month_no < 9
;
select count(*)
from audience
where month_no > 6 and month_no <= 9
;
select count(*)
from audience
where month_no > 6 and month_no < 9
;
select count(*)
from audience
where month_no > :b1
;
select count(*)
from audience
where month_no >= :b1
;
select count(*)
from audience
where month_no < :b1
;
select count(*)
from audience
where month_no <= :b1
;
select count(*)
from audience
where month_no between :b1 and :b2
;
select count(*)
from audience
where month_no >= :b1 and month_no <= :b2
;
select count(*)
from audience
where month_no >= :b1 and month_no < :b2
;
select count(*)
from audience
where month_no > :b1 and month_no <= :b2
;
select count(*)
from audience
where month_no > :b1 and month_no < :b2
;
select count(*)
from audience
where month_no > 12
;
select count(*)
from audience
where month_no between 25 and 30
;
rem
rem And now for the funny one
rem
select count(*)
from audience
where month_no > 8
or month_no <= 8
;
set autotrace off
spool off
