请教11G的新特性嵌套循环，谢谢

显示全部楼层 · 2013-2-25 14:51:24

如下两种执行计划，11G的那个到底优越在那里？
物理I/O减少了？怎么减少的呢？谢谢
SELECT e.first_name, e.last_name, e.salary, d.department_name
FROM hr.employees e, hr.departments d
WHERE d.department_name IN ('Marketing', 'Sales')
AND e.department_id = d.department_id;
Original Implementation for Nested Loop Joins
Consider the following query:
SELECT e.first_name, e.last_name, e.salary, d.department_name
FROM hr.employees e, hr.departments d
WHERE d.department_name IN ('Marketing', 'Sales')
AND e.department_id = d.department_id;
Before Oracle Database 11g, the execution plan for this query might appear similar to the following execution plan:
-------------------------------------------------------------------------------------------------
| Id| Operation
| Name
| Rows| Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT
|
|19 | 722 | 3(0)| 00:00:01 |
| 1 |TABLE ACCESS BY INDEX ROWID| EMPLOYEES |10 | 220 | 1(0)| 00:00:01 |
| 2 | NESTED LOOPS
|
|19 | 722 | 3(0)| 00:00:01 |
|*3 |TABLE ACCESS FULL| DEPARTMENTS | 2 |32 | 2(0)| 00:00:01 |
|*4 |INDEX RANGE SCAN | EMP_DEPARTMENT_IX |10 | | 0(0)| 00:00:01 |
-------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - filter("D"."DEPARTMENT_NAME"='Marketing' OR "D"."DEPARTMENT_NAME"='Sales')
4 - access("E"."DEPARTMENT_ID"="D"."DEPARTMENT_ID")
In this example, the outer side of the join consists of a scan of the hr.departments table that returns the rows that match the condition department_name IN ('Marketing', 'Sales'). The inner loop retrieves the employees in the hr.employees table that are associated with those departments.
11.3.3.1.2 New Implementation for Nested Loop Joins
Oracle Database 11g introduces a new implementation for nested loop joins to reduce overall latency for physical I/O. When an index or a table block is not in the buffer cache and is needed to process the join, a physical I/O is required. Oracle Database 11g can batch multiple physical I/O requests and process them using a vector I/O instead of processing them one at a time.
As part of the new implementation for nested loop joins, two NESTED LOOPS join row sources might appear in the execution plan where only one would have appeared in prior releases. In such cases, Oracle Database allocates one NESTED LOOPS join row source to join the values from the table on the outer side of the join with the index on the inner side. A second row source is allocated to join the result of the first join, which includes the rowids stored in the index, with the table on the inner side of the join.
Consider the query in "Original Implementation for Nested Loop Joins". In Oracle Database 11g, with the new implementation for nested loop joins, the execution plan for this query might appear similar to the following execution plan:
------------------------------------------------------------------------------------------------
| Id| Operation
| Name
| Rows| Bytes | Cost(%CPU)| Time|
-------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT
|
|19 | 722 | 3 (0)| 00:00:01 |
| 1 |NESTED LOOPS
|
| | |
|
|
| 2 | NESTED LOOPS
|
|19 | 722 | 3 (0)| 00:00:01 |
|*3 |TABLE ACCESS FULL | DEPARTMENTS | 2 |32 | 2 (0)| 00:00:01 |
|*4 |INDEX RANGE SCAN
| EMP_DEPARTMENT_IX |10 | | 0 (0)| 00:00:01 |
| 5 | TABLE ACCESS BY INDEX ROWID| EMPLOYEES |10 | 220 | 1 (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - filter("D"."DEPARTMENT_NAME"='Marketing' OR "D"."DEPARTMENT_NAME"='Sales')
4 - access("E"."DEPARTMENT_ID"="D"."DEPARTMENT_ID")
In this case, the rows from the hr.departments table constitute the outer side of the first join. The inner side of the first join is the index emp_department_ix. The results of the first join constitute the outer side of the second join, which has the hr.employees table as its inner side.
There are cases where a second join row source is not allocated, and the execution plan looks the same as it did in prior releases. The following list describes such cases:
?All of the columns needed from the inner side of the join are present in the index, and there is no table access required. In this case, Oracle Database allocates only one join row source.
?The order of the rows returned might be different than it was in previous releases. Hence, when Oracle Database tries to preserve a specific ordering of the rows, for example to eliminate the need for an ORDER BY sort, Oracle Database might use the original implementation for nested loop joins.
?The OPTIMIZER_FEATURES_ENABLE initialization parameter is set to a release before Oracle Database 11g. In this case, Oracle Database uses the original implementation for nested loop joins.

千问 · 2013-2-25 14:51:24

关注

千问 · 2013-2-25 14:51:24

You're quoting documentation:
http://docs.oracle.com/cd/E11882 ... e16638/optimops.htm
I think what the new implementation does is to batch the first join result. If I understand the plans right, pre-11g, it sends one row at a time all the way from the departments, through the index, and to the employees. In 11g, it accumulates after it finishes the first join. Then the accumulated join result is treated like a temporary table and one "row" at a time from it is taken out to join to the last table, employees. Because of the accumulation, batch processing is possible. That's my personal understanding. Could be wrong.
I'm not sure how to test this nested loop batching. First reproduce the two plans on 10g and 11g respectively. Then flush buffer cache. Run the SQL in both DBs and see what events they waited on. Maybe the 11g session waited on db file scattered read more than the 10g session. If you don't have two versions of databases, test it in 11g with and without "_nlj_batching_enabled" set to 0.

千问 · 2013-2-25 14:51:24

千问 · 2013-2-25 14:51:24

本帖最后由 Yong Huang 于 2013-3-8 12:11 编辑
> 对于10G来说，我在《基于成本的Oracle优化法则》那本书上看到过对于如下执行的解释
> 认为3和4的关联结果，是得到全部的ROWID,然后遍历这个中间结果(全部的ROWID) 然后再TABLE ACCESS BY INDEX ROWID
> 这一点11G的两个NESTED LOOP没有任何区别
> 好像并非你说的PRE-11G是找到一个ROWID就去访问表,所以就有疑问了，到底区别在那里？
I vaguely remember he said that too. Can you tell me the page number in the book? If you use a Chinese edition, also please tell me the section and approximate location in the section. I only have his English edition.
I tried to figure this out by the following test:
select /*+ monitor use_nl(t) use_nl(t2) */ * from t, t2 where t.x = t2.x;
set long 20000 longc 200
select dbms_sqltune.report_sql_monitor('') from dual;
The Execs and Rows(Actual) columns may be able to tell us. But the 10g case (with "_nlj_batching_enabled" set to 0) is hard to interpret.
> 除非是11G的TABLE ACCESS BY INDEX ROWID是一次读多个块的
That's possible. So, possibly more db file scattered read.
I read a couple of online documents about this NLJ batching (vs 10g prefetching). None directly answers your question. Maybe you can open an SR with Oracle support. http://oracle-randolf.blogspot.c ... ion-part-3-11g.html says the 11g batching saves on CPU usage (if no I/O is involved). He didn't test when I/O has to occur.

千问 · 2013-2-25 14:51:24

本帖最后由 earth2mars 于 2013-2-21 14:52 编辑
Yong Huang 发表于 2013-2-21 14:29
> 对于10G来说，我在《基于成本的Oracle优化法则》那本书上看到过对于如下执行的解释
> 认为3和4的关联结 ...
page 285 Chapter 11.1 Chinese edition

千问 · 2013-2-25 14:51:24

本帖最后由 Yong Huang 于 2013-2-22 23:11 编辑
You're right about the point that both 10g and 11g plans accumulate the interim result before joining to the last (called inner) table. Let me explain with the SQL monitor output. Test was done in 11.2.0.3.
SQL> select dbms_sqltune.report_sql_monitor('1k7bc5xfx5p3p') from dual;
...
SQL Text
------------------------------
select /*+ monitor use_nl(t) use_nl(t2) */ * from t, t2 where t.x = t2.x
...
Duration
:.005688s
...
Global Stats
=================================================
| Elapsed | Cpu |Other | Fetch | Buffer |
| Time(s) | Time(s) | Waits(s) | Calls |Gets|
=================================================
|0.01 |0.00 | 0.00 | 2 | 12 |
=================================================
SQL Plan Monitoring Details (Plan Hash Value=3129773264)
====================================================================================================================================
| Id | Operation
| Name |Rows | Cost | Time| Start| Execs | Rows | Activity | Activity Detail |
||
|| (Estim) || Active(s) | Active | | (Actual) | (%)| (# samples) |
====================================================================================================================================
|0 | SELECT STATEMENT
|| || 1 | +0 | 1 |5 |
|
|
|1 | NESTED LOOPS
|| || 1 | +0 | 1 |5 |
|
|
|2 |NESTED LOOPS
|| 5 |7 | 1 | +0 | 1 |5 |
|
|
|3 | TABLE ACCESS FULL
| T2 | 5 |2 | 1 | +0 | 1 |5 |
|
|
|4 | INDEX RANGE SCAN | T_I| 1 || 1 | +0 | 5 |5 |
|
|
|5 |TABLE ACCESS BY INDEX ROWID | T| 1 |1 | 1 | +0 | 5 |5 |
|
|
====================================================================================================================================
SQL> select dbms_sqltune.report_sql_monitor('1k7bc5xfx5p3p') from dual;
...
Duration
:.001s
...
Global Stats
======================================
| Elapsed | Cpu | Fetch | Buffer |
| Time(s) | Time(s) | Calls |Gets|
======================================
|0.00 |0.00 | 2 | 12 |
======================================
SQL Plan Monitoring Details (Plan Hash Value=2232545417)
===================================================================================================================================
| Id | Operation | Name |Rows | Cost | Time| Start| Execs | Rows | Activity | Activity Detail |
||
|| (Estim) || Active(s) | Active | | (Actual) | (%)| (# samples) |
===================================================================================================================================
|0 | SELECT STATEMENT
|| || 1 | +0 | 1 |5 |
|
|
|1 | TABLE ACCESS BY INDEX ROWID | T| 1 |1 | 1 | +0 | 1 |5 |
|
|
|2 |NESTED LOOPS
|| 5 |7 | 1 | +0 | 1 | 11 |
|
|
|3 | TABLE ACCESS FULL | T2 | 5 |2 | 1 | +0 | 1 |5 |
|
|
|4 | INDEX RANGE SCAN
| T_I| 1 || 1 | +0 | 5 |5 |
|
|
===================================================================================================================================
Foscus on the Execs and Rows(Actual) columns. The bottom plan is for 10g (or in my case, 11g with _nlj_batching_enabled=0). First, T2 is read, only once (but 5 rows in total). For each row, it's used to probe index T_I, 5 times in total but 1 row every time so 5 rows. According to Jonathan Lewis in his book, the plan really should have another interim step after this (i.e. between lines with ID's 1 and 2), maybe called SORT. That sort accumulates the result of the nested loop. Because of the sorting, the sorted row source can be used to probe table T in order of the table blocks, therefore T only needs to be scanned once (hence Execs=1).
Now the upper plan, for 11g. Interpretation of lines 3, 4, and 2 is the same. (Goes through T2 once and reads 5 rows in total (line 3). For each row, it probes index T_I). Again, there may be an interim SORT after 2 but before 5. Now the real difference! Result in line 2 (or rather, result of the missing SORT of line 2) is truly used as the driving "table" of the NL join of line 1, in the sense that each row of the line 2 result is taken out to join to T (line 5). So line 5 has Execs=5 instead of 1. (Compare to line 1 of the 10g plan which has Execs of 1.)
(Note: This statement may be wrong according to the discussion posted later: So the real difference, based on this reading of the SQL monitor output, is about which source acts as the driver in the last step. 10g uses the inner table T (line 1 of the 10g plan), while 11g uses the first NL join result of the interim sorted result, line 2 of the 11g plan.)
How does the 11g plan save on CPU and/or buffer gets or physical I/O's? Unfortunately, this interpretation does not answer. I'll have to think about it.

千问 · 2013-2-25 14:51:24

本帖最后由 earth2mars 于 2013-2-22 22:39 编辑
Yong Huang 发表于 2013-2-22 02:24
You're right about the point that both 10g and 11g plans accumulate the interim result before joinin ...
多谢
1、你这个只是2个执行计划，如何证明10G和11G一样也是有一个中间结果(含有全部的ROWID)？
2、按照你说的，10G 的最后一步是T作为驱动表，我觉得不是这样的，如果按照你说的
那么就要遍历T表，然后每遍历一行或者N行，这个遍历完全可能是全表扫描，然后到排过序的那个中间结果查询。
如果T很大，这个过程会很慢，我觉得不可能会这样做，10G也一定是中间结果做为驱动行源的。而T表才是INNER 表。
11G的这种做法，可以理解，我在9i，10G的时候，都人为的先取得ROWID,然后排序，然后再拿这个ROWID去查询表
效果不错

千问 · 2013-2-25 14:51:24

另外，我看了你给的链接，仔细看了几遍，他其实是介绍了一种写sql的技巧，和我的问题没啥关系
Second Enhancement 很难理解，没明白这句话到底啥意思
benefits from “scalar subquery caching”, which means that under perfect conditions
we will run the subquery just once per product sold in the date range, rather than once per row.

千问 · 2013-2-25 14:51:24

本帖最后由 Yong Huang 于 2013-2-22 11:52 编辑
> 如何证明10G和11G一样也是有一个中间结果(含有全部的ROWID)？
I don't have a proof. I basically took Jonathan Lewis's words for it. But an explanation with an interim row source matches the numbers in the SQL monitor output.
> 按照你说的，10G 的最后一步是T作为驱动表，我觉得不是这样的，如果按照你说的
> 那么就要遍历T表，然后每遍历一行或者N行，这个遍历完全可能是全表扫描，然后到排过序的那个中间结果查询。
> 如果T很大，这个过程会很慢，我觉得不可能会这样做，10G也一定是中间结果做为驱动行源的。而T表才是INNER 表
Read Jonathan Lewis's explanation of Fig.11-2 in his book:
"The new mechanism ... When all the target rowids have been found, the engine can sort them and then visit the inner table in a single pass, working along the length of the table just once, picking the rows in whatever order they happen to appear—in this case (a, b, b, a, a, c, b, c, a, c)."
If "中间结果做为驱动行源的", then T has to be 遍历 more than once. Isn't that going to have a much higher cost than 遍历 once?
Please use the numbers in the SQL monitor output to back up your claim. Normally you could also enable _rowsource_statistics or use gather_plan_statistics hint to get per-row runtime stats. But this 11g new feature of NLJ will misbehave using that method (according to many sources including the article I quoted). So SQL monitor is a good tool, maybe the only tool.