FarragoExplainPlanExplained
The EXPLAIN PLAN extension statement provides a way to preview the plan generated by the optimizer to evaluate a query without actually executing the query. This can be useful for query tuning, optimizer debugging, and learning more about how Farrago processes queries.
Contents |
Syntax:
EXPLAIN PLAN
[ { EXCLUDING | INCLUDING [ ALL ] } ATTRIBUTES ]
[ { WITH | WITHOUT } IMPLEMENTATION | WITH TYPE ]
[ AS XML ]
FOR query-or-DML-statement
Validation Rules:
- query-or-DML-statement must itself be valid
Results:
The default mode is WITH IMPLEMENTATION, indicating that the optimizer should produce a final physical plan. If WITHOUT IMPLEMENTATION is specified, EXPLAIN PLAN stops after translating the SQL statement into logical relational algebra and returns a pure algebraic expression. The explanation is returned as a normal result set with a single VARCHAR column; each row represents a plan node of some kind.
In the WITH TYPE mode, EXPLAIN PLAN prints the names and datatypes of the result columns (including nullability).
If the AS XML option is specified, then the plan structure is deeper (nodes may represent individual attributes of plan operators). If AS XML is not specified, then each line represents a physical execution stream node (for WITH IMPLEMENTATION) or relational algebra operator (for WITHOUT IMPLEMENTATION). See FarragoExplainPlanXml for more information.
The default detail level is INCLUDING ATTRIBUTES. EXCLUDING ATTRIBUTES displays only operator names, while INCLUDING ALL ATTRIBUTES adds additional attributes such as cost.
Examples
Let's start by seeing how SQL gets converted into relational algebra. If you're using sqlline, you may want to maximize your shell window because lines of the plan can be very long. (Eventually we'll have a nice graphical rendering based on the XML option, but for now it's just text text and more text).
0: jdbc:farrago:> !set outputformat csv 0: jdbc:farrago:> 0: jdbc:farrago:> explain plan without implementation for . . . . . . . . > select depts.name as dname,emps.name as ename . . . . . . . . > from sales.emps inner join sales.depts . . . . . . . . > on emps.deptno=depts.deptno . . . . . . . . > order by 1,2; 'column0' 'SortRel(sort0=[$0], sort1=[$1])' ' ProjectRel(DNAME=[$11], ENAME=[$1])' ' JoinRel(condition=[=($2, $10)], joinType=[inner])' ' TableAccessRel(table=[[LOCALDB, SALES, EMPS]])' ' TableAccessRel(table=[[LOCALDB, SALES, DEPTS]])'
So far this is fairly easy to read. The query tree structure is rendered via indentation level, so the two instances of TableAccessRel are the leaves feeding into the JoinRel. The attributes of the join are the join condition and type (inner, left outer, etc). The join condition is a bit confusing; the dollar-sign expressions refer to 0-based column ordinals within the input to the join (corresponding to the DEPTNO column on either side), after concatenating the tuple from the left side of the join with the right side. So, $2 refers to the third column in EMPS, which is DEPTNO. There are 10 columns in EMPS, so $10 refers to the first column in DEPTS. Below are the original table definitions from farrago/initsql/createSalesSchema.sql (we'll see some of the indexes defined here showing up later on):
create table depts( deptno integer not null primary key, name varchar(128) not null constraint depts_unique_name unique); create table emps( empno integer not null, name varchar(128) not null, deptno integer not null, gender char(1) default 'M', city varchar(128), empid integer not null unique, age integer, public_key varbinary(50), slacker boolean, manager boolean not null, primary key(deptno,empno)) create index emps_ux on emps(name);
The same dollar-sign convention is used throughout the plan; as an exercise, you might want to verify that the rest of the plan is correct.
Now, let's tell the optimizer to go all the way and produce a physical plan:
0: jdbc:farrago:> explain plan for
. . . . . . . . > select depts.name as dname,emps.name as ename
. . . . . . . . > from sales.emps inner join sales.depts
. . . . . . . . > on emps.deptno=depts.deptno
. . . . . . . . > order by 1,2;
'column0'
'FennelToIteratorConverter'
' FennelSortRel(key=[[0, 1]], discardDuplicates=[false])'
' FennelCalcRel(expr#0..11=[{inputs}], DNAME=[$t11], ENAME=[$t1])'
' FtrsIndexSearchRel(table=[[LOCALDB, SALES, DEPTS]], projection=[*], index=[SYS$CONSTRAINT_INDEX$DEPTS$SYS$PRIMARY_KEY], uniqueKey=[true], ...)'
' FtrsIndexScanRel(table=[[LOCALDB, SALES, EMPS]], projection=[*], index=[SYS$CONSTRAINT_INDEX$EMPS$SYS$PRIMARY_KEY], preserveOrder=[false])'
Now the plan is no longer a tree; instead, it's a linear chain of operators. That's because the optimizer found a plan involving an index join. Reading up from the bottom, the lowest level scans EMPS, and feeds each tuple into an index search on DEPTS. Matches will append the joined attributes; mismatches will be filtered out. After that, a calculator projects out only the desired attributes, which are fed into the sorter. (Those $t variables in the calculator program will be explained later on.) Finally, the output of the sorter is passed through a converter from Fennel tuples to a Java iterator so that the result set can be returned via JDBC. This plan is sub-optimal, because we don't yet have an optimizer rule to push projections down into the join.
You can see that there are a lot of additional physical attributes, such as which indexes are used, whether unique-key search optimizations are warranted, etc. (Some of them were replaced with ellipses to keep this wiki page from getting too wide.) If you don't want to be bothered with all that, you can get just the bare bones:
0: jdbc:farrago:> explain plan excluding attributes for . . . . . . . . > select depts.name as dname,emps.name as ename . . . . . . . . > from sales.emps inner join sales.depts . . . . . . . . > on emps.deptno=depts.deptno . . . . . . . . > order by 1,2; 'column0' 'FennelToIteratorConverter' ' FennelSortRel' ' FennelCalcRel' ' FtrsIndexSearchRel' ' FtrsIndexScanRel'
And if you want even more information such as cost, you can get that too:
0: jdbc:farrago:> explain plan including all attributes for
. . . . . . . . > select depts.name as dname,emps.name as ename
. . . . . . . . > from sales.emps inner join sales.depts
. . . . . . . . > on emps.deptno=depts.deptno
. . . . . . . . > order by 1,2;
'column0'
'FennelToIteratorConverter: cost = {100.0 rows, 100.0 cpu, 0.0 io}'
' FennelSortRel(key=[[0, 1]], discardDuplicates=[false]): cost = {100.0 rows, 460.51701859880916 cpu, 100.0 io}'
' FennelCalcRel(expr#0..11=[{inputs}], DNAME=[$t11], ENAME=[$t1]): cost = {100.0 rows, 2800.0 cpu, 0.0 io}'
' FtrsIndexSearchRel(table=[[LOCALDB, SALES, DEPTS]], projection=[*], ...): cost = {100.0 rows, 200.0 cpu, 200.0 io}'
' FtrsIndexScanRel(table=[[LOCALDB, SALES, EMPS]], projection=[*], ...): cost = {1000.0 rows, 10000.0 cpu, 10000.0 io}'
NOTE: the cost estimates are way off because we don't have ANALYZE TABLE yet; this is just to give you the idea.
Now, let's revisit CalcRel by issuing a new query:
0: jdbc:farrago:> explain plan for
. . . . . . . . > select 2*(deptno+1), upper(name)
. . . . . . . . > from sales.depts
. . . . . . . . > where deptno+1 > 5;
'column0'
'IterCalcRel(expr#0..1=[{inputs}],
expr#2=[2], expr#3=[1], expr#4=[+($t0, $t3)], expr#5=[*($t2, $t4)], expr#6=[UPPER($t1)], expr#7=[5], expr#8=[>($t4, $t7)],
EXPR$0=[$t5], EXPR$1=[$t6],
$condition=[$t8])'
' FennelToIteratorConverter'
' FtrsIndexScanRel(table=[[LOCALDB, SALES, DEPTS]], projection=[*], index=[SYS$CONSTRAINT_INDEX$DEPTS$SYS$PRIMARY_KEY], preserveOrder=[false])'
This time, for its own mysterious reasons, the optimizer chose to convert from Fennel to Java early and use a Java iterator implementation of the calculator. The same program structure plan rendering is used for both calculators (almost); let's take a closer look. Every calculator program has four parts:
- input tuple (expr#0..1 in the plan)
- common sub-expressions (expr#2 through expr#8 in the plan)
- output tuple (EXPR$0 and EXPR$1 in the plan). For the Fennel calculator, these show up as $f variables instead. FIXME jvs 28-Jan-2006: use the same convention, it's confusing enough already!
- optional condition ($condition in the plan)
The processing semantics are:
- Evaluate common sub-expressions in terms of the current input
- Evaluate the condition
- If the condition passes, evaluate the output expressions and produce an output tuple
(Calculator implementations may reorder operations to optimize processing as long as these semantics are preserved. TBD: expressions with side-effects; lazy evaluation guarantees.)
Within a program, a $t expression referes to the earlier expression with the corresponding number. So in expr#6=[UPPER($t1)], the $t1 refers to expr#1, which is the second input column (NAME). You can see common-subexpression elimination in action for the expression DEPTNO+1; this is computed only once by expr#4, and reused in both expr#5 (referenced from output EXPR$0) and expr#8 (referenced from $condition).
Reading these programs can be hard work!
Finally, let's cover a couple of other plan nodes that require special interpretation.
Index scans:
0: jdbc:farrago:> explain plan for
. . . . . . . . > select name from sales.depts where deptno > 20;
'column0'
'IterCalcRel(expr#0..1=[{inputs}], NAME=[$t1])'
' FennelToIteratorConverter'
' FtrsIndexSearchRel(table=[[LOCALDB, SALES, DEPTS]], projection=[*], index=[SYS$CONSTRAINT_INDEX$DEPTS$SYS$PRIMARY_KEY],
uniqueKey=[false], preserveOrder=[false], outer=[false], inputKeyProj=[[1, 3]], inputJoinProj=[[]], inputDirectiveProj=[[0, 2]])'
' FennelCalcRel(expr#0=[{inputs}], expr#1=['('], expr#2=[20], expr#3=['+'], expr#4=[null],
expr#5=[CAST($t2):INTEGER], expr#6=[CAST($t4):INTEGER], $f0=[$t1], $f1=[$t5], $f2=[$t3], $f3=[$t6])'
' FennelOneRowRel'
The way this plan works is that the two lowest nodes work together to produce a search key and associated directives. This is fed into the index search, which produces matching tuples to be projected by the topmost node. Since there's only one interval to search over, there's a single OneRowRel at the bottom. The calculator above it produces the search key and directives. Here's how to interpret expr#1 through expr$4:
- expr#1=['('] is a directive defining the lower bound of the search. Directives use mathematical interval notation, so round parentheses represent open bounds (in this case, strictly greater than) and square brackets represent closed bounds.
- expr#2=[20] is the search key coordinate associated with the lower bound
- expr#3=['+'] represents positive infinity for the upper bound; if the WHERE clause also had AND DEPTNO < 30, this would be ')' instead
- expr#4=[null] represents the fact that infinity has no associated coordinate at all
Aggregates:
0: jdbc:farrago:> explain plan without implementation for . . . . . . . . > select deptno, sum(age), count(*) . . . . . . . . > from sales.emps . . . . . . . . > group by deptno; 'column0' 'ProjectRel(DEPTNO=[$0], EXPR$1=[$1], EXPR$2=[$2])' ' AggregateRel(groupCount=[1], agg#0=[SUM($1)], agg#1=[COUNT()])' ' ProjectRel($0=[$2], $1=[$6])' ' TableAccessRel(table=[[LOCALDB, SALES, EMPS]])'
This query can be expressed in English as for each department, calculate the sum of the ages of the employees as well as the number of employees in that department. The lower ProjectRel picks out DEPTNO and AGE from the EMPS table. These are fed into the AggregateRel, which has the following semantics:
- The groupCount attribute specifies how many attributes from the input tuple are to be grouped on. These GROUP BY attributes must come first in the input tuple, so the groupCount just specifies the prefix size. Besides controlling aggregation, these attributes are also passed through into the output in the same positions.
- Zero or more agg# expressions define the rest of the AggregateRel output tuple (appended after the grouping attributes). In this example, there are two of these, so since groupCount=1, the output tuple has three attributes. (The topmost project is redundant except for giving DEPTNO its name back.)
- Aggregate expressions refer to inputs by 0-based ordinal. So SUM($1) means sum over the second input column (AGE). COUNT(*) does not actually depend on any value from the input, so the aggregate expression is just COUNT().
Now wasn't that fun?
EXPLAIN PLAN WITH TYPE
The WITH TYPE option just prints the names and types (including nullability) of the columns returned by a query. For example,
0: jdbc:farrago:> explain plan with type for . . . . . . . . > select * from sales.emps; 'column0' 'EMPNO INTEGER NOT NULL,' 'NAME VARCHAR(128) CHARACTER SET "ISO-8859-1" COLLATE "ISO-8859-1$en_US$primary" NOT NULL,' 'DEPTNO INTEGER NOT NULL,' 'GENDER CHAR(1) CHARACTER SET "ISO-8859-1" COLLATE "ISO-8859-1$en_US$primary",' 'CITY VARCHAR(128) CHARACTER SET "ISO-8859-1" COLLATE "ISO-8859-1$en_US$primary",' 'EMPID INTEGER NOT NULL,' 'AGE INTEGER,' 'PUBLIC_KEY VARBINARY(50),' 'SLACKER BOOLEAN,' 'MANAGER BOOLEAN NOT NULL'