Instead of "rewritten parse tree", we will use the term "query tree" going forward. To view the serialized DXL representation of the query tree shown below (that is, to generate the minidump), do the following:
--- 1. Set the GUC which will serialize the CDXLNodes
set optimizer_minidump=always;
--- 2. Run the query that you want to dump to a file (the minidump file)
SELECT * FROM FOO INNER JOIN BAR on FOO.empName=BAR.empName where FOO.empName='Rambo';
--- 3. After the query is executed, go to the directory $COORDINATOR_DATA_DIRECTORY/minidumps or
--- $MASTER_DATA_DIRECTORY/minidumps. You should see the latest file of the name Minidump_XXXXXXXX_XXXXXX_XX_XX.mdp.
--- If you view the file using xmllint --format Minidump_XXXXXXXX_XXXXXX_XX_XX.mdp, it should show you the Serialized Query DXL Tree.
If the GUC optimizer_minidump=always is not set, the query tree is only converted to the CDXLNode representation; it is not serialized to a file as shown below. The minidump file also contains the serialized Plan and Metadata sections, which are not shown below.
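Once a minidump exists, the query section can also be pulled out programmatically instead of scrolling through the xmllint output. The Python sketch here is illustrative only: the helper names are hypothetical, the wrapping DXLMessage element and the namespace URI in the sample are assumptions about the file layout, and the sample is a trimmed stand-in rather than real minidump content.

```python
import xml.etree.ElementTree as ET

# Trimmed stand-in for a minidump. The DXLMessage wrapper and the
# namespace URI are assumptions, not copied from a real file.
SAMPLE = """<dxl:DXLMessage xmlns:dxl="http://greenplum.com/dxl/2010/12/">
  <dxl:Query>
    <dxl:OutputColumns>
      <dxl:Ident ColId="1" ColName="empname" TypeMdid="0.25.1.0"/>
      <dxl:Ident ColId="2" ColName="empid" TypeMdid="0.23.1.0"/>
    </dxl:OutputColumns>
    <dxl:CTEList/>
    <dxl:LogicalSelect/>
  </dxl:Query>
  <dxl:Plan/>
</dxl:DXLMessage>"""


def local_name(tag):
    """Drop the '{namespace}' prefix that ElementTree adds to qualified tags."""
    return tag.rsplit("}", 1)[-1]


def find_section(root, name):
    """Return the first element in document order with this local name, or None."""
    for elem in root.iter():
        if local_name(elem.tag) == name:
            return elem
    return None


def output_columns(query):
    """Return (ColId, ColName) pairs listed under the query's OutputColumns."""
    out = find_section(query, "OutputColumns")
    if out is None:
        return []
    return [(int(c.get("ColId")), c.get("ColName")) for c in out]


root = ET.fromstring(SAMPLE)
query = find_section(root, "Query")
print([local_name(child.tag) for child in query])
print(output_columns(query))
```

For a real file, ET.parse(path).getroot() would take the place of ET.fromstring(SAMPLE); the lookups match on local names only, so they do not depend on the exact namespace URI.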
DXL Representation of Query Tree
<dxl:Query>
<dxl:OutputColumns>
<dxl:Ident ColId="1" ColName="empname" TypeMdid="0.25.1.0"/>
<dxl:Ident ColId="2" ColName="empid" TypeMdid="0.23.1.0"/>
<dxl:Ident ColId="10" ColName="empname" TypeMdid="0.25.1.0"/>
<dxl:Ident ColId="11" ColName="emplocation" TypeMdid="0.25.1.0"/>
</dxl:OutputColumns>
<dxl:CTEList/>
<dxl:LogicalSelect>
<dxl:Comparison ComparisonOperator="=" OperatorMdid="0.98.1.0">
<dxl:Ident ColId="1" ColName="empname" TypeMdid="0.25.1.0"/>
<dxl:ConstValue TypeMdid="0.25.1.0" Value="AAAACVJhbWJv" LintValue="3894216960"/>
</dxl:Comparison>
<dxl:LogicalJoin JoinType="Inner">
<dxl:LogicalGet>
<dxl:TableDescriptor Mdid="0.124170.1.0" TableName="foo">
<dxl:Columns>
<dxl:Column ColId="1" Attno="1" ColName="empname" TypeMdid="0.25.1.0" ColWidth="8"/>
<dxl:Column ColId="2" Attno="2" ColName="empid" TypeMdid="0.23.1.0" ColWidth="4"/>
<dxl:Column ColId="3" Attno="-1" ColName="ctid" TypeMdid="0.27.1.0" ColWidth="6"/>
<dxl:Column ColId="4" Attno="-3" ColName="xmin" TypeMdid="0.28.1.0" ColWidth="4"/>
<dxl:Column ColId="5" Attno="-4" ColName="cmin" TypeMdid="0.29.1.0" ColWidth="4"/>
<dxl:Column ColId="6" Attno="-5" ColName="xmax" TypeMdid="0.28.1.0" ColWidth="4"/>
<dxl:Column ColId="7" Attno="-6" ColName="cmax" TypeMdid="0.29.1.0" ColWidth="4"/>
<dxl:Column ColId="8" Attno="-7" ColName="tableoid" TypeMdid="0.26.1.0" ColWidth="4"/>
<dxl:Column ColId="9" Attno="-8" ColName="gp_segment_id" TypeMdid="0.23.1.0" ColWidth="4"/>
</dxl:Columns>
</dxl:TableDescriptor>
</dxl:LogicalGet>
<dxl:LogicalGet>
<dxl:TableDescriptor Mdid="0.124176.1.0" TableName="bar">
<dxl:Columns>
<dxl:Column ColId="10" Attno="1" ColName="empname" TypeMdid="0.25.1.0" ColWidth="8"/>
<dxl:Column ColId="11" Attno="2" ColName="emplocation" TypeMdid="0.25.1.0" ColWidth="8"/>
<dxl:Column ColId="12" Attno="-1" ColName="ctid" TypeMdid="0.27.1.0" ColWidth="6"/>
<dxl:Column ColId="13" Attno="-3" ColName="xmin" TypeMdid="0.28.1.0" ColWidth="4"/>
<dxl:Column ColId="14" Attno="-4" ColName="cmin" TypeMdid="0.29.1.0" ColWidth="4"/>
<dxl:Column ColId="15" Attno="-5" ColName="xmax" TypeMdid="0.28.1.0" ColWidth="4"/>
<dxl:Column ColId="16" Attno="-6" ColName="cmax" TypeMdid="0.29.1.0" ColWidth="4"/>
<dxl:Column ColId="17" Attno="-7" ColName="tableoid" TypeMdid="0.26.1.0" ColWidth="4"/>
<dxl:Column ColId="18" Attno="-8" ColName="gp_segment_id" TypeMdid="0.23.1.0" ColWidth="4"/>
</dxl:Columns>
</dxl:TableDescriptor>
</dxl:LogicalGet>
<dxl:Comparison ComparisonOperator="=" OperatorMdid="0.98.1.0">
<dxl:Ident ColId="1" ColName="empname" TypeMdid="0.25.1.0"/>
<dxl:Ident ColId="10" ColName="empname" TypeMdid="0.25.1.0"/>
</dxl:Comparison>
</dxl:LogicalJoin>
</dxl:LogicalSelect>
</dxl:Query>
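The literal 'Rambo' does not appear in plain text in the DXL; it is carried in the Value attribute of the ConstValue element. As a quick check, the sketch below decodes that attribute under the assumption that it is base64 and that the first 4 bytes are a big-endian length header in front of the text bytes. That layout is inferred from this one value only, and the function name is hypothetical.

```python
import base64
import struct


def decode_text_const(value_b64):
    """Decode a ConstValue 'Value' attribute.

    Assumption: base64 of a 4-byte big-endian length header followed by
    the raw text bytes. Returns (header_length, text).
    """
    raw = base64.b64decode(value_b64)
    (header_len,) = struct.unpack(">I", raw[:4])
    return header_len, raw[4:].decode("utf-8")


header_len, text = decode_text_const("AAAACVJhbWJv")
print(header_len, text)  # 9 Rambo
```

The header value 9 equals the 4 header bytes plus the 5 bytes of "Rambo", which suggests the length counts the header itself.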
For the query under consideration, as we noted during the parse tree analysis, the Join Tree and the Projection List are the primary members that hold the crux of the query, supported by members such as rtable. In the DXL representation:
- The content between <dxl:LogicalSelect> and </dxl:LogicalSelect> represents FOO INNER JOIN BAR on FOO.empName=BAR.empName where FOO.empName='Rambo'. The LogicalSelect has two children:
  - The <dxl:Comparison> element, which represents FOO.empName='Rambo'.
  - The <dxl:LogicalJoin JoinType="Inner"> element, which represents the join condition and the relations involved, i.e. FOO INNER JOIN BAR on FOO.empName=BAR.empName. Its JoinType attribute indicates that it is an INNER JOIN, and it has three children:
    - The 1st child is a LogicalGet element, which indicates a scan of a table, i.e. FOO.
    - The 2nd child is another LogicalGet element, which indicates a scan of a table, i.e. BAR.
    - The 3rd child is the Comparison element, which holds the join condition, i.e. ON FOO.empName=BAR.empName.
- The DXL content between <dxl:OutputColumns> and </dxl:OutputColumns> is the projection / target list corresponding to SELECT *. The projection list in the query tree contains 4 columns, and the same 4 appear here in the DXL representation (FOO and BAR have 2 user columns each, so 4 in total). The OutputColumns element sits above the LogicalSelect and keeps only the 4 columns the query requires. A table may have multiple user columns, and it also has several system columns such as ctid, xmin, cmin, xmax, cmax, tableoid and gp_segment_id; these are not projected in this query, so they are excluded.
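The structure described above can be checked mechanically. The following Python sketch prints an outline of a trimmed copy of the LogicalSelect subtree and separates user columns from system columns by the sign of Attno, as seen in the DXL listing. The helper names are hypothetical, the namespace URI is an assumption, and most attributes are dropped from the sample for brevity.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the LogicalSelect subtree; namespace URI is assumed.
SAMPLE = """<dxl:LogicalSelect xmlns:dxl="http://greenplum.com/dxl/2010/12/">
  <dxl:Comparison ComparisonOperator="=">
    <dxl:Ident ColId="1" ColName="empname"/>
    <dxl:ConstValue Value="AAAACVJhbWJv"/>
  </dxl:Comparison>
  <dxl:LogicalJoin JoinType="Inner">
    <dxl:LogicalGet>
      <dxl:TableDescriptor TableName="foo">
        <dxl:Columns>
          <dxl:Column ColId="1" Attno="1" ColName="empname"/>
          <dxl:Column ColId="2" Attno="2" ColName="empid"/>
          <dxl:Column ColId="3" Attno="-1" ColName="ctid"/>
        </dxl:Columns>
      </dxl:TableDescriptor>
    </dxl:LogicalGet>
    <dxl:LogicalGet>
      <dxl:TableDescriptor TableName="bar">
        <dxl:Columns>
          <dxl:Column ColId="10" Attno="1" ColName="empname"/>
          <dxl:Column ColId="11" Attno="2" ColName="emplocation"/>
          <dxl:Column ColId="12" Attno="-1" ColName="ctid"/>
        </dxl:Columns>
      </dxl:TableDescriptor>
    </dxl:LogicalGet>
    <dxl:Comparison ComparisonOperator="=">
      <dxl:Ident ColId="1" ColName="empname"/>
      <dxl:Ident ColId="10" ColName="empname"/>
    </dxl:Comparison>
  </dxl:LogicalJoin>
</dxl:LogicalSelect>"""


def local_name(tag):
    """Drop the '{namespace}' prefix that ElementTree adds to qualified tags."""
    return tag.rsplit("}", 1)[-1]


def describe(elem):
    """One-line label for an element, showing its most telling attribute."""
    name = local_name(elem.tag)
    if name == "LogicalJoin":
        return f"{name} JoinType={elem.get('JoinType')}"
    if name == "TableDescriptor":
        return f"{name} {elem.get('TableName')}"
    if name == "Ident":
        return f"{name} {elem.get('ColName')}#{elem.get('ColId')}"
    return name


def outline(elem, depth=0, skip=("Columns",)):
    """Return indented labels for elem and its descendants, omitting skipped subtrees."""
    if local_name(elem.tag) in skip:
        return []
    lines = ["  " * depth + describe(elem)]
    for child in elem:
        lines.extend(outline(child, depth + 1, skip))
    return lines


def user_columns(table_descr):
    """Names of columns with a positive Attno; system columns have a negative one."""
    return [
        c.get("ColName")
        for c in table_descr.iter()
        if local_name(c.tag) == "Column" and int(c.get("Attno")) > 0
    ]


root = ET.fromstring(SAMPLE)
print("\n".join(outline(root)))
for td in root.iter():
    if local_name(td.tag) == "TableDescriptor":
        print(td.get("TableName"), user_columns(td))
```

The outline shows the LogicalSelect with its Comparison and LogicalJoin children, and the per-table output lists the two user columns of each table, which together make up the 4 output columns.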
At a high level
Why a DXL representation of the query? The intent is to give ORCA a pluggable interface: if an XML file (i.e. a minidump file) containing the serialized representation of the query is provided, one does not need a running database to generate the plan. ORCA could be plugged into any database, provided the translation functions (query tree to DXL, and DXL to planned statement) are written for that database.
- The translation creates an equivalent CDXLNode representation of the query tree nodes, bottom up. (During "Query Tree to DXL Translation" the operators are represented using CDXLNode class objects. Refer to CDXLNode.cpp for the available constructors.)
- The serialization helpers, e.g. CDXLTableDescr.cpp, enable writing to the minidump file if optimizer_minidump=always.
- The deserialization helpers, e.g. CParseHandlerTableDescr.cpp, enable reading from the minidump file.
- The file CTranslatorQueryToDXL.cpp, as the name suggests, contains the driving code for the query tree to DXL translation.
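To make "bottom up" concrete, the sketch below builds the operator tree for the example query leaf-first using Python's ElementTree. It is only an analogy: node() and ident() are hypothetical helpers, not the CDXLNode constructors, the namespace URI is an assumption, and most attributes are omitted.

```python
import xml.etree.ElementTree as ET

DXL_NS = "http://greenplum.com/dxl/2010/12/"  # assumed namespace URI
ET.register_namespace("dxl", DXL_NS)


def node(name, *children, **attrs):
    """Create a DXL-like element whose children have already been built."""
    elem = ET.Element(f"{{{DXL_NS}}}{name}", attrs)
    elem.extend(children)
    return elem


def ident(col_id, col_name):
    """Column reference leaf."""
    return node("Ident", ColId=str(col_id), ColName=col_name)


# 1. Leaves first: one LogicalGet per base relation.
get_foo = node("LogicalGet", node("TableDescriptor", TableName="foo"))
get_bar = node("LogicalGet", node("TableDescriptor", TableName="bar"))

# 2. The join condition, then the join that owns both scans and the condition.
join_cond = node("Comparison", ident(1, "empname"), ident(10, "empname"),
                 ComparisonOperator="=")
join = node("LogicalJoin", get_foo, get_bar, join_cond, JoinType="Inner")

# 3. The WHERE predicate, then the select on top of the join.
filter_cond = node("Comparison", ident(1, "empname"),
                   node("ConstValue", Value="AAAACVJhbWJv"),
                   ComparisonOperator="=")
select = node("LogicalSelect", filter_cond, join)

xml_text = ET.tostring(select, encoding="unicode")
print(xml_text)
```

Each parent is created only after its children exist, which is the same ordering the bottom-up translation implies: scans, then the join, then the select that wraps it.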
Query tree to DXL translation is invoked via COptTasks::OptimizeTask by calling query_to_dxl_translator->TranslateQueryToDXL(). In COptTasks::OptimizeTask, before the query tree is translated into its DXL representation, it is sent for normalization while the query_to_dxl_translator object is being created. During normalization the following modifications are made to the query tree:
- Flattening join alias vars
- Converting the Distinct clause to a Grouping clause
- Pulling up the Having clause into a Select
- Normalizing the project list of Group by and Window operators
(Note: refer to the function CQueryMutators::NormalizeQuery.)
Supported CmdType
The query can be of any of the types defined by typedef enum CmdType. ORCA supports 4 of them and falls back to the planner for anything else. For the query under consideration the commandType is 1, i.e. CMD_SELECT.
- Create a minidump file for the query SELECT * FROM FOO INNER JOIN BAR on FOO.empName=BAR.empName where FOO.empName='Rambo'; and review the contents. Does the minidump file also contain the Plan and Metadata? Review the metadata section in the minidump file to understand the information captured.
- Review the method CTranslatorScalarToDXL::CreateScalarCmpFromOpExpr to see how a GPDB OpExpr is converted to a CDXLNode representation.
- Stop the database, and run the query using the minidump file generated in step 1:
cd ~/workspace/gpdb/src/backend/gporca
cmake -GNinja -D CMAKE_BUILD_TYPE=Debug -H. -Bbuild.debug
ninja -C build.debug/
cd build.debug
./server/gporca_test -d $COORDINATOR_DATA_DIRECTORY/minidumps/<minidump_file_name> -T 101001
Note: -T 101001 (EopttracePrintPlan) prints the physical plan for the query in the minidump. Refer to libnaucrates/include/naucrates/traceflags/traceflags.h for the available traceflags.
- What are traceflags? Can you run the minidump again to print the Algebrized query too? (Hint: EopttracePrintQuery)
- Review COptimizer::PdxlnOptimize, if optimizer_minidump=always is set, where does the serialization to dxl occur? (Attach a breakpoint to view)