Details
-
Bug
-
Status: Resolved
-
Minor
-
Resolution: Fixed
-
Impala 2.8.0
-
CentOS release 6.8 (Final)
Description
StmtRewrite lost parentheses of CompoundPredicate in pushNegationToOperands(), as a result, getExplainString(conjuncts_) can't get the correct string.
For example, we have a table defined by:
create table if not exists item ( i_item_sk integer , i_item_id varchar(16) , i_rec_start_date timestamp , i_rec_end_date timestamp , i_item_desc varchar(200) , i_current_price decimal(7,2) , i_wholesale_cost decimal(7,2) , i_brand_id integer , i_brand varchar(50) , i_class_id integer , i_class varchar(50) , i_category_id integer , i_category varchar(50) , i_manufact_id integer , i_manufact varchar(50) , i_size varchar(20) , i_formulation varchar(20) , i_color varchar(20) , i_units varchar(10) , i_container varchar(10) , i_manager_id integer , i_product_name varchar(50) ) row format delimited fields terminated by '|' escaped by '\\' lines terminated by '\n';
And if we run query:
select distinct(i_product_name) from item i1 where i_manufact_id between 738 and 738+40 and ( select count(*) as item_cnt from item where i_manufact = i1.i_manufact and ( ( (i_category = 'Women' and (i_color = 'powder' or i_color = 'khaki') and (i_units = 'Ounce' or i_units = 'Oz') and (i_size = 'medium' or i_size = 'extra large') ) or (i_category = 'Women' and (i_color = 'brown' or i_color = 'honeydew') and (i_units = 'Bunch' or i_units = 'Ton') and (i_size = 'N/A' or i_size = 'small') ) or (i_category = 'Men' and (i_color = 'floral' or i_color = 'deep') and (i_units = 'N/A' or i_units = 'Dozen') and (i_size = 'petite' or i_size = 'large') ) or (i_category = 'Men' and (i_color = 'light' or i_color = 'cornflower') and (i_units = 'Box' or i_units = 'Pound') and (i_size = 'medium' or i_size = 'extra large') ) or (i_category = 'Women' and (i_color = 'midnight' or i_color = 'snow') and (i_units = 'Pallet' or i_units = 'Gross') and (i_size = 'medium' or i_size = 'extra large') ) or (i_category = 'Women' and (i_color = 'cyan' or i_color = 'papaya') and (i_units = 'Cup' or i_units = 'Dram') and (i_size = 'N/A' or i_size = 'small') ) or (i_category = 'Men' and (i_color = 'orange' or i_color = 'frosted') and (i_units = 'Each' or i_units = 'Tbl') and (i_size = 'petite' or i_size = 'large') ) or (i_category = 'Men' and (i_color = 'forest' or i_color = 'ghost') and (i_units = 'Lb' or i_units = 'Bundle') and (i_size = 'medium' or i_size = 'extra large') ) ) ) ) > 0 order by i_product_name limit 100;
The Text Plan in webpage of this query is as followed:
---------------- Estimated Per-Host Requirements: Memory=2.28GB VCores=2 WARNING: The following tables are missing relevant table and/or column statistics. tpcds1g.item 11:MERGING-EXCHANGE [UNPARTITIONED] | order by: (i_product_name) ASC | limit: 100 | hosts=1 per-host-mem=unavailable | tuple-ids=5 row-size=16B cardinality=100 | 05:TOP-N [LIMIT=100] | order by: (i_product_name) ASC | hosts=1 per-host-mem=1.56KB | tuple-ids=5 row-size=16B cardinality=100 | 10:AGGREGATE [FINALIZE] | group by: (i_product_name) | hosts=1 per-host-mem=128.00MB | tuple-ids=4 row-size=16B cardinality=unavailable | 09:EXCHANGE [HASH((i_product_name))] | hosts=1 per-host-mem=0B | tuple-ids=4 row-size=16B cardinality=unavailable | 04:AGGREGATE [STREAMING] | group by: (i_product_name) | hosts=1 per-host-mem=128.00MB | tuple-ids=4 row-size=16B cardinality=unavailable | 03:HASH JOIN [LEFT OUTER JOIN, BROADCAST] | hash predicates: i1.i_manufact = i_manufact | other predicates: zeroifnull(count(*)) > 0 | hosts=1 per-host-mem=2.00GB | tuple-ids=0,2N row-size=60B cardinality=unavailable | |--08:EXCHANGE [BROADCAST] | | hosts=1 per-host-mem=0B | | tuple-ids=2 row-size=24B cardinality=unavailable | | | 07:AGGREGATE [FINALIZE] | | output: count:merge(*) | | group by: i_manufact | | having: zeroifnull(count(*)) > 0 | | hosts=1 per-host-mem=128.00MB | | tuple-ids=2 row-size=24B cardinality=unavailable | | | 06:EXCHANGE [HASH(i_manufact)] | | hosts=1 per-host-mem=0B | | tuple-ids=2 row-size=24B cardinality=unavailable | | | 02:AGGREGATE [STREAMING] | | output: count(*) | | group by: i_manufact | | hosts=1 per-host-mem=128.00MB | | tuple-ids=2 row-size=24B cardinality=unavailable | | | 01:SCAN HDFS [tpcds1g.item, RANDOM] | partitions=1/1 files=1 size=4.80MB | predicates: i_category = 'Women' AND i_color = 'powder' OR i_color = 'khaki' AND i_units = 'Ounce' OR i_units = 'Oz' AND i_size = 'medium' OR i_size = 'extra large' OR i_category = 'Women' AND i_color = 'brown' OR i_color = 'honeydew' AND i_units = 'Bunch' OR i_units = 'Ton' AND i_size = 'N/A' OR i_size = 'small' OR i_category = 'Men' AND i_color = 'floral' OR i_color = 'deep' AND i_units = 'N/A' OR i_units = 'Dozen' AND i_size = 'petite' OR i_size = 'large' OR i_category = 'Men' AND i_color = 'light' OR i_color = 'cornflower' AND i_units = 'Box' OR i_units = 'Pound' AND i_size = 'medium' OR i_size = 'extra large' OR i_category = 'Women' AND i_color = 'midnight' OR i_color = 'snow' AND i_units = 'Pallet' OR i_units = 'Gross' AND i_size = 'medium' OR i_size = 'extra large' OR i_category = 'Women' AND i_color = 'cyan' OR i_color = 'papaya' AND i_units = 'Cup' OR i_units = 'Dram' AND i_size = 'N/A' OR i_size = 'small' OR i_category = 'Men' AND i_color = 'orange' OR i_color = 'frosted' AND i_units = 'Each' OR i_units = 'Tbl' AND i_size = 'petite' OR i_size = 'large' OR i_category = 'Men' AND i_color = 'forest' OR i_color = 'ghost' AND i_units = 'Lb' OR i_units = 'Bundle' AND i_size = 'medium' OR i_size = 'extra large' | table stats: unavailable | column stats: unavailable | hosts=1 per-host-mem=32.00MB | tuple-ids=1 row-size=80B cardinality=unavailable | 00:SCAN HDFS [tpcds1g.item i1, RANDOM] partitions=1/1 files=1 size=4.80MB predicates: i_manufact_id >= 738, i_manufact_id <= 738 + 40 table stats: unavailable column stats: unavailable hosts=1 per-host-mem=32.00MB tuple-ids=0 row-size=36B cardinality=unavailable ----------------
we can see that predicates of 01:SCAN HDFS [tpcds1g.item, RANDOM] lost all parentheses. Even the query result is correct, it makes me confuse while reading this Text Plan. The proper Text Plan should be:
---------------- Estimated Per-Host Requirements: Memory=2.28GB VCores=2 WARNING: The following tables are missing relevant table and/or column statistics. tpcds1g.item 11:MERGING-EXCHANGE [UNPARTITIONED] | order by: (i_product_name) ASC | limit: 100 | hosts=1 per-host-mem=unavailable | tuple-ids=5 row-size=16B cardinality=100 | 05:TOP-N [LIMIT=100] | order by: (i_product_name) ASC | hosts=1 per-host-mem=1.56KB | tuple-ids=5 row-size=16B cardinality=100 | 10:AGGREGATE [FINALIZE] | group by: (i_product_name) | hosts=1 per-host-mem=128.00MB | tuple-ids=4 row-size=16B cardinality=unavailable | 09:EXCHANGE [HASH((i_product_name))] | hosts=1 per-host-mem=0B | tuple-ids=4 row-size=16B cardinality=unavailable | 04:AGGREGATE [STREAMING] | group by: (i_product_name) | hosts=1 per-host-mem=128.00MB | tuple-ids=4 row-size=16B cardinality=unavailable | 03:HASH JOIN [LEFT OUTER JOIN, BROADCAST] | hash predicates: i1.i_manufact = i_manufact | other predicates: zeroifnull(count(*)) > 0 | hosts=1 per-host-mem=2.00GB | tuple-ids=0,2N row-size=60B cardinality=unavailable | |--08:EXCHANGE [BROADCAST] | | hosts=1 per-host-mem=0B | | tuple-ids=2 row-size=24B cardinality=unavailable | | | 07:AGGREGATE [FINALIZE] | | output: count:merge(*) | | group by: i_manufact | | having: zeroifnull(count(*)) > 0 | | hosts=1 per-host-mem=128.00MB | | tuple-ids=2 row-size=24B cardinality=unavailable | | | 06:EXCHANGE [HASH(i_manufact)] | | hosts=1 per-host-mem=0B | | tuple-ids=2 row-size=24B cardinality=unavailable | | | 02:AGGREGATE [STREAMING] | | output: count(*) | | group by: i_manufact | | hosts=1 per-host-mem=128.00MB | | tuple-ids=2 row-size=24B cardinality=unavailable | | | 01:SCAN HDFS [tpcds1g.item, RANDOM] | partitions=1/1 files=1 size=4.80MB | predicates: ((i_category = 'Women' AND (i_color = 'powder' OR i_color = 'khaki') AND (i_units = 'Ounce' OR i_units = 'Oz') AND (i_size = 'medium' OR i_size = 'extra large')) OR (i_category = 'Women' AND (i_color = 'brown' OR i_color = 'honeydew') AND (i_units = 'Bunch' OR i_units = 'Ton') AND (i_size = 'N/A' OR i_size = 'small')) OR (i_category = 'Men' AND (i_color = 'floral' OR i_color = 'deep') AND (i_units = 'N/A' OR i_units = 'Dozen') AND (i_size = 'petite' OR i_size = 'large')) OR (i_category = 'Men' AND (i_color = 'light' OR i_color = 'cornflower') AND (i_units = 'Box' OR i_units = 'Pound') AND (i_size = 'medium' OR i_size = 'extra large')) OR (i_category = 'Women' AND (i_color = 'midnight' OR i_color = 'snow') AND (i_units = 'Pallet' OR i_units = 'Gross') AND (i_size = 'medium' OR i_size = 'extra large')) OR (i_category = 'Women' AND (i_color = 'cyan' OR i_color = 'papaya') AND (i_units = 'Cup' OR i_units = 'Dram') AND (i_size = 'N/A' OR i_size = 'small')) OR (i_category = 'Men' AND (i_color = 'orange' OR i_color = 'frosted') AND (i_units = 'Each' OR i_units = 'Tbl') AND (i_size = 'petite' OR i_size = 'large')) OR (i_category = 'Men' AND (i_color = 'forest' OR i_color = 'ghost') AND (i_units = 'Lb' OR i_units = 'Bundle') AND (i_size = 'medium' OR i_size = 'extra large'))) | table stats: unavailable | column stats: unavailable | hosts=1 per-host-mem=32.00MB | tuple-ids=1 row-size=80B cardinality=unavailable | 00:SCAN HDFS [tpcds1g.item i1, RANDOM] partitions=1/1 files=1 size=4.80MB predicates: i_manufact_id >= 738, i_manufact_id <= 738 + 40 table stats: unavailable column stats: unavailable hosts=1 per-host-mem=32.00MB tuple-ids=0 row-size=36B cardinality=unavailable ----------------