Uploaded image for project: 'IMPALA'
  1. IMPALA
  2. IMPALA-4325

StmtRewrite lost parentheses of CompoundPredicate in pushNegationToOperands()

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: Impala 2.8.0
    • Fix Version/s: Impala 2.8.0
    • Component/s: Frontend
    • Labels:
    • Environment:
      CentOS release 6.8 (Final)

      Description

      StmtRewrite lost parentheses of CompoundPredicate in pushNegationToOperands(), as a result, getExplainString(conjuncts_) can't get the correct string.

      For example, we have a table defined by:

      create table if not exists  item
      (
      i_item_sk                 integer               ,
      i_item_id                 varchar(16)              ,
      i_rec_start_date          timestamp                          ,
      i_rec_end_date            timestamp                          ,
      i_item_desc               varchar(200)                  ,
      i_current_price           decimal(7,2)                  ,
      i_wholesale_cost          decimal(7,2)                  ,
      i_brand_id                integer                       ,
      i_brand                   varchar(50)                      ,
      i_class_id                integer                       ,
      i_class                   varchar(50)                      ,
      i_category_id             integer                       ,
      i_category                varchar(50)                      ,
      i_manufact_id             integer                       ,
      i_manufact                varchar(50)                      ,
      i_size                    varchar(20)                      ,
      i_formulation             varchar(20)                      ,
      i_color                   varchar(20)                      ,
      i_units                   varchar(10)                      ,
      i_container               varchar(10)                      ,
      i_manager_id              integer                       ,
      i_product_name            varchar(50)
      ) row format delimited fields terminated by '|' escaped by '\\' lines terminated by '\n';
      

      And if we run query:

      select  distinct(i_product_name) from item i1 where i_manufact_id between 738 and 738+40
      and (
        select count(*) as item_cnt from item where i_manufact = i1.i_manufact and
        (
          (
            (i_category = 'Women' and
              (i_color = 'powder' or i_color = 'khaki') and
              (i_units = 'Ounce' or i_units = 'Oz') and
              (i_size = 'medium' or i_size = 'extra large')
            )
            or
            (i_category = 'Women' and
              (i_color = 'brown' or i_color = 'honeydew') and
              (i_units = 'Bunch' or i_units = 'Ton') and
              (i_size = 'N/A' or i_size = 'small')
            )
            or
            (i_category = 'Men' and
              (i_color = 'floral' or i_color = 'deep') and
              (i_units = 'N/A' or i_units = 'Dozen') and
              (i_size = 'petite' or i_size = 'large')
            )
            or
            (i_category = 'Men' and
              (i_color = 'light' or i_color = 'cornflower') and
              (i_units = 'Box' or i_units = 'Pound') and
              (i_size = 'medium' or i_size = 'extra large')
            )
            or
            (i_category = 'Women' and
              (i_color = 'midnight' or i_color = 'snow') and
              (i_units = 'Pallet' or i_units = 'Gross') and
              (i_size = 'medium' or i_size = 'extra large')
            )
            or
            (i_category = 'Women' and
              (i_color = 'cyan' or i_color = 'papaya') and
              (i_units = 'Cup' or i_units = 'Dram') and
              (i_size = 'N/A' or i_size = 'small')
            )
            or
            (i_category = 'Men' and
              (i_color = 'orange' or i_color = 'frosted') and
              (i_units = 'Each' or i_units = 'Tbl') and
              (i_size = 'petite' or i_size = 'large')
            )
            or
            (i_category = 'Men' and
              (i_color = 'forest' or i_color = 'ghost') and
              (i_units = 'Lb' or i_units = 'Bundle') and
              (i_size = 'medium' or i_size = 'extra large')
            )
          )
        )
      ) > 0
      order by i_product_name
      limit 100;
      

      The Text Plan in webpage of this query is as followed:

      ----------------
      Estimated Per-Host Requirements: Memory=2.28GB VCores=2
      WARNING: The following tables are missing relevant table and/or column statistics.
      tpcds1g.item
      
      11:MERGING-EXCHANGE [UNPARTITIONED]
      |  order by: (i_product_name) ASC
      |  limit: 100
      |  hosts=1 per-host-mem=unavailable
      |  tuple-ids=5 row-size=16B cardinality=100
      |
      05:TOP-N [LIMIT=100]
      |  order by: (i_product_name) ASC
      |  hosts=1 per-host-mem=1.56KB
      |  tuple-ids=5 row-size=16B cardinality=100
      |
      10:AGGREGATE [FINALIZE]
      |  group by: (i_product_name)
      |  hosts=1 per-host-mem=128.00MB
      |  tuple-ids=4 row-size=16B cardinality=unavailable
      |
      09:EXCHANGE [HASH((i_product_name))]
      |  hosts=1 per-host-mem=0B
      |  tuple-ids=4 row-size=16B cardinality=unavailable
      |
      04:AGGREGATE [STREAMING]
      |  group by: (i_product_name)
      |  hosts=1 per-host-mem=128.00MB
      |  tuple-ids=4 row-size=16B cardinality=unavailable
      |
      03:HASH JOIN [LEFT OUTER JOIN, BROADCAST]
      |  hash predicates: i1.i_manufact = i_manufact
      |  other predicates: zeroifnull(count(*)) > 0
      |  hosts=1 per-host-mem=2.00GB
      |  tuple-ids=0,2N row-size=60B cardinality=unavailable
      |
      |--08:EXCHANGE [BROADCAST]
      |  |  hosts=1 per-host-mem=0B
      |  |  tuple-ids=2 row-size=24B cardinality=unavailable
      |  |
      |  07:AGGREGATE [FINALIZE]
      |  |  output: count:merge(*)
      |  |  group by: i_manufact
      |  |  having: zeroifnull(count(*)) > 0
      |  |  hosts=1 per-host-mem=128.00MB
      |  |  tuple-ids=2 row-size=24B cardinality=unavailable
      |  |
      |  06:EXCHANGE [HASH(i_manufact)]
      |  |  hosts=1 per-host-mem=0B
      |  |  tuple-ids=2 row-size=24B cardinality=unavailable
      |  |
      |  02:AGGREGATE [STREAMING]
      |  |  output: count(*)
      |  |  group by: i_manufact
      |  |  hosts=1 per-host-mem=128.00MB
      |  |  tuple-ids=2 row-size=24B cardinality=unavailable
      |  |
      |  01:SCAN HDFS [tpcds1g.item, RANDOM]
      |     partitions=1/1 files=1 size=4.80MB
      |     predicates: i_category = 'Women' AND i_color = 'powder' OR i_color = 'khaki' AND i_units = 'Ounce' OR i_units = 'Oz' AND i_size = 'medium' OR i_size = 'extra large' OR i_category = 'Women' AND i_color = 'brown' OR i_color = 'honeydew' AND i_units = 'Bunch' OR i_units = 'Ton' AND i_size = 'N/A' OR i_size = 'small' OR i_category = 'Men' AND i_color = 'floral' OR i_color = 'deep' AND i_units = 'N/A' OR i_units = 'Dozen' AND i_size = 'petite' OR i_size = 'large' OR i_category = 'Men' AND i_color = 'light' OR i_color = 'cornflower' AND i_units = 'Box' OR i_units = 'Pound' AND i_size = 'medium' OR i_size = 'extra large' OR i_category = 'Women' AND i_color = 'midnight' OR i_color = 'snow' AND i_units = 'Pallet' OR i_units = 'Gross' AND i_size = 'medium' OR i_size = 'extra large' OR i_category = 'Women' AND i_color = 'cyan' OR i_color = 'papaya' AND i_units = 'Cup' OR i_units = 'Dram' AND i_size = 'N/A' OR i_size = 'small' OR i_category = 'Men' AND i_color = 'orange' OR i_color = 'frosted' AND i_units = 'Each' OR i_units = 'Tbl' AND i_size = 'petite' OR i_size = 'large' OR i_category = 'Men' AND i_color = 'forest' OR i_color = 'ghost' AND i_units = 'Lb' OR i_units = 'Bundle' AND i_size = 'medium' OR i_size = 'extra large'
      |     table stats: unavailable
      |     column stats: unavailable
      |     hosts=1 per-host-mem=32.00MB
      |     tuple-ids=1 row-size=80B cardinality=unavailable
      |
      00:SCAN HDFS [tpcds1g.item i1, RANDOM]
         partitions=1/1 files=1 size=4.80MB
         predicates: i_manufact_id >= 738, i_manufact_id <= 738 + 40
         table stats: unavailable
         column stats: unavailable
         hosts=1 per-host-mem=32.00MB
         tuple-ids=0 row-size=36B cardinality=unavailable
      ----------------
      

      we can see that predicates of 01:SCAN HDFS [tpcds1g.item, RANDOM] lost all parentheses. Even the query result is correct, it makes me confuse while reading this Text Plan. The proper Text Plan should be:

      ----------------
      Estimated Per-Host Requirements: Memory=2.28GB VCores=2
      WARNING: The following tables are missing relevant table and/or column statistics.
      tpcds1g.item
      
      11:MERGING-EXCHANGE [UNPARTITIONED]
      |  order by: (i_product_name) ASC
      |  limit: 100
      |  hosts=1 per-host-mem=unavailable
      |  tuple-ids=5 row-size=16B cardinality=100
      |
      05:TOP-N [LIMIT=100]
      |  order by: (i_product_name) ASC
      |  hosts=1 per-host-mem=1.56KB
      |  tuple-ids=5 row-size=16B cardinality=100
      |
      10:AGGREGATE [FINALIZE]
      |  group by: (i_product_name)
      |  hosts=1 per-host-mem=128.00MB
      |  tuple-ids=4 row-size=16B cardinality=unavailable
      |
      09:EXCHANGE [HASH((i_product_name))]
      |  hosts=1 per-host-mem=0B
      |  tuple-ids=4 row-size=16B cardinality=unavailable
      |
      04:AGGREGATE [STREAMING]
      |  group by: (i_product_name)
      |  hosts=1 per-host-mem=128.00MB
      |  tuple-ids=4 row-size=16B cardinality=unavailable
      |
      03:HASH JOIN [LEFT OUTER JOIN, BROADCAST]
      |  hash predicates: i1.i_manufact = i_manufact
      |  other predicates: zeroifnull(count(*)) > 0
      |  hosts=1 per-host-mem=2.00GB
      |  tuple-ids=0,2N row-size=60B cardinality=unavailable
      |
      |--08:EXCHANGE [BROADCAST]
      |  |  hosts=1 per-host-mem=0B
      |  |  tuple-ids=2 row-size=24B cardinality=unavailable
      |  |
      |  07:AGGREGATE [FINALIZE]
      |  |  output: count:merge(*)
      |  |  group by: i_manufact
      |  |  having: zeroifnull(count(*)) > 0
      |  |  hosts=1 per-host-mem=128.00MB
      |  |  tuple-ids=2 row-size=24B cardinality=unavailable
      |  |
      |  06:EXCHANGE [HASH(i_manufact)]
      |  |  hosts=1 per-host-mem=0B
      |  |  tuple-ids=2 row-size=24B cardinality=unavailable
      |  |
      |  02:AGGREGATE [STREAMING]
      |  |  output: count(*)
      |  |  group by: i_manufact
      |  |  hosts=1 per-host-mem=128.00MB
      |  |  tuple-ids=2 row-size=24B cardinality=unavailable
      |  |
      |  01:SCAN HDFS [tpcds1g.item, RANDOM]
      |     partitions=1/1 files=1 size=4.80MB
      |     predicates: ((i_category = 'Women' AND (i_color = 'powder' OR i_color = 'khaki') AND (i_units = 'Ounce' OR i_units = 'Oz') AND (i_size = 'medium' OR i_size = 'extra large')) OR (i_category = 'Women' AND (i_color = 'brown' OR i_color = 'honeydew') AND (i_units = 'Bunch' OR i_units = 'Ton') AND (i_size = 'N/A' OR i_size = 'small')) OR (i_category = 'Men' AND (i_color = 'floral' OR i_color = 'deep') AND (i_units = 'N/A' OR i_units = 'Dozen') AND (i_size = 'petite' OR i_size = 'large')) OR (i_category = 'Men' AND (i_color = 'light' OR i_color = 'cornflower') AND (i_units = 'Box' OR i_units = 'Pound') AND (i_size = 'medium' OR i_size = 'extra large')) OR (i_category = 'Women' AND (i_color = 'midnight' OR i_color = 'snow') AND (i_units = 'Pallet' OR i_units = 'Gross') AND (i_size = 'medium' OR i_size = 'extra large')) OR (i_category = 'Women' AND (i_color = 'cyan' OR i_color = 'papaya') AND (i_units = 'Cup' OR i_units = 'Dram') AND (i_size = 'N/A' OR i_size = 'small')) OR (i_category = 'Men' AND (i_color = 'orange' OR i_color = 'frosted') AND (i_units = 'Each' OR i_units = 'Tbl') AND (i_size = 'petite' OR i_size = 'large')) OR (i_category = 'Men' AND (i_color = 'forest' OR i_color = 'ghost') AND (i_units = 'Lb' OR i_units = 'Bundle') AND (i_size = 'medium' OR i_size = 'extra large')))
      |     table stats: unavailable
      |     column stats: unavailable
      |     hosts=1 per-host-mem=32.00MB
      |     tuple-ids=1 row-size=80B cardinality=unavailable
      |
      00:SCAN HDFS [tpcds1g.item i1, RANDOM]
         partitions=1/1 files=1 size=4.80MB
         predicates: i_manufact_id >= 738, i_manufact_id <= 738 + 40
         table stats: unavailable
         column stats: unavailable
         hosts=1 per-host-mem=32.00MB
         tuple-ids=0 row-size=36B cardinality=unavailable
      ----------------
      

        Activity

        Hide
        alex.behm Alexander Behm added a comment -

        Nice! Thanks for filing this.

        Show
        alex.behm Alexander Behm added a comment - Nice! Thanks for filing this.
        Hide
        yhluo_impala_39a4 Yuanhao Luo added a comment -

        commit f8d48b8582b9e460c2e0e3dbb4881636f179ae73
        Author: Yuanhao Luo <luoyuanhao@software.ict.ac.cn>
        Date: Wed Oct 19 17:10:39 2016 +0800

        IMPALA-4325: StmtRewrite lost parentheses of CompoundPredicate

        StmtRewrite lost parentheses of CompoundPredicate in pushNegationToOperands()
        and leads to incorrect toSql() result. Even though this issue would not leads
        to incorrect result of query, it makes user confuse of the logical operator
        precedence of predicates shown in EXPLAIN statement.

        Change-Id: I79bfc67605206e0e026293bf7032a88227a95623
        Reviewed-on: http://gerrit.cloudera.org:8080/4753
        Reviewed-by: Alex Behm <alex.behm@cloudera.com>
        Tested-by: Internal Jenkins

        Show
        yhluo_impala_39a4 Yuanhao Luo added a comment - commit f8d48b8582b9e460c2e0e3dbb4881636f179ae73 Author: Yuanhao Luo <luoyuanhao@software.ict.ac.cn> Date: Wed Oct 19 17:10:39 2016 +0800 IMPALA-4325 : StmtRewrite lost parentheses of CompoundPredicate StmtRewrite lost parentheses of CompoundPredicate in pushNegationToOperands() and leads to incorrect toSql() result. Even though this issue would not leads to incorrect result of query, it makes user confuse of the logical operator precedence of predicates shown in EXPLAIN statement. Change-Id: I79bfc67605206e0e026293bf7032a88227a95623 Reviewed-on: http://gerrit.cloudera.org:8080/4753 Reviewed-by: Alex Behm <alex.behm@cloudera.com> Tested-by: Internal Jenkins

          People

          • Assignee:
            yhluo_impala_39a4 Yuanhao Luo
            Reporter:
            yhluo_impala_39a4 Yuanhao Luo
          • Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development