Uploaded image for project: 'IMPALA'
  1. IMPALA
  2. IMPALA-4325

StmtRewrite lost parentheses of CompoundPredicate in pushNegationToOperands()

Attach filesAttach ScreenshotVotersWatch issueWatchersCreate sub-taskLinkCloneUpdate Comment AuthorReplace String in CommentUpdate Comment VisibilityDelete Comments
    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Minor
    • Resolution: Fixed
    • Impala 2.8.0
    • Impala 2.8.0
    • Frontend
    • CentOS release 6.8 (Final)

    Description

      StmtRewrite lost parentheses of CompoundPredicate in pushNegationToOperands(), as a result, getExplainString(conjuncts_) can't get the correct string.

      For example, we have a table defined by:

      create table if not exists  item
      (
      i_item_sk                 integer               ,
      i_item_id                 varchar(16)              ,
      i_rec_start_date          timestamp                          ,
      i_rec_end_date            timestamp                          ,
      i_item_desc               varchar(200)                  ,
      i_current_price           decimal(7,2)                  ,
      i_wholesale_cost          decimal(7,2)                  ,
      i_brand_id                integer                       ,
      i_brand                   varchar(50)                      ,
      i_class_id                integer                       ,
      i_class                   varchar(50)                      ,
      i_category_id             integer                       ,
      i_category                varchar(50)                      ,
      i_manufact_id             integer                       ,
      i_manufact                varchar(50)                      ,
      i_size                    varchar(20)                      ,
      i_formulation             varchar(20)                      ,
      i_color                   varchar(20)                      ,
      i_units                   varchar(10)                      ,
      i_container               varchar(10)                      ,
      i_manager_id              integer                       ,
      i_product_name            varchar(50)
      ) row format delimited fields terminated by '|' escaped by '\\' lines terminated by '\n';
      

      And if we run query:

      select  distinct(i_product_name) from item i1 where i_manufact_id between 738 and 738+40
      and (
        select count(*) as item_cnt from item where i_manufact = i1.i_manufact and
        (
          (
            (i_category = 'Women' and
              (i_color = 'powder' or i_color = 'khaki') and
              (i_units = 'Ounce' or i_units = 'Oz') and
              (i_size = 'medium' or i_size = 'extra large')
            )
            or
            (i_category = 'Women' and
              (i_color = 'brown' or i_color = 'honeydew') and
              (i_units = 'Bunch' or i_units = 'Ton') and
              (i_size = 'N/A' or i_size = 'small')
            )
            or
            (i_category = 'Men' and
              (i_color = 'floral' or i_color = 'deep') and
              (i_units = 'N/A' or i_units = 'Dozen') and
              (i_size = 'petite' or i_size = 'large')
            )
            or
            (i_category = 'Men' and
              (i_color = 'light' or i_color = 'cornflower') and
              (i_units = 'Box' or i_units = 'Pound') and
              (i_size = 'medium' or i_size = 'extra large')
            )
            or
            (i_category = 'Women' and
              (i_color = 'midnight' or i_color = 'snow') and
              (i_units = 'Pallet' or i_units = 'Gross') and
              (i_size = 'medium' or i_size = 'extra large')
            )
            or
            (i_category = 'Women' and
              (i_color = 'cyan' or i_color = 'papaya') and
              (i_units = 'Cup' or i_units = 'Dram') and
              (i_size = 'N/A' or i_size = 'small')
            )
            or
            (i_category = 'Men' and
              (i_color = 'orange' or i_color = 'frosted') and
              (i_units = 'Each' or i_units = 'Tbl') and
              (i_size = 'petite' or i_size = 'large')
            )
            or
            (i_category = 'Men' and
              (i_color = 'forest' or i_color = 'ghost') and
              (i_units = 'Lb' or i_units = 'Bundle') and
              (i_size = 'medium' or i_size = 'extra large')
            )
          )
        )
      ) > 0
      order by i_product_name
      limit 100;
      

      The Text Plan in webpage of this query is as followed:

      ----------------
      Estimated Per-Host Requirements: Memory=2.28GB VCores=2
      WARNING: The following tables are missing relevant table and/or column statistics.
      tpcds1g.item
      
      11:MERGING-EXCHANGE [UNPARTITIONED]
      |  order by: (i_product_name) ASC
      |  limit: 100
      |  hosts=1 per-host-mem=unavailable
      |  tuple-ids=5 row-size=16B cardinality=100
      |
      05:TOP-N [LIMIT=100]
      |  order by: (i_product_name) ASC
      |  hosts=1 per-host-mem=1.56KB
      |  tuple-ids=5 row-size=16B cardinality=100
      |
      10:AGGREGATE [FINALIZE]
      |  group by: (i_product_name)
      |  hosts=1 per-host-mem=128.00MB
      |  tuple-ids=4 row-size=16B cardinality=unavailable
      |
      09:EXCHANGE [HASH((i_product_name))]
      |  hosts=1 per-host-mem=0B
      |  tuple-ids=4 row-size=16B cardinality=unavailable
      |
      04:AGGREGATE [STREAMING]
      |  group by: (i_product_name)
      |  hosts=1 per-host-mem=128.00MB
      |  tuple-ids=4 row-size=16B cardinality=unavailable
      |
      03:HASH JOIN [LEFT OUTER JOIN, BROADCAST]
      |  hash predicates: i1.i_manufact = i_manufact
      |  other predicates: zeroifnull(count(*)) > 0
      |  hosts=1 per-host-mem=2.00GB
      |  tuple-ids=0,2N row-size=60B cardinality=unavailable
      |
      |--08:EXCHANGE [BROADCAST]
      |  |  hosts=1 per-host-mem=0B
      |  |  tuple-ids=2 row-size=24B cardinality=unavailable
      |  |
      |  07:AGGREGATE [FINALIZE]
      |  |  output: count:merge(*)
      |  |  group by: i_manufact
      |  |  having: zeroifnull(count(*)) > 0
      |  |  hosts=1 per-host-mem=128.00MB
      |  |  tuple-ids=2 row-size=24B cardinality=unavailable
      |  |
      |  06:EXCHANGE [HASH(i_manufact)]
      |  |  hosts=1 per-host-mem=0B
      |  |  tuple-ids=2 row-size=24B cardinality=unavailable
      |  |
      |  02:AGGREGATE [STREAMING]
      |  |  output: count(*)
      |  |  group by: i_manufact
      |  |  hosts=1 per-host-mem=128.00MB
      |  |  tuple-ids=2 row-size=24B cardinality=unavailable
      |  |
      |  01:SCAN HDFS [tpcds1g.item, RANDOM]
      |     partitions=1/1 files=1 size=4.80MB
      |     predicates: i_category = 'Women' AND i_color = 'powder' OR i_color = 'khaki' AND i_units = 'Ounce' OR i_units = 'Oz' AND i_size = 'medium' OR i_size = 'extra large' OR i_category = 'Women' AND i_color = 'brown' OR i_color = 'honeydew' AND i_units = 'Bunch' OR i_units = 'Ton' AND i_size = 'N/A' OR i_size = 'small' OR i_category = 'Men' AND i_color = 'floral' OR i_color = 'deep' AND i_units = 'N/A' OR i_units = 'Dozen' AND i_size = 'petite' OR i_size = 'large' OR i_category = 'Men' AND i_color = 'light' OR i_color = 'cornflower' AND i_units = 'Box' OR i_units = 'Pound' AND i_size = 'medium' OR i_size = 'extra large' OR i_category = 'Women' AND i_color = 'midnight' OR i_color = 'snow' AND i_units = 'Pallet' OR i_units = 'Gross' AND i_size = 'medium' OR i_size = 'extra large' OR i_category = 'Women' AND i_color = 'cyan' OR i_color = 'papaya' AND i_units = 'Cup' OR i_units = 'Dram' AND i_size = 'N/A' OR i_size = 'small' OR i_category = 'Men' AND i_color = 'orange' OR i_color = 'frosted' AND i_units = 'Each' OR i_units = 'Tbl' AND i_size = 'petite' OR i_size = 'large' OR i_category = 'Men' AND i_color = 'forest' OR i_color = 'ghost' AND i_units = 'Lb' OR i_units = 'Bundle' AND i_size = 'medium' OR i_size = 'extra large'
      |     table stats: unavailable
      |     column stats: unavailable
      |     hosts=1 per-host-mem=32.00MB
      |     tuple-ids=1 row-size=80B cardinality=unavailable
      |
      00:SCAN HDFS [tpcds1g.item i1, RANDOM]
         partitions=1/1 files=1 size=4.80MB
         predicates: i_manufact_id >= 738, i_manufact_id <= 738 + 40
         table stats: unavailable
         column stats: unavailable
         hosts=1 per-host-mem=32.00MB
         tuple-ids=0 row-size=36B cardinality=unavailable
      ----------------
      

      we can see that predicates of 01:SCAN HDFS [tpcds1g.item, RANDOM] lost all parentheses. Even the query result is correct, it makes me confuse while reading this Text Plan. The proper Text Plan should be:

      ----------------
      Estimated Per-Host Requirements: Memory=2.28GB VCores=2
      WARNING: The following tables are missing relevant table and/or column statistics.
      tpcds1g.item
      
      11:MERGING-EXCHANGE [UNPARTITIONED]
      |  order by: (i_product_name) ASC
      |  limit: 100
      |  hosts=1 per-host-mem=unavailable
      |  tuple-ids=5 row-size=16B cardinality=100
      |
      05:TOP-N [LIMIT=100]
      |  order by: (i_product_name) ASC
      |  hosts=1 per-host-mem=1.56KB
      |  tuple-ids=5 row-size=16B cardinality=100
      |
      10:AGGREGATE [FINALIZE]
      |  group by: (i_product_name)
      |  hosts=1 per-host-mem=128.00MB
      |  tuple-ids=4 row-size=16B cardinality=unavailable
      |
      09:EXCHANGE [HASH((i_product_name))]
      |  hosts=1 per-host-mem=0B
      |  tuple-ids=4 row-size=16B cardinality=unavailable
      |
      04:AGGREGATE [STREAMING]
      |  group by: (i_product_name)
      |  hosts=1 per-host-mem=128.00MB
      |  tuple-ids=4 row-size=16B cardinality=unavailable
      |
      03:HASH JOIN [LEFT OUTER JOIN, BROADCAST]
      |  hash predicates: i1.i_manufact = i_manufact
      |  other predicates: zeroifnull(count(*)) > 0
      |  hosts=1 per-host-mem=2.00GB
      |  tuple-ids=0,2N row-size=60B cardinality=unavailable
      |
      |--08:EXCHANGE [BROADCAST]
      |  |  hosts=1 per-host-mem=0B
      |  |  tuple-ids=2 row-size=24B cardinality=unavailable
      |  |
      |  07:AGGREGATE [FINALIZE]
      |  |  output: count:merge(*)
      |  |  group by: i_manufact
      |  |  having: zeroifnull(count(*)) > 0
      |  |  hosts=1 per-host-mem=128.00MB
      |  |  tuple-ids=2 row-size=24B cardinality=unavailable
      |  |
      |  06:EXCHANGE [HASH(i_manufact)]
      |  |  hosts=1 per-host-mem=0B
      |  |  tuple-ids=2 row-size=24B cardinality=unavailable
      |  |
      |  02:AGGREGATE [STREAMING]
      |  |  output: count(*)
      |  |  group by: i_manufact
      |  |  hosts=1 per-host-mem=128.00MB
      |  |  tuple-ids=2 row-size=24B cardinality=unavailable
      |  |
      |  01:SCAN HDFS [tpcds1g.item, RANDOM]
      |     partitions=1/1 files=1 size=4.80MB
      |     predicates: ((i_category = 'Women' AND (i_color = 'powder' OR i_color = 'khaki') AND (i_units = 'Ounce' OR i_units = 'Oz') AND (i_size = 'medium' OR i_size = 'extra large')) OR (i_category = 'Women' AND (i_color = 'brown' OR i_color = 'honeydew') AND (i_units = 'Bunch' OR i_units = 'Ton') AND (i_size = 'N/A' OR i_size = 'small')) OR (i_category = 'Men' AND (i_color = 'floral' OR i_color = 'deep') AND (i_units = 'N/A' OR i_units = 'Dozen') AND (i_size = 'petite' OR i_size = 'large')) OR (i_category = 'Men' AND (i_color = 'light' OR i_color = 'cornflower') AND (i_units = 'Box' OR i_units = 'Pound') AND (i_size = 'medium' OR i_size = 'extra large')) OR (i_category = 'Women' AND (i_color = 'midnight' OR i_color = 'snow') AND (i_units = 'Pallet' OR i_units = 'Gross') AND (i_size = 'medium' OR i_size = 'extra large')) OR (i_category = 'Women' AND (i_color = 'cyan' OR i_color = 'papaya') AND (i_units = 'Cup' OR i_units = 'Dram') AND (i_size = 'N/A' OR i_size = 'small')) OR (i_category = 'Men' AND (i_color = 'orange' OR i_color = 'frosted') AND (i_units = 'Each' OR i_units = 'Tbl') AND (i_size = 'petite' OR i_size = 'large')) OR (i_category = 'Men' AND (i_color = 'forest' OR i_color = 'ghost') AND (i_units = 'Lb' OR i_units = 'Bundle') AND (i_size = 'medium' OR i_size = 'extra large')))
      |     table stats: unavailable
      |     column stats: unavailable
      |     hosts=1 per-host-mem=32.00MB
      |     tuple-ids=1 row-size=80B cardinality=unavailable
      |
      00:SCAN HDFS [tpcds1g.item i1, RANDOM]
         partitions=1/1 files=1 size=4.80MB
         predicates: i_manufact_id >= 738, i_manufact_id <= 738 + 40
         table stats: unavailable
         column stats: unavailable
         hosts=1 per-host-mem=32.00MB
         tuple-ids=0 row-size=36B cardinality=unavailable
      ----------------
      

      Attachments

        Activity

          This comment will be Viewable by All Users Viewable by All Users
          Cancel

          People

            yhluo_impala_39a4 Yuanhao Luo
            yhluo_impala_39a4 Yuanhao Luo
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Slack

                Issue deployment