Hive / HIVE-6955

ExprNodeColumnDesc.isSame doesn't account for tabAlias: this affects trait propagation in joins

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.13.1, 0.14.0
    • Component/s: None
    • Labels: None

    Description

      For TPC-DS Q15:

      explain
      select ca_zip, sum(cs_sales_price)
      from catalog_sales, customer, customer_address, date_dim
      where catalog_sales.cs_bill_customer_sk = customer.c_customer_sk
        and customer.c_current_addr_sk = customer_address.ca_address_sk
        and (substr(ca_zip,1,5) in ('85669', '86197','88274','83405','86475',
                                    '85392', '85460', '80348', '81792')
             or ca_state in ('CA','WA','GA')
             or cs_sales_price > 500)
        and catalog_sales.cs_sold_date_sk = date_dim.d_date_sk
        and d_qoy = 2 and d_year = 2001
      group by ca_zip
      order by ca_zip
      limit 100;
      

      The traits set up for the operators are:

      FIL[23]: bucketCols=[[]],numBuckets=-1
      RS[11]: bucketCols=[[VALUE._col0]],numBuckets=-1
      JOIN[12]: bucketCols=[[_col71], [_col71]],numBuckets=-1
      FIL[13]: bucketCols=[[_col71], [_col71]],numBuckets=-1
      SEL[14]: bucketCols=[[_col71], [_col71]],numBuckets=-1
      GBY[15]: bucketCols=[[_col0]],numBuckets=-1
      RS[16]: bucketCols=[[KEY._col0]],numBuckets=-1
      GBY[17]: bucketCols=[[_col0]],numBuckets=-1
      SEL[18]: bucketCols=[[_col0]],numBuckets=-1
      LIM[21]: bucketCols=[[_col0]],numBuckets=-1
      FS[22]: bucketCols=[[_col0]],numBuckets=-1
      TS[3]: bucketCols=[[]],numBuckets=-1
      RS[5]: bucketCols=[[VALUE._col0]],numBuckets=-1
      JOIN[6]: bucketCols=[[_col3], [_col36]],numBuckets=-1
      RS[7]: bucketCols=[[VALUE._col40]],numBuckets=-1
      JOIN[9]: bucketCols=[[_col40], [_col0]],numBuckets=-1
      RS[10]: bucketCols=[[VALUE._col0]],numBuckets=-1
      TS[1]: bucketCols=[[]],numBuckets=-1
      RS[8]: bucketCols=[[VALUE._col0]],numBuckets=-1
      TS[0]: bucketCols=[[]],numBuckets=-1
      RS[4]: bucketCols=[[VALUE._col3]],numBuckets=-1
      

      This is incorrect.
      Join[9] joins ca with (cs join cust). In this case both sides of the join have a '_col0' column. The reverse mapping used in trait propagation relies on ExprNodeColumnDesc.isSame; since isSame doesn't account for the tabAlias, we end up with Join[9] being bucketed on cs_sold_date_sk. Join[12] has the same issue, which only compounds the error.
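      The ambiguity described above can be illustrated with a minimal, self-contained sketch. It is not Hive code: ColumnRef, reverseLookup, sampleJoinOutput, and the aliases "left" and "right" are hypothetical stand-ins, and the output column names are illustrative rather than taken from the plan above. It only shows how a name-only comparison can resolve a bucketing column to the wrong side of a join when both inputs expose a '_col0', and how also comparing the table alias removes the ambiguity. The attached patch is not reproduced here.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch; none of these types are Hive's own classes.
final class TabAliasSketch {

    // Stand-in for a column expression: an internal column name plus the table alias it came from.
    static final class ColumnRef {
        final String column;
        final String tabAlias;

        ColumnRef(String column, String tabAlias) {
            this.column = column;
            this.tabAlias = tabAlias;
        }

        // Name-only comparison, mirroring the behavior the issue describes.
        boolean isSameIgnoringAlias(ColumnRef other) {
            return other != null && column.equals(other.column);
        }

        // Comparison that also requires the table alias to match (assumed shape of a fix).
        boolean isSameWithAlias(ColumnRef other) {
            return other != null
                && column.equals(other.column)
                && Objects.equals(tabAlias, other.tabAlias);
        }
    }

    // Reverse lookup: return the first output column whose source expression matches bucketCol.
    static String reverseLookup(Map<String, ColumnRef> colExprMap,
                                ColumnRef bucketCol,
                                boolean useAlias) {
        for (Map.Entry<String, ColumnRef> e : colExprMap.entrySet()) {
            ColumnRef src = e.getValue();
            boolean same = useAlias
                ? src.isSameWithAlias(bucketCol)
                : src.isSameIgnoringAlias(bucketCol);
            if (same) {
                return e.getKey();
            }
        }
        return null;
    }

    // Illustrative join output: both inputs contribute a source column named "_col0".
    static Map<String, ColumnRef> sampleJoinOutput() {
        Map<String, ColumnRef> m = new LinkedHashMap<>();
        m.put("_col0", new ColumnRef("_col0", "left"));
        m.put("_col1", new ColumnRef("_col0", "right"));
        return m;
    }

    public static void main(String[] args) {
        // The right-hand input is bucketed on its own "_col0".
        ColumnRef bucketCol = new ColumnRef("_col0", "right");
        // Name-only matching picks the left input's column.
        System.out.println(reverseLookup(sampleJoinOutput(), bucketCol, false)); // _col0
        // Alias-aware matching picks the right input's column.
        System.out.println(reverseLookup(sampleJoinOutput(), bucketCol, true));  // _col1
    }
}
```

      In this sketch the name-only lookup returns the left input's output column even though the bucketing column belongs to the right input, which is the kind of mix-up reported for Join[9].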

      Attachments

        1. HIVE-6955.1.patch
          0.6 kB
          Harish Butani

          People

            Assignee: Harish Butani (rhbutani)
            Reporter: Harish Butani (rhbutani)
            Votes: 0
            Watchers: 3