Uploaded image for project: 'Hive'
  1. Hive
  2. HIVE-12762

Common join on parquet tables returns incorrect result when hive.optimize.index.filter set to true

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Closed
    • Major
    • Resolution: Fixed
    • 2.0.0, 2.1.0
    • 2.0.0
    • Logical Optimizer
    • None

    Description

      The following query will give incorrect result.

      CREATE TABLE tbl1(id INT) STORED AS PARQUET;
      INSERT INTO tbl1 VALUES(1), (2);
      
      CREATE TABLE tbl2(id INT, value STRING) STORED AS PARQUET;
      INSERT INTO tbl2 VALUES(1, 'value1');
      INSERT INTO tbl2 VALUES(1, 'value2');
      
      set hive.optimize.index.filter = true;
      set hive.auto.convert.join=false;
      select tbl1.id, t1.value, t2.value
      FROM tbl1
      JOIN (SELECT * FROM tbl2 WHERE value='value1') t1 ON tbl1.id=t1.id
      JOIN (SELECT * FROM tbl2 WHERE value='value2') t2 ON tbl1.id=t2.id;
      

      We are enforcing to use common join and tbl2 will have 2 files after 2 insertions underneath.

      the map job contains 3 TableScan operators (2 for tbl2 and 1 for tbl1). When hive.optimize.index.filter is set to true, we are incorrectly applying the later filtering condition to each block, which causes no data is returned for the subquery SELECT * FROM tbl2 WHERE value='value1'.

      Attachments

        1. HIVE-12762.2.patch
          15 kB
          Aihua Xu
        2. HIVE-12762.patch
          25 kB
          Aihua Xu

        Issue Links

          Activity

            People

              aihuaxu Aihua Xu
              aihuaxu Aihua Xu
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: