Uploaded image for project: 'Hive'
  1. Hive
  2. HIVE-12762

Common join on parquet tables returns incorrect result when hive.optimize.index.filter set to true

    XMLWordPrintableJSON

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.0, 2.1.0
    • Fix Version/s: 2.0.0
    • Component/s: Logical Optimizer
    • Labels:
      None

      Description

      The following query will give incorrect result.

      CREATE TABLE tbl1(id INT) STORED AS PARQUET;
      INSERT INTO tbl1 VALUES(1), (2);
      
      CREATE TABLE tbl2(id INT, value STRING) STORED AS PARQUET;
      INSERT INTO tbl2 VALUES(1, 'value1');
      INSERT INTO tbl2 VALUES(1, 'value2');
      
      set hive.optimize.index.filter = true;
      set hive.auto.convert.join=false;
      select tbl1.id, t1.value, t2.value
      FROM tbl1
      JOIN (SELECT * FROM tbl2 WHERE value='value1') t1 ON tbl1.id=t1.id
      JOIN (SELECT * FROM tbl2 WHERE value='value2') t2 ON tbl1.id=t2.id;
      

      We are enforcing to use common join and tbl2 will have 2 files after 2 insertions underneath.

      the map job contains 3 TableScan operators (2 for tbl2 and 1 for tbl1). When hive.optimize.index.filter is set to true, we are incorrectly applying the later filtering condition to each block, which causes no data is returned for the subquery SELECT * FROM tbl2 WHERE value='value1'.

        Attachments

        1. HIVE-12762.2.patch
          15 kB
          Aihua Xu
        2. HIVE-12762.patch
          25 kB
          Aihua Xu

          Issue Links

            Activity

              People

              • Assignee:
                aihuaxu Aihua Xu
                Reporter:
                aihuaxu Aihua Xu
              • Votes:
                0 Vote for this issue
                Watchers:
                5 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: