Hive / HIVE-9025

join38.q (without map join) produces incorrect result when testing with multiple reducers

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.14.0
    • Fix Version/s: 1.0.0
    • Component/s: Logical Optimizer
    • Labels:
      None

      Description

      I have this query from a modified version of join38.q, which does NOT use map join:

      FROM src a JOIN tmp b ON (a.key = b.col11)
      SELECT a.value, b.col5, count(1) as count
      where b.col11 = 111
      group by a.value, b.col5;
      

      If I set mapred.reduce.tasks to 1, the result is correct. But if I set it to a larger number (3, for instance), the result is

      val_111	105	1
      

      which is wrong.

      I think the issue is that, in this case, ConstantPropagationProcFactory overwrites the partition columns of the reduce sink desc with an empty list. Later, in ReduceSinkOperator#computeHashCode, since partitionEval has length 0, a random number is used as the hash code for each separate row. As a result, rows with the same key are distributed to different reducers, which leads to the incorrect result.
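
The effect described above can be illustrated with a small simulation. This is only a simplified model written for this report, not Hive's code: `route` and `per_reducer_counts` are hypothetical helpers, the CRC32 hash merely stands in for whatever hash Hive computes over the partition columns, and the sample rows are made up. It shows that when the partition column list is empty and a random hash is drawn per row, rows sharing a key end up on several reducers, so each reducer produces only a partial count.

```python
import random
import zlib
from collections import Counter, defaultdict


def route(rows, num_reducers, partition_cols, rng):
    """Assign each row to a reducer (hypothetical model, not Hive's code)."""
    buckets = defaultdict(list)
    for row in rows:
        if partition_cols:
            # Normal case: a deterministic hash of the partition column values,
            # so equal keys always map to the same reducer.
            key = repr(tuple(row[c] for c in partition_cols)).encode()
            h = zlib.crc32(key)
        else:
            # Empty partition column list: a fresh random hash for every row,
            # modelling the behavior reported in this issue.
            h = rng.randrange(2**31)
        buckets[h % num_reducers].append(row)
    return buckets


def per_reducer_counts(buckets, group_cols):
    """Each reducer counts only the rows it received."""
    out = []
    for reducer in sorted(buckets):
        counts = Counter(tuple(r[c] for c in group_cols) for r in buckets[reducer])
        out.extend((reducer, group, n) for group, n in sorted(counts.items()))
    return out


if __name__ == "__main__":
    # Made-up sample data: every row has the same key and group value.
    rows = [{"key": "111", "value": "val_111"} for _ in range(300)]
    rng = random.Random(0)
    print("with partition cols:   ", per_reducer_counts(route(rows, 3, ["key"], rng), ["value"]))
    print("with empty partition cols:", per_reducer_counts(route(rows, 3, [], rng), ["value"]))
```

With a non-empty partition column list, the single group is counted once on one reducer. With an empty list, the same group appears in the output of more than one reducer, each with a partial count, which is the kind of incorrect result reported here.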

      1. HIVE-9025.patch
        22 kB
        Ted Xu
      2. HIVE-9025.1.patch
        40 kB
        Ted Xu

          Activity

          xuefuz Xuefu Zhang added a comment -

          This seems to be caused by HIVE-5771. Ted Xu, could you please take a look?

          tedxu Ted Xu added a comment -

          Thanks Chao Sun, it is a bug.

          I will fix it ASAP.

          hiveqa Hive QA added a comment -

          Overall: -1 at least one tests failed

          Here are the results of testing the latest attachment:
          https://issues.apache.org/jira/secure/attachment/12685602/HIVE-9025.patch

          ERROR: -1 due to 7 failed/errored test(s), 6696 tests executed
          Failed tests:

          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_decimal_aggregate
          org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynamic_partition_pruning
          org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynamic_partition_pruning_2
          org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_join_nullsafe
          org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan
          org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_decimal_mapjoin
          org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorized_dynamic_partition_pruning
          

          Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1987/testReport
          Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1987/console
          Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1987/

          Messages:

          Executing org.apache.hive.ptest.execution.PrepPhase
          Executing org.apache.hive.ptest.execution.ExecutionPhase
          Executing org.apache.hive.ptest.execution.ReportingPhase
          Tests exited with: TestsFailedException: 7 tests failed
          

          This message is automatically generated.

          ATTACHMENT ID: 12685602 - PreCommit-HIVE-TRUNK-Build

          ashutoshc Ashutosh Chauhan added a comment -

          +1
          Vikram Dixit K, it would be good to have this in the 0.14 branch as well, since this is a correctness issue.

          vikram.dixit Vikram Dixit K added a comment -

          +1 for 0.14

          ashutoshc Ashutosh Chauhan added a comment -

          Committed to trunk & 0.14. Thanks, Ted!

          xuefuz Xuefu Zhang added a comment -

          Also merged to Spark branch.

          thejas Thejas M Nair added a comment -

          Updating release version for jiras resolved in 1.0.0.

          thejas Thejas M Nair added a comment -

          This issue has been fixed in Apache Hive 1.0.0. If there is any issue with the fix, please open a new jira to address it.


            People

            • Assignee:
              tedxu Ted Xu
            • Reporter:
              csun Chao Sun