Hadoop Map/Reduce / MAPREDUCE-3768

MR-2450 introduced a significant performance regression (Hive)

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.23.1
    • Fix Version/s: None
    • Component/s: mrv2
    • Labels:
      None

      Description

      MAPREDUCE-2450 introduced, or at least triggers, a significant performance regression in Hive. With MR-2450 the execution time of TestCliDriver.skewjoin goes from 2 minutes to 15 minutes. Reverting this change from the build fixes the issue.

      Here's the relevant query:

      FROM src src1 JOIN src src2 ON (src1.key = src2.key)
      INSERT OVERWRITE TABLE dest_j1 SELECT src1.key, src2.value; 
      

      You can reproduce this by running the following from a Hive 0.8.0 source tree against Hadoop built from branch-0.23:

      ant very-clean package test -Dtestcase=TestCliDriver -Dqfile=skewjoin.q
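
      To compare run times with and without MAPREDUCE-2450 in the Hadoop build, the command can be wrapped in a small timing helper. This is only a sketch: `time_cmd` is a hypothetical name that is not part of Hive or Hadoop, and it reports whole seconds of wall-clock time, which is coarse but sufficient to tell 2 minutes from 15.

```shell
#!/bin/sh
# time_cmd is a hypothetical helper (not part of Hive or Hadoop).
# It runs the given command, discards the command's output, prints the
# elapsed wall-clock time in whole seconds, and returns the command's
# exit status.
time_cmd() {
    start=$(date +%s)
    "$@" >/dev/null 2>&1
    status=$?
    end=$(date +%s)
    echo "$((end - start))"
    return "$status"
}

# Usage, from the Hive source tree (the run itself takes minutes):
#   elapsed=$(time_cmd ant very-clean package test \
#       -Dtestcase=TestCliDriver -Dqfile=skewjoin.q)
#   echo "skewjoin.q took ${elapsed}s"
```

      Running it once against each Hadoop build (with and without the change) gives two numbers to attach to the issue.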
      

          People

          • Assignee: Unassigned
          • Reporter: Eli Collins
          • Votes: 0
          • Watchers: 11
