Hadoop Map/Reduce / MAPREDUCE-3768

MR-2450 introduced a significant performance regression (Hive)

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.23.1
    • Fix Version/s: None
    • Component/s: mrv2
    • Labels: None

    Description

      MAPREDUCE-2450 introduced, or at least triggers, a significant performance regression in Hive. With MAPREDUCE-2450 applied, the execution time of TestCliDriver.skewjoin goes from 2 minutes to 15 minutes. Reverting the change from the build fixes the issue.

      Here's the relevant query:

      FROM src src1 JOIN src src2 ON (src1.key = src2.key)
      INSERT OVERWRITE TABLE dest_j1 SELECT src1.key, src2.value; 
      

      You can reproduce this by running the following from a Hive 0.8.0 checkout against a Hadoop build from branch-0.23:

      ant very-clean package test -Dtestcase=TestCliDriver -Dqfile=skewjoin.q
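
      To compare a build with MAPREDUCE-2450 against one with it reverted, the reproduction command can be wrapped in a small timing helper. The following is a minimal sketch, not part of the original report: the names `time_command`, `slowdown`, and `REPRO_CMD` are hypothetical, and it assumes `ant` is on the PATH and that `cwd` points at the root of a Hive checkout.

```python
import subprocess
import time

# The reproduction command from this report, split into arguments.
REPRO_CMD = [
    "ant", "very-clean", "package", "test",
    "-Dtestcase=TestCliDriver", "-Dqfile=skewjoin.q",
]


def time_command(cmd, cwd=None):
    """Run cmd (a list of arguments) and return (exit_code, elapsed_seconds)."""
    start = time.monotonic()
    result = subprocess.run(cmd, cwd=cwd)
    return result.returncode, time.monotonic() - start


def slowdown(baseline_seconds, regressed_seconds):
    """Return how many times slower the regressed run is than the baseline."""
    return regressed_seconds / baseline_seconds
```

      Calling `time_command(REPRO_CMD, cwd=...)` once per build and passing the two elapsed times to `slowdown` gives the ratio directly. With the figures quoted in the description (2 minutes and 15 minutes), `slowdown(120, 900)` is 7.5.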
      

      Attachments

        1. stopcommunicatorpatch.txt (2 kB, Siddharth Seth)

          People

            Assignee: Unassigned
            Reporter: Eli Collins
            Votes: 0
            Watchers: 11
