HIVE-6041

Incorrect task dependency graph for skewed join optimization

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 0.6.0, 0.7.0, 0.8.0, 0.9.0, 0.10.0, 0.11.0, 0.12.0
    • Fix Version/s: 0.13.0
    • Component/s: Query Processor
    • Labels:
      None
    • Environment:

      Hadoop 1.0.3

      Description

      The dependency graph among task stages is incorrect in the plan produced by the skewed join optimization, which is enabled through "hive.optimize.skewjoin". When no skewed keys exist, all tasks that follow the common join are filtered out at runtime.

      In particular, the conditional task in the optimized plan has no dependency link to the child tasks of the common join task from the original plan. The conditional task wraps the map join task, which does carry all of these dependencies, but when the map join task is filtered out (i.e., no skewed keys exist), the dependencies are lost with it. Hence, all remaining task stages of the query (e.g., the move stage that writes the results into the result table) are skipped.

      The bug resides in the processSkewJoin() function of "ql/optimizer/physical/GenMRSkewJoinProcessor.java", immediately after the ConditionalTask is created and its dependencies are set.
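
The failure mode described above can be illustrated with a small toy model. This is only a sketch under stated assumptions: the classes `Task` and `ConditionalTask` below are simplified stand-ins written for this example, not Hive's real classes, and the stage names and the `linkChildrenToConditional` flag are hypothetical. The flag shows one possible repair (attaching the downstream stages to the conditional task itself), which is not necessarily what the attached patch does.

```java
import java.util.ArrayList;
import java.util.List;

public class SkewJoinGraphSketch {

    /** Toy stand-in for a plan stage; NOT Hive's real Task class. */
    static class Task {
        final String name;
        final List<Task> children = new ArrayList<>();

        Task(String name) {
            this.name = name;
        }

        void addChild(Task child) {
            if (!children.contains(child)) {
                children.add(child);
            }
        }
    }

    /** Toy conditional stage: its candidates run only if selected at runtime. */
    static class ConditionalTask extends Task {
        final List<Task> candidates = new ArrayList<>();

        ConditionalTask(String name) {
            super(name);
        }
    }

    /** Runs a stage and everything reachable from it, recording each stage once. */
    static void run(Task task, boolean skewedKeysExist, List<String> executed) {
        if (executed.contains(task.name)) {
            return;
        }
        executed.add(task.name);
        // Simplification: the candidates are selected only when skewed keys exist.
        if (task instanceof ConditionalTask && skewedKeysExist) {
            for (Task candidate : ((ConditionalTask) task).candidates) {
                run(candidate, skewedKeysExist, executed);
            }
        }
        for (Task child : task.children) {
            run(child, skewedKeysExist, executed);
        }
    }

    /** Builds the toy optimized plan and returns the stages that actually ran. */
    static List<String> execute(boolean skewedKeysExist, boolean linkChildrenToConditional) {
        Task commonJoin = new Task("commonJoin");
        Task move = new Task("move"); // downstream stage of the original plan
        Task skewMapJoin = new Task("skewMapJoin");
        ConditionalTask conditional = new ConditionalTask("conditional");

        // Optimized plan: the map join carries the downstream dependency...
        skewMapJoin.addChild(move);
        conditional.candidates.add(skewMapJoin);
        // ...and the conditional task replaces "move" as the child of the common join.
        commonJoin.addChild(conditional);

        // Hypothetical repair: the conditional task also points at the downstream stage.
        if (linkChildrenToConditional) {
            conditional.addChild(move);
        }

        List<String> executed = new ArrayList<>();
        run(commonJoin, skewedKeysExist, executed);
        return executed;
    }

    public static void main(String[] args) {
        System.out.println(execute(true, false));  // [commonJoin, conditional, skewMapJoin, move]
        System.out.println(execute(false, false)); // [commonJoin, conditional]
        System.out.println(execute(false, true));  // [commonJoin, conditional, move]
    }
}
```

In the second call no skewed keys exist, so the map join candidate is never selected and the "move" stage is unreachable, which mirrors the skipped stages reported here. The model ignores ordering constraints between parent stages and only tracks reachability.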

        Attachments

        1. HIVE-6041.1.patch.txt
          14 kB
          Navis Ryu

              People

              • Assignee:
                Navis Ryu (navis)
              • Reporter:
                Adrian Popescu (adipdia)
              • Votes:
                0
              • Watchers:
                4
