Hive
  1. Hive
  2. HIVE-3326

plan for multiple mapjoin followed by a normal join is wrong

    Details

    • Type: Bug Bug
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Cannot Reproduce
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: SQL
    • Labels:
      None
    • Environment:

      OS X 10.8; java 1.6.0_33

      Description

      example queries:

      create table yudi(c1 int, c2 int, c3 int, c4 int);
      create table wangmu(c1 int, c2 int, c3 int, c4 int);
      select /*+mapjoin(b,c)*/ * from yudi a join yudi b on a.c1=b.c1 join wangmu c on b.c2=c.c2 join yudi d on a.c3=d.c3;
      

      in explain mode, I got this:

      hive> explain select /*+mapjoin(b,c)*/ * from yudi a join yudi b on a.c1=b.c1 join wangmu c on b.c2=c.c2 join yudi d on a.c3=d.c3;
      OK
      STAGE DEPENDENCIES:
        Stage-8 is a root stage
        Stage-2 depends on stages: Stage-8
        Stage-7 depends on stages: Stage-2
        Stage-3 depends on stages: Stage-7
        Stage-1 depends on stages: Stage-3
      
      STAGE PLANS:
        Stage: Stage-8
          Map Reduce Local Work
            Alias -> Map Local Tables:
              b
              <Not Important>
        Stage: Stage-2
          Map Reduce
            Alias -> Map Operator Tree:
              a
              <Not Important>
            Local Work:
              Map Reduce Local Work
      
        Stage: Stage-7
          Map Reduce Local Work
            Alias -> Map Local Tables:
              c
              <Not Important>
        Stage: Stage-3
          Map Reduce
            Alias -> Map Operator Tree:
                 file:/var/folders/4w/3_nk1cwd4pd023mzx64p3r480000gn/T/dukezhang/hive_2012-08-01_14-01-37_152_5814747325029961632/-mr-10002
              <Not Important>
            Local Work:
              Map Reduce Local Work
      
        Stage: Stage-1
          Map Reduce
            Alias -> Map Operator Tree:
              d
                TableScan
      
              file:/var/folders/4w/3_nk1cwd4pd023mzx64p3r480000gn/T/dukezhang/hive_2012-08-01_14-01-37_152_5814747325029961632/-mr-10002
                Select Operator
      
            Reduce Operator Tree:
            <Not Important>
      

      You see, mapper of Stage-1 should read from Stage-3, maybe '.../-mr-10003', not Stage-2(result in '.../-mr-10002').

      To resolve this bug, I found these codes(GenMapRedUtils.java, about line 431):

      GenMapRedUtils.java
      if (oldMapJoin == null) {
        if (opProcCtx.getParseCtx().getListMapJoinOpsNoReducer().contains(mjOp)
            || local || (oldTask != null) && (parTasks != null)) {
          taskTmpDir = mjCtx.getTaskTmpDir();
          tt_desc = mjCtx.getTTDesc();
          rootOp = mjCtx.getRootMapJoinOp();
          }
      } else {
        GenMRMapJoinCtx oldMjCtx = opProcCtx.getMapJoinCtx(oldMapJoin);
        assert oldMjCtx != null;
        taskTmpDir = oldMjCtx.getTaskTmpDir();
        tt_desc = oldMjCtx.getTTDesc();
        rootOp = oldMjCtx.getRootMapJoinOp();
      }
      

      my query goes into 'else' block and gets wrong taskTmpDir. I hack them to let query go into 'if' block, and it works.

      1. HIVE-3326.D8091.1.patch
        11 kB
        Phabricator
      2. patch.diff
        1 kB
        Zhang Xinyu

        Issue Links

          Activity

          Hide
          Zhang Xinyu added a comment -

          this is my patch for this bug. I use an unused argument to control my logic whether goes into 'else' block

          Show
          Zhang Xinyu added a comment - this is my patch for this bug. I use an unused argument to control my logic whether goes into 'else' block
          Hide
          Navis added a comment -

          I think the condition,

          if (oldMapJoin == null) {
          

          should be changed to

          if (oldMapJoin == null || !opProcCtx.getParseCtx().getListMapJoinOpsNoReducer().contains(oldMapJoin)) {
          
          Show
          Navis added a comment - I think the condition, if (oldMapJoin == null ) { should be changed to if (oldMapJoin == null || !opProcCtx.getParseCtx().getListMapJoinOpsNoReducer().contains(oldMapJoin)) {
          Hide
          Zhang Xinyu added a comment -

          cool!Thx

          Show
          Zhang Xinyu added a comment - cool!Thx
          Hide
          Navis added a comment -

          I've forgot this issue for months. should be fixed.

          Show
          Navis added a comment - I've forgot this issue for months. should be fixed.
          Hide
          Phabricator added a comment -

          navis requested code review of "HIVE-3326 [jira] plan for multiple mapjoin followed by a normal join is wrong".
          Reviewers: JIRA

          DPAL-1968 plan for multiple mapjoin followed by a normal join is wrong

          example queries:

          create table yudi(c1 int, c2 int, c3 int, c4 int);
          create table wangmu(c1 int, c2 int, c3 int, c4 int);
          select /+mapjoin(b,c)/ * from yudi a join yudi b on a.c1=b.c1 join wangmu c on b.c2=c.c2 join yudi d on a.c3=d.c3;

          in explain mode, I got this:

          hive> explain select /+mapjoin(b,c)/ * from yudi a join yudi b on a.c1=b.c1 join wangmu c on b.c2=c.c2 join yudi d on a.c3=d.c3;
          OK
          STAGE DEPENDENCIES:
          Stage-8 is a root stage
          Stage-2 depends on stages: Stage-8
          Stage-7 depends on stages: Stage-2
          Stage-3 depends on stages: Stage-7
          Stage-1 depends on stages: Stage-3

          STAGE PLANS:
          Stage: Stage-8
          Map Reduce Local Work
          Alias -> Map Local Tables:
          b
          <Not Important>
          Stage: Stage-2
          Map Reduce
          Alias -> Map Operator Tree:
          a
          <Not Important>
          Local Work:
          Map Reduce Local Work

          Stage: Stage-7
          Map Reduce Local Work
          Alias -> Map Local Tables:
          c
          <Not Important>
          Stage: Stage-3
          Map Reduce
          Alias -> Map Operator Tree:
          file:/var/folders/4w/3_nk1cwd4pd023mzx64p3r480000gn/T/dukezhang/hive_2012-08-01_14-01-37_152_5814747325029961632/-mr-10002
          <Not Important>
          Local Work:
          Map Reduce Local Work

          Stage: Stage-1
          Map Reduce
          Alias -> Map Operator Tree:
          d
          TableScan

          file:/var/folders/4w/3_nk1cwd4pd023mzx64p3r480000gn/T/dukezhang/hive_2012-08-01_14-01-37_152_5814747325029961632/-mr-10002
          Select Operator

          Reduce Operator Tree:
          <Not Important>

          You see, mapper of Stage-1 should read from Stage-3, maybe '.../-mr-10003', not Stage-2(result in '.../-mr-10002').

          To resolve this bug, I found these codes(GenMapRedUtils.java, about line 431):
          GenMapRedUtils.java

          if (oldMapJoin == null) {
          if (opProcCtx.getParseCtx().getListMapJoinOpsNoReducer().contains(mjOp)

          local (oldTask != null) && (parTasks != null)) { taskTmpDir = mjCtx.getTaskTmpDir(); tt_desc = mjCtx.getTTDesc(); rootOp = mjCtx.getRootMapJoinOp(); }

          } else

          { GenMRMapJoinCtx oldMjCtx = opProcCtx.getMapJoinCtx(oldMapJoin); assert oldMjCtx != null; taskTmpDir = oldMjCtx.getTaskTmpDir(); tt_desc = oldMjCtx.getTTDesc(); rootOp = oldMjCtx.getRootMapJoinOp(); }

          my query goes into 'else' block and gets wrong taskTmpDir. I hack them to let query go into 'if' block, and it works.

          TEST PLAN
          EMPTY

          REVISION DETAIL
          https://reviews.facebook.net/D8091

          AFFECTED FILES
          ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java
          ql/src/test/queries/clientpositive/mapjoin_mapjoin_join.q
          ql/src/test/results/clientpositive/mapjoin_mapjoin_join.q.out

          MANAGE HERALD DIFFERENTIAL RULES
          https://reviews.facebook.net/herald/view/differential/

          WHY DID I GET THIS EMAIL?
          https://reviews.facebook.net/herald/transcript/19497/

          To: JIRA, navis

          Show
          Phabricator added a comment - navis requested code review of " HIVE-3326 [jira] plan for multiple mapjoin followed by a normal join is wrong". Reviewers: JIRA DPAL-1968 plan for multiple mapjoin followed by a normal join is wrong example queries: create table yudi(c1 int, c2 int, c3 int, c4 int); create table wangmu(c1 int, c2 int, c3 int, c4 int); select / +mapjoin(b,c) / * from yudi a join yudi b on a.c1=b.c1 join wangmu c on b.c2=c.c2 join yudi d on a.c3=d.c3; in explain mode, I got this: hive> explain select / +mapjoin(b,c) / * from yudi a join yudi b on a.c1=b.c1 join wangmu c on b.c2=c.c2 join yudi d on a.c3=d.c3; OK STAGE DEPENDENCIES: Stage-8 is a root stage Stage-2 depends on stages: Stage-8 Stage-7 depends on stages: Stage-2 Stage-3 depends on stages: Stage-7 Stage-1 depends on stages: Stage-3 STAGE PLANS: Stage: Stage-8 Map Reduce Local Work Alias -> Map Local Tables: b <Not Important> Stage: Stage-2 Map Reduce Alias -> Map Operator Tree: a <Not Important> Local Work: Map Reduce Local Work Stage: Stage-7 Map Reduce Local Work Alias -> Map Local Tables: c <Not Important> Stage: Stage-3 Map Reduce Alias -> Map Operator Tree: file:/var/folders/4w/3_nk1cwd4pd023mzx64p3r480000gn/T/dukezhang/hive_2012-08-01_14-01-37_152_5814747325029961632/-mr-10002 <Not Important> Local Work: Map Reduce Local Work Stage: Stage-1 Map Reduce Alias -> Map Operator Tree: d TableScan file:/var/folders/4w/3_nk1cwd4pd023mzx64p3r480000gn/T/dukezhang/hive_2012-08-01_14-01-37_152_5814747325029961632/-mr-10002 Select Operator Reduce Operator Tree: <Not Important> You see, mapper of Stage-1 should read from Stage-3, maybe '.../-mr-10003', not Stage-2(result in '.../-mr-10002'). To resolve this bug, I found these codes(GenMapRedUtils.java, about line 431): GenMapRedUtils.java if (oldMapJoin == null) { if (opProcCtx.getParseCtx().getListMapJoinOpsNoReducer().contains(mjOp) local (oldTask != null) && (parTasks != null)) { taskTmpDir = mjCtx.getTaskTmpDir(); tt_desc = mjCtx.getTTDesc(); rootOp = mjCtx.getRootMapJoinOp(); } } else { GenMRMapJoinCtx oldMjCtx = opProcCtx.getMapJoinCtx(oldMapJoin); assert oldMjCtx != null; taskTmpDir = oldMjCtx.getTaskTmpDir(); tt_desc = oldMjCtx.getTTDesc(); rootOp = oldMjCtx.getRootMapJoinOp(); } my query goes into 'else' block and gets wrong taskTmpDir. I hack them to let query go into 'if' block, and it works. TEST PLAN EMPTY REVISION DETAIL https://reviews.facebook.net/D8091 AFFECTED FILES ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java ql/src/test/queries/clientpositive/mapjoin_mapjoin_join.q ql/src/test/results/clientpositive/mapjoin_mapjoin_join.q.out MANAGE HERALD DIFFERENTIAL RULES https://reviews.facebook.net/herald/view/differential/ WHY DID I GET THIS EMAIL? https://reviews.facebook.net/herald/transcript/19497/ To: JIRA, navis
          Hide
          Namit Jain added a comment -

          comments on phabricator

          Show
          Namit Jain added a comment - comments on phabricator
          Hide
          Phabricator added a comment -

          njain has commented on the revision "HIVE-3326 [jira] plan for multiple mapjoin followed by a normal join is wrong".

          Navis, I am not sure, we should support this.
          https://issues.apache.org/jira/browse/HIVE-3784 is the right way to go.
          We are adding way more complexity than is needed to solve this problem.

          Let me refresh HIVE-3784 and try to address Ashutosh's concerns.

          REVISION DETAIL
          https://reviews.facebook.net/D8091

          To: JIRA, navis
          Cc: njain

          Show
          Phabricator added a comment - njain has commented on the revision " HIVE-3326 [jira] plan for multiple mapjoin followed by a normal join is wrong". Navis, I am not sure, we should support this. https://issues.apache.org/jira/browse/HIVE-3784 is the right way to go. We are adding way more complexity than is needed to solve this problem. Let me refresh HIVE-3784 and try to address Ashutosh's concerns. REVISION DETAIL https://reviews.facebook.net/D8091 To: JIRA, navis Cc: njain
          Hide
          Ashutosh Chauhan added a comment -

          Now that HIVE-3784 went in, this bug is still present in GenMapRedUtils.java. But I suspect we will never encounter this because hints for multiple map-joins are no longer supported so its impossible to enforce map-joins. They will automatically get converted in physical optimizer. This implies though bug is there, there is no reasonable way to hit this code path. Namit Jain is that correct ? If this is true, than instead of having this bug lurking in codebase and worrying about it, we should refactor the code to get rid of this dead code. If this is not true, than we should commit this patch, perhaps with testcase which will produce this behavior.

          Show
          Ashutosh Chauhan added a comment - Now that HIVE-3784 went in, this bug is still present in GenMapRedUtils.java. But I suspect we will never encounter this because hints for multiple map-joins are no longer supported so its impossible to enforce map-joins. They will automatically get converted in physical optimizer. This implies though bug is there, there is no reasonable way to hit this code path. Namit Jain is that correct ? If this is true, than instead of having this bug lurking in codebase and worrying about it, we should refactor the code to get rid of this dead code. If this is not true, than we should commit this patch, perhaps with testcase which will produce this behavior.
          Hide
          Navis added a comment -

          Cannot reproduce the situation by HIVE-3784

          Show
          Navis added a comment - Cannot reproduce the situation by HIVE-3784

            People

            • Assignee:
              Navis
              Reporter:
              Zhang Xinyu
            • Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development