Hive / HIVE-3326

plan for multiple mapjoin followed by a normal join is wrong

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Cannot Reproduce
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: SQL
    • Labels: None
    • Environment: OS X 10.8; java 1.6.0_33

    Description

      example queries:

      create table yudi(c1 int, c2 int, c3 int, c4 int);
      create table wangmu(c1 int, c2 int, c3 int, c4 int);
      select /*+mapjoin(b,c)*/ * from yudi a join yudi b on a.c1=b.c1 join wangmu c on b.c2=c.c2 join yudi d on a.c3=d.c3;
      

      in explain mode, I got this:

      hive> explain select /*+mapjoin(b,c)*/ * from yudi a join yudi b on a.c1=b.c1 join wangmu c on b.c2=c.c2 join yudi d on a.c3=d.c3;
      OK
      STAGE DEPENDENCIES:
        Stage-8 is a root stage
        Stage-2 depends on stages: Stage-8
        Stage-7 depends on stages: Stage-2
        Stage-3 depends on stages: Stage-7
        Stage-1 depends on stages: Stage-3
      
      STAGE PLANS:
        Stage: Stage-8
          Map Reduce Local Work
            Alias -> Map Local Tables:
              b
              <Not Important>
        Stage: Stage-2
          Map Reduce
            Alias -> Map Operator Tree:
              a
              <Not Important>
            Local Work:
              Map Reduce Local Work
      
        Stage: Stage-7
          Map Reduce Local Work
            Alias -> Map Local Tables:
              c
              <Not Important>
        Stage: Stage-3
          Map Reduce
            Alias -> Map Operator Tree:
                 file:/var/folders/4w/3_nk1cwd4pd023mzx64p3r480000gn/T/dukezhang/hive_2012-08-01_14-01-37_152_5814747325029961632/-mr-10002
              <Not Important>
            Local Work:
              Map Reduce Local Work
      
        Stage: Stage-1
          Map Reduce
            Alias -> Map Operator Tree:
              d
                TableScan
      
              file:/var/folders/4w/3_nk1cwd4pd023mzx64p3r480000gn/T/dukezhang/hive_2012-08-01_14-01-37_152_5814747325029961632/-mr-10002
                Select Operator
      
            Reduce Operator Tree:
            <Not Important>
      

      As the plan shows, the mapper of Stage-1 reads the output of Stage-2 ('.../-mr-10002'). It should read the output of Stage-3 instead (maybe '.../-mr-10003'), because Stage-1 depends on Stage-3.
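
      The mismatch can be checked mechanically. The sketch below is only an illustration of that reasoning, not Hive code: the class name PlanCheck and its maps are hypothetical, the two local-work stages (Stage-8, Stage-7) are left out, and the output directory of Stage-3 is an assumption, since the plan above does not show it.

```java
import java.util.HashMap;
import java.util.Map;

public class PlanCheck {
    // Upstream map-reduce stage of each stage, taken from STAGE DEPENDENCIES
    // with the local-work stages (Stage-8, Stage-7) skipped.
    static final Map<String, String> UPSTREAM = new HashMap<>();
    // Directory each stage writes to. Stage-3's value is an assumption.
    static final Map<String, String> OUTPUT_DIR = new HashMap<>();
    // Directory each stage's mapper reads from, as printed by EXPLAIN.
    static final Map<String, String> INPUT_DIR = new HashMap<>();

    static {
        UPSTREAM.put("Stage-3", "Stage-2");
        UPSTREAM.put("Stage-1", "Stage-3");
        OUTPUT_DIR.put("Stage-2", ".../-mr-10002");
        OUTPUT_DIR.put("Stage-3", ".../-mr-10003"); // assumed, not shown in the plan
        INPUT_DIR.put("Stage-3", ".../-mr-10002");
        INPUT_DIR.put("Stage-1", ".../-mr-10002");
    }

    /** True when the stage's mapper reads the directory its upstream stage writes. */
    static boolean readsUpstreamOutput(String stage) {
        String upstream = UPSTREAM.get(stage);
        String input = INPUT_DIR.get(stage);
        return upstream != null && input != null
                && input.equals(OUTPUT_DIR.get(upstream));
    }

    public static void main(String[] args) {
        for (String stage : new String[] {"Stage-3", "Stage-1"}) {
            System.out.println(stage + " reads upstream output: "
                    + readsUpstreamOutput(stage));
        }
    }
}
```

      Under these assumptions the check passes for Stage-3 and fails for Stage-1, which is the inconsistency described above.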

      While tracking down this bug, I found the following code (GenMapRedUtils.java, around line 431):

      GenMapRedUtils.java
      if (oldMapJoin == null) {
        if (opProcCtx.getParseCtx().getListMapJoinOpsNoReducer().contains(mjOp)
            || local || (oldTask != null) && (parTasks != null)) {
          taskTmpDir = mjCtx.getTaskTmpDir();
          tt_desc = mjCtx.getTTDesc();
          rootOp = mjCtx.getRootMapJoinOp();
        }
      } else {
        GenMRMapJoinCtx oldMjCtx = opProcCtx.getMapJoinCtx(oldMapJoin);
        assert oldMjCtx != null;
        taskTmpDir = oldMjCtx.getTaskTmpDir();
        tt_desc = oldMjCtx.getTTDesc();
        rootOp = oldMjCtx.getRootMapJoinOp();
      }
      

      My query takes the 'else' branch and gets the wrong taskTmpDir. I hacked the code so that the query takes the 'if' branch instead, and it works.
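
      The effect of the two branches can be shown with a toy model. This is a sketch under assumptions, not the Hive implementation: TmpDirChoice, TMP_DIR_OF and the map-join labels are hypothetical stand-ins for the per-map-join context lookup, the extra condition inside the real 'if' branch is omitted, and it assumes that oldMapJoin refers to the first map join and that the second map join writes to '.../-mr-10003'.

```java
import java.util.HashMap;
import java.util.Map;

public class TmpDirChoice {
    // Hypothetical stand-in for the context kept per map join (not GenMRMapJoinCtx).
    static final Map<String, String> TMP_DIR_OF = new HashMap<>();

    static {
        TMP_DIR_OF.put("mapjoin-1", ".../-mr-10002"); // first map join (a, b)
        TMP_DIR_OF.put("mapjoin-2", ".../-mr-10003"); // second map join (with c); assumed dir
    }

    /**
     * Simplified version of the branch quoted above: with no earlier map join
     * the current operator's own directory is used, otherwise the directory
     * recorded for the earlier map join is used.
     */
    static String chooseTmpDir(String mjOp, String oldMapJoin) {
        if (oldMapJoin == null) {
            return TMP_DIR_OF.get(mjOp);       // 'if' branch
        }
        return TMP_DIR_OF.get(oldMapJoin);     // 'else' branch
    }

    public static void main(String[] args) {
        System.out.println("else branch: " + chooseTmpDir("mapjoin-2", "mapjoin-1"));
        System.out.println("if branch:   " + chooseTmpDir("mapjoin-2", null));
    }
}
```

      In this model the 'else' branch hands the following join the directory of the first map join, which matches the '.../-mr-10002' input seen in Stage-1, while the 'if' branch hands it the directory of the second map join.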

      Attachments

        1. HIVE-3326.D8091.1.patch (11 kB, Phabricator)
        2. patch.diff (1 kB, Zhang Xinyu)

            People

              Assignee: Navis Ryu (navis)
              Reporter: Zhang Xinyu (yuzone)
              Votes: 0
              Watchers: 5
