Hive / HIVE-2466

mapjoin_subquery dump small table (mapjoin table) to the same file

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 0.7.1
    • Fix Version/s: 0.8.0
    • Component/s: Query Processor
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      mapjoin_subquery.q contains the following query:
      SELECT /*+ MAPJOIN(z) */ subq.key1, z.value
      FROM
      (SELECT /*+ MAPJOIN(x) */ x.key as key1, x.value as value1, y.key as key2, y.value as value2
      FROM src1 x JOIN src y ON (x.key = y.key)) subq
      JOIN srcpart z ON (subq.key1 = z.key and z.ds='2008-04-08' and z.hr=11);
      When x and z are dumped to local files, they are both dumped to the same file, so the data of x is lost.
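
The data loss described above can be sketched as follows. This is only an illustration, not Hive's code: the class `DumpCollisionSketch`, the helper `dumpPath`, and the file-name layout are hypothetical, and the sketch simply assumes that both dumps resolve to one path (the report does not say why they did in 0.7.1). A map stands in for the local file system, so a second write to the same path replaces the first.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class DumpCollisionSketch {
    // Hypothetical naming scheme: nothing in the name says which map join wrote the file.
    static String dumpPath(String baseDir, String bigTableFile) {
        return baseDir + "/MapJoin-" + bigTableFile + ".hashtable";
    }

    // Stand-in for the local file system (path -> contents).
    // Writing to an existing path overwrites what was there.
    static Map<String, String> dumpAll(String baseDir, String bigTableFile, String... smallTables) {
        Map<String, String> files = new LinkedHashMap<>();
        for (String table : smallTables) {
            files.put(dumpPath(baseDir, bigTableFile), "hash table of " + table);
        }
        return files;
    }

    public static void main(String[] args) {
        // Two small tables, x and z, are dumped in turn.
        Map<String, String> files = dumpAll("/tmp/hive-local", "bigfile", "x", "z");
        System.out.println(files.size() + " file(s): " + files);
    }
}
```

Under these assumptions only one file survives and it holds the table dumped last (z), which matches the symptom in the report: the data of x is gone.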

      1. ASF.LICENSE.NOT.GRANTED--D285.1.patch (57 kB, Phabricator)
      2. ASF.LICENSE.NOT.GRANTED--D285.2.patch (25 kB, Phabricator)
      3. hive-2466.1.patch (9 kB, binlijin)
      4. hive-2466.2.patch (22 kB, binlijin)
      5. hive-2466.3.patch (25 kB, binlijin)
      6. hive-2466.4.patch (25 kB, binlijin)

        Activity

        Hudson added a comment -

        Integrated in Hive-trunk-h0.21 #1067 (See https://builds.apache.org/job/Hive-trunk-h0.21/1067/)
        HIVE-2466 mapjoin_subquery dump small table (mapjoin table) to the same file
        (binlijin via namit)

        namit : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1199117
        Files :

        • /hive/trunk/data/files/x.txt
        • /hive/trunk/data/files/y.txt
        • /hive/trunk/data/files/z.txt
        • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/HashTableSinkOperator.java
        • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java
        • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/MapredLocalTask.java
        • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java
        • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/MapJoinProcessor.java
        • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/GenMRSkewJoinProcessor.java
        • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/HashTableSinkDesc.java
        • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/MapJoinDesc.java
        • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/PlanUtils.java
        • /hive/trunk/ql/src/test/queries/clientpositive/mapjoin_subquery2.q
        • /hive/trunk/ql/src/test/results/clientpositive/mapjoin_subquery2.q.out
        Namit Jain added a comment -

        Committed. Thanks binlijin

        Namit Jain added a comment -

        +1

        Phabricator added a comment -

        njain updated the revision "HIVE-2466 [jira] mapjoin_subquery dump small table (mapjoin table) to the same file".
        Reviewers: JIRA

        HIVE-2466

        REVISION DETAIL
        https://reviews.facebook.net/D285

        AFFECTED FILES
        data/files/x.txt
        data/files/y.txt
        data/files/z.txt
        ql/src/test/results/clientpositive/mapjoin_subquery2.q.out
        ql/src/test/queries/clientpositive/mapjoin_subquery2.q
        ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/GenMRSkewJoinProcessor.java
        ql/src/java/org/apache/hadoop/hive/ql/optimizer/MapJoinProcessor.java
        ql/src/java/org/apache/hadoop/hive/ql/exec/HashTableSinkOperator.java
        ql/src/java/org/apache/hadoop/hive/ql/exec/MapredLocalTask.java
        ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java
        ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java
        ql/src/java/org/apache/hadoop/hive/ql/plan/HashTableSinkDesc.java
        ql/src/java/org/apache/hadoop/hive/ql/plan/PlanUtils.java
        ql/src/java/org/apache/hadoop/hive/ql/plan/MapJoinDesc.java

        binlijin added a comment -

        hive-2466.4.patch is based on trunk (http://svn.apache.org/repos/asf/hive/trunk); I don't see the compile problem there.
        How do I create an arc diff entry? I used the command "svn diff > hive-2466.4.patch" to generate the patch.

        Namit Jain added a comment -

        [javac] Compiling 680 source files to /data/users/njain/hive_commit2/build/ql/classes
        [javac] /data/users/njain/hive_commit2/ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java:191: cannot find symbol
        [javac] symbol : method generatePath(java.lang.String,java.lang.Byte,java.lang.String)
        [javac] location: class org.apache.hadoop.hive.ql.exec.Utilities
        [javac] String filePath = Utilities.generatePath(baseDir, pos, currentFileName);

        Can you refresh?

        I am getting the above error when compiling.
        Also, please create an arc diff entry to help with reviewing.

        Phabricator added a comment -

        njain requested code review of "HIVE-2466 [jira] mapjoin_subquery dump small table (mapjoin table) to the same file".
        Reviewers: JIRA

        HIVE-2466 diff for review

        mapjoin_subquery.q contains the following query:
        SELECT /*+ MAPJOIN(z) */ subq.key1, z.value
        FROM
        (SELECT /*+ MAPJOIN(x) */ x.key as key1, x.value as value1, y.key as key2, y.value as value2
        FROM src1 x JOIN src y ON (x.key = y.key)) subq
        JOIN srcpart z ON (subq.key1 = z.key and z.ds='2008-04-08' and z.hr=11);
        When x and z are dumped to local files, they are both dumped to the same file, so the data of x is lost.

        TEST PLAN
        EMPTY

        REVISION DETAIL
        https://reviews.facebook.net/D285

        AFFECTED FILES
        data/files/x.txt
        data/files/y.txt
        data/files/z.txt
        ql/src/test/results/clientpositive/mapjoin_subquery2.q.out
        ql/src/test/queries/clientpositive/mapjoin_subquery2.q
        ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/GenMRSkewJoinProcessor.java
        ql/src/java/org/apache/hadoop/hive/ql/optimizer/MapJoinProcessor.java
        ql/src/java/org/apache/hadoop/hive/ql/exec/HashTableSinkOperator.java
        ql/src/java/org/apache/hadoop/hive/ql/exec/MapredLocalTask.java.orig
        ql/src/java/org/apache/hadoop/hive/ql/exec/MapredLocalTask.java
        ql/src/java/org/apache/hadoop/hive/ql/exec/HashTableSinkOperator.java.orig
        ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java
        ql/src/java/org/apache/hadoop/hive/ql/plan/HashTableSinkDesc.java
        ql/src/java/org/apache/hadoop/hive/ql/plan/PlanUtils.java
        ql/src/java/org/apache/hadoop/hive/ql/plan/MapJoinDesc.java

        binlijin added a comment -

        @Namit Jain,
        Thank you very much! I made some changes in hive-2466.3.patch.

        Namit Jain added a comment -

        A few high-level comments:

        Instead of making the dump prefix optional, why don't you always have it in HashTableSinkDesc and
        MapJoinDesc?

        This way, you can get rid of all the "if dumpdescriptor is not null" checks.
        The logic will be simpler; the names of the map files will be:

        mapfile1
        mapfile2
        ..
        etc.

        Also, it might be nicer to add the static function in PlanUtils.java instead of QBJoinTree.java.
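
The suggestion in this comment can be sketched roughly as below. It is a minimal illustration under stated assumptions, not the committed patch: the class `DumpPrefixSketch`, the methods `nextDumpPrefix` and `dumpPath`, and the path layout are all hypothetical. The idea shown is that every map join always receives a prefix from a static counter (mapfile1, mapfile2, ...), so the path builder never needs a null check and two map joins in one query get distinct files.

```java
public class DumpPrefixSketch {
    // Hypothetical counter shared by all map joins in the process.
    private static int nextId = 0;

    // Hypothetical static helper: always returns a fresh, non-null prefix.
    static synchronized String nextDumpPrefix() {
        nextId++;
        return "mapfile" + nextId;
    }

    // Hypothetical path layout: the prefix is a mandatory part of the file name,
    // so there is no "prefix == null" branch to maintain.
    static String dumpPath(String baseDir, String dumpPrefix, String bigTableFile) {
        return baseDir + "/MapJoin-" + dumpPrefix + "-" + bigTableFile + ".hashtable";
    }

    public static void main(String[] args) {
        // One prefix per map join: the inner join on x and the outer join on z.
        String prefixForX = nextDumpPrefix();
        String prefixForZ = nextDumpPrefix();
        System.out.println(dumpPath("/tmp/hive-local", prefixForX, "bigfile"));
        System.out.println(dumpPath("/tmp/hive-local", prefixForZ, "bigfile"));
    }
}
```

Making the prefix mandatory trades a little extra state (the counter) for simpler call sites, which is the simplification the comment asks for.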

        Namit Jain added a comment -

        reviewing now

        binlijin added a comment -

        Added a test case in hive-2466.2.patch.

        He Yongqiang added a comment -

        Can you add a test case?


          People

          • Assignee: binlijin
          • Reporter: binlijin
          • Votes: 0
          • Watchers: 1
