Hive / HIVE-2284

bucketized map join should allow join key as a superset of bucketized columns

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.8.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      Currently, bucketized map join only allows the join keys to be exactly the same as the bucketing columns. This is too restrictive and misses some optimization opportunities.
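The restriction and the proposed relaxation can be stated as a check on column sets. The Java sketch below is hypothetical: the class and method names are invented for illustration and are not taken from BucketMapJoinOptimizer.java. It also ignores details a real optimizer would need to handle, such as confirming that each bucketing column of S is equated with the corresponding bucketing column of T.

```java
import java.util.Set;

// Hypothetical sketch of the eligibility rule discussed in HIVE-2284.
// Names are invented for illustration; this is not Hive's optimizer code.
final class BucketJoinEligibility {

    // Old rule: the join keys must be exactly the bucketing columns.
    static boolean eligibleExactMatch(Set<String> joinKeys, Set<String> bucketCols) {
        return joinKeys.equals(bucketCols);
    }

    // Relaxed rule: the join keys only need to include every bucketing column.
    static boolean eligibleSuperset(Set<String> joinKeys, Set<String> bucketCols) {
        return !bucketCols.isEmpty() && joinKeys.containsAll(bucketCols);
    }

    public static void main(String[] args) {
        Set<String> bucketCols = Set.of("A");     // S and T are bucketed on A
        Set<String> joinKeys = Set.of("A", "B");  // ON (S.A = T.A AND S.B = T.B)
        System.out.println(eligibleExactMatch(joinKeys, bucketCols)); // false
        System.out.println(eligibleSuperset(joinKeys, bucketCols));   // true
    }
}
```

Under the old rule the query in this issue is rejected because {A, B} is not equal to {A}. Under the relaxed rule it qualifies because {A, B} contains A.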

      If tables S and T are both bucketized on column A with the same number of buckets, and the query is something like:

      <code>
      SELECT /*+ MAPJOIN (S) */ ...
      FROM S join T
      ON (S.A = T.A AND S.B = T.B)
      </code>

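      We should allow bucketized map join in this case. Any pair of rows satisfying S.A = T.A has the same value of A, so both rows land in the same bucket number in their respective tables. Joining bucket 1 of S with bucket 2 of T (or any two buckets with different numbers) on such a join condition must therefore produce an empty result.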
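The argument can be checked with a small simulation. This Java sketch assumes that both tables assign a row to bucket floorMod(hash(A), N), using the same hash function and the same N. The hash (Integer.hashCode), the Row type, and the sample data are illustrative assumptions, not Hive's actual bucketing code. The sketch joins every pair of buckets on S.A = T.A AND S.B = T.B and counts matches separately for same-numbered and differently numbered bucket pairs.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch assuming both tables use bucket = floorMod(hash(A), numBuckets) with the
// same hash function and the same bucket count. Row and the hash are illustrative.
final class BucketPruningDemo {

    record Row(int a, int b) {}

    // Only the bucketing column A contributes to the bucket number.
    static int bucketOf(Row r, int numBuckets) {
        return Math.floorMod(Integer.hashCode(r.a()), numBuckets);
    }

    static List<List<Row>> bucketize(List<Row> rows, int numBuckets) {
        List<List<Row>> buckets = new ArrayList<>();
        for (int i = 0; i < numBuckets; i++) buckets.add(new ArrayList<>());
        for (Row r : rows) buckets.get(bucketOf(r, numBuckets)).add(r);
        return buckets;
    }

    // Joins every (S bucket i, T bucket j) pair on S.A = T.A AND S.B = T.B.
    // Returns {matches with i == j, matches with i != j}.
    static int[] matchCounts(List<Row> s, List<Row> t, int numBuckets) {
        List<List<Row>> sb = bucketize(s, numBuckets);
        List<List<Row>> tb = bucketize(t, numBuckets);
        int same = 0, cross = 0;
        for (int i = 0; i < numBuckets; i++)
            for (int j = 0; j < numBuckets; j++)
                for (Row x : sb.get(i))
                    for (Row y : tb.get(j))
                        if (x.a() == y.a() && x.b() == y.b()) {
                            if (i == j) same++; else cross++;
                        }
        return new int[] {same, cross};
    }

    public static void main(String[] args) {
        List<Row> s = List.of(new Row(1, 10), new Row(2, 20), new Row(5, 50), new Row(6, 60));
        List<Row> t = List.of(new Row(1, 10), new Row(2, 99), new Row(5, 50), new Row(7, 70));
        int[] c = matchCounts(s, t, 4);
        System.out.println("same-bucket matches: " + c[0]);  // 2
        System.out.println("cross-bucket matches: " + c[1]); // 0
    }
}
```

With the sample rows, every match comes from a pair of buckets with equal numbers. The extra condition on B can only remove matches, never create cross-bucket ones, which is why each bucket of S only needs to be joined with the same-numbered bucket of T.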

        Activity

        Ning Zhang added a comment - https://reviews.apache.org/r/1136/
        jiraposter@reviews.apache.org added a comment -

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/1136/
        -----------------------------------------------------------

        Review request for hive and namit jain.

        Summary
        -------

        Allow bucketed mapjoin if join key is a superset of bucket columns.

        This addresses bug HIVE-2284.
        https://issues.apache.org/jira/browse/HIVE-2284

        Diffs


        trunk/contrib/build.xml 1146922
        trunk/eclipse-templates/.classpath 1146922
        trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/BucketMapJoinOptimizer.java 1146922
        trunk/ql/src/test/queries/clientpositive/smb_mapjoin_10.q PRE-CREATION
        trunk/ql/src/test/results/clientpositive/smb_mapjoin_10.q.out PRE-CREATION

        Diff: https://reviews.apache.org/r/1136/diff

        Testing
        -------

        Passed all unit tests.

        Thanks,

        Ning

        Namit Jain added a comment -

        I didn't understand the changes in build.xml.

        Why is TestContribCliDriver removed?

        Namit Jain added a comment -

        Committed. Thanks Ning

        Hudson added a comment -

        Integrated in Hive-trunk-h0.21 #829 (See https://builds.apache.org/job/Hive-trunk-h0.21/829/)
        HIVE-2284 Bucketized map join should allow join key as a superset of
        bucketized columns (Ning Zhang via namit)

        namit : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1147364
        Files:

        • /hive/trunk/contrib/build.xml
        • /hive/trunk/eclipse-templates/.classpath
        • /hive/trunk/ql/src/test/results/clientpositive/smb_mapjoin_10.q.out
        • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/BucketMapJoinOptimizer.java
        • /hive/trunk/ql/src/test/queries/clientpositive/smb_mapjoin_10.q

          People

          • Assignee:
            Ning Zhang
            Reporter:
            Ning Zhang
          • Votes:
            0
            Watchers:
            0
