Hive / HIVE-1134

bucketing mapjoin where the big table contains more than 1 big partition

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.6.0
    • Component/s: Query Processor
    • Labels:
    • Hadoop Flags: Reviewed
    • Attachments:
      1. hive-1134-2010-02-20.patch (304 kB, He Yongqiang)
      2. hive-1134-2010-02-19.patch (303 kB, He Yongqiang)
      3. hive-1134-2010-02-18.patch (309 kB, He Yongqiang)
      4. hive-1134-2010-02-17.patch (211 kB, He Yongqiang)

        Activity

        Namit Jain added a comment -

        Some pending work from https://issues.apache.org/jira/browse/HIVE-917. You can do that in a separate JIRA if you want to.

        1. Add the mapping to the explain plan so that it can be compared; see
        https://issues.apache.org/jira/browse/HIVE-976

        2. Add a negative test in which the numbers of buckets in the two tables are not exact multiples of each other.
        That is, a case where bucketed map join will not be used.

        3. Instead of checking at runtime, set the DefaultBucketMatcher in the plan and initialize it using reflection.
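
The precondition behind the negative test in item 2 could be sketched roughly as follows. This is a minimal illustration, not code from the patch: the class name `BucketCountCheck` and the method `isExactMultiple` are hypothetical, and the only rule assumed is the one stated above, that bucketed map join applies only when one bucket count is an exact multiple of the other.

```java
// Hypothetical helper for illustration only; not part of Hive or of this patch.
final class BucketCountCheck {

  // Returns true when one bucket count is an exact multiple of the other,
  // which is the condition the comment above gives for using bucketed map join.
  static boolean isExactMultiple(int bucketsA, int bucketsB) {
    if (bucketsA <= 0 || bucketsB <= 0) {
      return false; // a table without buckets cannot take part
    }
    return bucketsA % bucketsB == 0 || bucketsB % bucketsA == 0;
  }

  public static void main(String[] args) {
    System.out.println(isExactMultiple(4, 2)); // one count divides the other
    System.out.println(isExactMultiple(3, 2)); // neither divides the other
  }
}
```

A negative test would then pick bucket counts such as 3 and 2, for which the check fails, and verify that the plan falls back to a regular map join.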

        He Yongqiang added a comment -

        The attached patch also fixes a bug in the HIVE-917 patch:

        it should use MOD instead of DIV.
        // if the big table has more buckets than the current small table,
        // use "MOD" to get small table bucket names. For example, if the big
        // table has 4 buckets and the small table has 2 buckets, then the
        // mapping should be 0->0, 1->1, 2->0, 3->1.
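
The MOD mapping described in that code comment can be sketched as below. This is an illustrative sketch rather than the patch's implementation: `BucketModMapping` and `bigToSmall` are hypothetical names, and it assumes rows are assigned to buckets by hash modulo the bucket count, which is why a row in big-table bucket 2 of 4 can only match small-table bucket 0 of 2. Integer division would instead give 0->0, 1->0, 2->1, 3->1, which pairs the wrong buckets.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of Hive or of this patch.
final class BucketModMapping {

  // Maps each big-table bucket index to the small-table bucket it must be
  // joined with, for the case where the big table has more buckets.
  static Map<Integer, Integer> bigToSmall(int bigBuckets, int smallBuckets) {
    if (smallBuckets <= 0 || bigBuckets < smallBuckets
        || bigBuckets % smallBuckets != 0) {
      throw new IllegalArgumentException(
          "big bucket count must be a positive exact multiple of the small one");
    }
    Map<Integer, Integer> mapping = new LinkedHashMap<>();
    for (int big = 0; big < bigBuckets; big++) {
      mapping.put(big, big % smallBuckets); // MOD, not DIV
    }
    return mapping;
  }

  public static void main(String[] args) {
    // Prints {0=0, 1=1, 2=0, 3=1}, the mapping given in the comment above.
    System.out.println(bigToSmall(4, 2));
  }
}
```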

        He Yongqiang added a comment -

        Fixed some diff. Thanks Namit.

        Namit Jain added a comment -

        A few minor comments:

        1. CheckStyle: a lot of code needs { }

        For example:

        if (!checkBucketNumberAgainstBigTable(aliasToBucketNumberMapping,
            bucketNumberInPart))
          return null;

        2. Modify existing tests to run the test without the hint and then compare the results.

        3. Clean up GenMapredUtils.setupBucketMapJoinInfo

        He Yongqiang added a comment -

        Integrated Namit's comments. Thanks Namit.

        Namit Jain added a comment -

        The changes look good, but there is a problem with the tests.
        I did not debug, but it seems that you are not deleting the
        tables bucketmapjoin_has_result1 and 2 in some test, which
        causes the tests input2/3 to fail intermittently (depending on the
        order of tests).

        Can you update the tests?

        He Yongqiang added a comment -

        Thanks Namit. Updated the patch.

        Namit Jain added a comment -

        Committed. Thanks Yongqiang


          People

          • Assignee: He Yongqiang
          • Reporter: Namit Jain