  1. Hive
  2. HIVE-2284

bucketized map join should allow join key as a superset of bucketized columns

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.8.0
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      Currently, bucketized mapjoin is allowed only when the join keys are exactly the same as the bucketized columns. This is too restrictive and misses some optimization opportunities.

      If tables S and T are both bucketized on column A with the same number of buckets, and the query is something like:

      <code>
      SELECT /*+ MAPJOIN (S) */ ...
      FROM S JOIN T
      ON (S.A = T.A AND S.B = T.B)
      </code>

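      We should allow bucketized mapjoin in this case. Rows that land in different buckets have different values of A, so they can never satisfy S.A = T.A; joining bucket 1 from S with bucket 2 from T under such a join condition must therefore produce an empty result. The extra predicate S.B = T.B only filters the matches further within each pair of corresponding buckets.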
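
The argument above can be checked with a small simulation. This is only an illustrative sketch, not Hive's implementation: the bucket count, the `bucket_of` function, and the sample rows are all assumptions made for the example. It bucketizes two tables on column A alone, joins on both A and B, and compares the bucket-by-bucket join with the full join.

```python
from itertools import product

NUM_BUCKETS = 4  # assumed bucket count, identical for both tables


def bucket_of(a):
    # Stand-in for the bucketing function (an assumption for this sketch);
    # any deterministic function of column A alone would do.
    return hash(a) % NUM_BUCKETS


def bucketize(rows):
    """Split (A, B) rows into NUM_BUCKETS lists, keyed on column A only."""
    buckets = [[] for _ in range(NUM_BUCKETS)]
    for row in rows:
        buckets[bucket_of(row[0])].append(row)
    return buckets


def join(left, right):
    """Nested-loop equi-join on both A and B (a superset of the bucket key)."""
    return [(l, r) for l, r in product(left, right)
            if l[0] == r[0] and l[1] == r[1]]


# Hypothetical sample data: rows are (A, B) tuples.
S = [(a, b) for a in range(8) for b in range(3)]
T = [(a, b) for a in range(0, 8, 2) for b in range(1, 4)]

s_buckets, t_buckets = bucketize(S), bucketize(T)

# Joins between non-matching buckets (i != j).
cross = [join(s_buckets[i], t_buckets[j])
         for i in range(NUM_BUCKETS)
         for j in range(NUM_BUCKETS) if i != j]

# Joins between matching buckets only, concatenated.
bucketed = [pair for i in range(NUM_BUCKETS)
            for pair in join(s_buckets[i], t_buckets[i])]

# Reference result: join over the whole tables.
full = join(S, T)
```

In this sketch every entry of `cross` is an empty list, and `bucketed` contains the same pairs as `full`, which is why it is safe to load only the matching bucket of the small table for each bucket of the large one.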

            People

            • Assignee:
              Ning Zhang (nzhang)
            • Reporter:
              Ning Zhang (nzhang)
