HIVE-2284

bucketized map join should allow join key as a superset of bucketized columns

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.8.0
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      Currently, bucketized map join is only allowed when the join keys are exactly the same as the bucketized columns. This is too restrictive and misses some optimization opportunities.

      If tables S and T are both bucketized on column A with the same number of buckets, and the query is something like:

      <code>
      SELECT /*+ MAPJOIN (S) */ ...
      FROM S join T
      ON (S.A = T.A AND S.B = T.B)
      </code>

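      We should allow bucketized map join here. Because the join condition includes S.A = T.A, matching rows always fall into the same bucket number in both tables, so joining bucket 1 of S with bucket 2 of T under this condition must produce an empty result.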
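
      The argument above can be checked with a small simulation. This is only a sketch in Python, not Hive code: the bucket count, the `bucket_of` function (a stand-in for Hive's bucketing hash), the sample rows, and all helper names are assumptions made for illustration. It joins S and T on (A, B) once over the whole tables and once bucket by bucket on A, and also confirms that every pair of different buckets joins to nothing.

```python
from collections import defaultdict
from itertools import product

NUM_BUCKETS = 4  # assumed value; S and T must use the same bucket count


def bucket_of(a):
    # Stand-in for the bucketing hash: any deterministic function of A works.
    return hash(a) % NUM_BUCKETS


def bucketize(rows):
    # Group rows (A, B) into buckets keyed by the bucket number of column A.
    buckets = defaultdict(list)
    for row in rows:
        buckets[bucket_of(row[0])].append(row)
    return buckets


def join(left, right):
    # Naive join on the superset key: S.A = T.A AND S.B = T.B.
    return [(l, r) for l, r in product(left, right)
            if l[0] == r[0] and l[1] == r[1]]


def bucketed_join(s_rows, t_rows):
    # Join bucket i of S only against bucket i of T.
    s_b, t_b = bucketize(s_rows), bucketize(t_rows)
    out = []
    for i in range(NUM_BUCKETS):
        out.extend(join(s_b.get(i, []), t_b.get(i, [])))
    return out


# Hypothetical sample data: rows are (A, B).
S = [(1, "x"), (2, "y"), (5, "x"), (6, "z")]
T = [(1, "x"), (2, "q"), (5, "x"), (9, "z")]

full = sorted(join(S, T))
per_bucket = sorted(bucketed_join(S, T))

# Joins between different bucket numbers, which should all be empty.
s_buckets, t_buckets = bucketize(S), bucketize(T)
cross = [join(s_buckets.get(i, []), t_buckets.get(j, []))
         for i in range(NUM_BUCKETS)
         for j in range(NUM_BUCKETS) if i != j]
```

      Here `full` and `per_bucket` contain the same pairs and every entry of `cross` is empty, which is the property that makes the bucket-wise map join safe when the join key is a superset of the bucketized columns.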

      Attachments

        1. HIVE-2284.patch
          11 kB
          Ning Zhang

          People

            Assignee: Ning Zhang (nzhang)
            Reporter: Ning Zhang (nzhang)