Introducing a new parameter for Map-side join bucket size

Details

    • Improvement
    • Status: Closed
    • Major
    • Resolution: Fixed
    • 0.5.0, 0.6.0
    • 0.5.0
    • None
    • None
    • Reviewed
    • HIVE-1158. Introducing a new parameter for Map-side join bucket size. (Ning Zhang via zshao)

    Description

      A map-side join caches the small table in memory and joins it with a split of the large table on the mapper side. If the small table is too large, it uses RowContainer to cache a number of rows given by the parameter hive.join.cache.size, whose default value is 25000. The same parameter is also used by regular reducer-side joins to cache all input tables except the streaming table. This default is too large for the map-side join bucket size and sometimes results in OOM exceptions. We should define a separate parameter so that the two cache sizes can be set independently.
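
The proposal can be illustrated with a minimal Java sketch. It is not the code from the attached patches. The key `example.mapjoin.cache.size` and its default of 100 are hypothetical placeholders, since the description does not name the new parameter, and `BoundedRowCache` is a toy stand-in for RowContainer that only counts the rows it would not keep in memory. Only `hive.join.cache.size` and its default of 25000 come from the description.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class JoinCacheSketch {
    // Existing parameter named in the description, with its stated default.
    static final String JOIN_CACHE_KEY = "hive.join.cache.size";
    static final int JOIN_CACHE_DEFAULT = 25000;

    // Hypothetical placeholder for the proposed map-side join parameter.
    // Both the key name and the default value are illustrative assumptions.
    static final String MAPJOIN_CACHE_KEY = "example.mapjoin.cache.size";
    static final int MAPJOIN_CACHE_DEFAULT = 100;

    /** Reads an integer setting, falling back to the default when it is unset. */
    static int getInt(Map<String, String> conf, String key, int defaultValue) {
        String value = conf.get(key);
        return value == null ? defaultValue : Integer.parseInt(value);
    }

    /** Picks the cache size by join type so the two can be tuned independently. */
    static int cacheSizeFor(Map<String, String> conf, boolean mapSideJoin) {
        return mapSideJoin
            ? getInt(conf, MAPJOIN_CACHE_KEY, MAPJOIN_CACHE_DEFAULT)
            : getInt(conf, JOIN_CACHE_KEY, JOIN_CACHE_DEFAULT);
    }

    /** Toy stand-in for a row container: keeps at most cacheSize rows in memory. */
    static final class BoundedRowCache {
        private final int cacheSize;
        private final List<String> inMemory = new ArrayList<>();
        private long overflow = 0; // rows that did not fit in the in-memory cache

        BoundedRowCache(int cacheSize) {
            this.cacheSize = cacheSize;
        }

        void add(String row) {
            if (inMemory.size() < cacheSize) {
                inMemory.add(row);
            } else {
                overflow++;
            }
        }

        int inMemoryRows() {
            return inMemory.size();
        }

        long overflowRows() {
            return overflow;
        }
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        // Setting the reducer-side value no longer affects the map-side cache.
        conf.put(JOIN_CACHE_KEY, "25000");

        BoundedRowCache cache = new BoundedRowCache(cacheSizeFor(conf, true));
        for (int i = 0; i < 250; i++) {
            cache.add("row-" + i);
        }
        System.out.println(cache.inMemoryRows() + " rows in memory, "
            + cache.overflowRows() + " rows over the limit");
    }
}
```

With two keys, lowering the map-side cache size to avoid memory pressure in the mapper leaves the reducer-side join cache at its existing value.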

      Attachments

        1. HIVE-1158_branch_0_5.patch
          12 kB
          Ning Zhang
        2. HIVE-1158.patch
          4 kB
          Ning Zhang

          People

            nzhang Ning Zhang