HIVE-17848: Bucket Map Join: Implement an efficient way to minimize loading the hash table


Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Fixed

    Description

      In a bucket map join, each task loads its own copy of the hash table. This is inefficient because the load is I/O heavy, and because holding multiple copies of the same hash table means the tables may get garbage collected on a busy system.
      Implement a sub-cache that holds a soft reference to each hash table, keyed by its bucket ID, so that a task can reuse a table that has already been loaded.

      This requires changes on the Tez side to push the bucket ID to TezProcessor.
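
      The sub-cache idea described above can be sketched roughly as follows. This is a minimal illustration, not the code from the attached patches: the class name BucketHashTableCache, the method getOrLoad, and the loader callback are all hypothetical, and the sketch assumes the bucket ID is already available to the task.

```java
import java.lang.ref.SoftReference;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.IntFunction;

/**
 * Hypothetical sub-cache: one softly referenced hash table per bucket ID.
 * T stands in for whatever hash table container the join actually uses.
 */
final class BucketHashTableCache<T> {
    private final ConcurrentHashMap<Integer, SoftReference<T>> cache =
        new ConcurrentHashMap<>();

    /**
     * Returns the cached table for bucketId, or loads it if it was never
     * loaded or its soft reference has been cleared by the GC.
     * The loader must not call back into this cache.
     */
    T getOrLoad(int bucketId, IntFunction<T> loader) {
        // Holds a strong reference so the table cannot be collected
        // between the lookup and the return.
        AtomicReference<T> result = new AtomicReference<>();
        // compute() is atomic per key, so concurrent tasks asking for the
        // same bucket trigger at most one load.
        cache.compute(bucketId, (id, ref) -> {
            T table = (ref == null) ? null : ref.get();
            if (table != null) {
                result.set(table);
                return ref; // still alive: reuse the existing entry
            }
            table = loader.apply(id); // the expensive, I/O-heavy load
            result.set(table);
            return new SoftReference<>(table);
        });
        return result.get();
    }
}
```

      With a cache like this, a task would call getOrLoad with its bucket ID and a loader; a second task for the same bucket gets the already loaded table for as long as the soft reference has not been cleared. Soft references let the JVM reclaim the tables under memory pressure, at the cost of a reload on the next request.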

      Attachments

        1. HIVE-17848.7.patch
          33 kB
          Deepak Jaiswal
        2. HIVE-17848.6.patch
          33 kB
          Deepak Jaiswal
        3. HIVE-17848.5.patch
          32 kB
          Deepak Jaiswal
        4. HIVE-17848.4.patch
          32 kB
          Deepak Jaiswal
        5. HIVE-17848.2.patch
          13 kB
          Deepak Jaiswal

            People

              djaiswal Deepak Jaiswal
              djaiswal Deepak Jaiswal
              Votes: 0
              Watchers: 3
