Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Description
In a bucket map join, each task loads its own copy of the hash table. This is inefficient because loading is I/O heavy, and because multiple copies of the same hash table are held in memory, the tables may get GCed on a busy system.
Implement a subcache that holds a soft reference to each hash table, keyed by its bucket ID, so that a table already loaded can be reused by other tasks.
This requires changes on the Tez side to push the bucket ID to TezProcessor.
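A minimal sketch of the proposed subcache, assuming the bucket ID is already available to the task (per the Tez change above). The class `BucketHashTableCache`, its `getOrLoad` method, and the `loader` callback are hypothetical names for illustration, not Hive's actual implementation. Each bucket ID maps to a `SoftReference`, so the JVM may reclaim a table under memory pressure; a cleared entry is simply reloaded on the next request. Loading happens inside `ConcurrentHashMap.compute`, so tasks asking for the same bucket at the same time trigger only one I/O-heavy load.

```java
import java.lang.ref.SoftReference;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntFunction;

// Demo class comes first so `java BucketCacheDemo.java` runs its main method.
class BucketCacheDemo {
    public static void main(String[] args) {
        AtomicInteger loads = new AtomicInteger();
        BucketHashTableCache<Map<String, String>> cache = new BucketHashTableCache<>();
        // Hypothetical loader standing in for the I/O-heavy hash table load.
        IntFunction<Map<String, String>> loader = id -> {
            loads.incrementAndGet();
            return Map.of("key", "bucket-" + id);
        };

        Map<String, String> a = cache.getOrLoad(3, loader);
        Map<String, String> b = cache.getOrLoad(3, loader); // reused, not reloaded
        cache.getOrLoad(5, loader);                          // different bucket, new load

        System.out.println(a == b);      // prints true
        System.out.println(loads.get()); // prints 2
    }
}

/**
 * Hypothetical per-bucket hash table subcache (illustrative, not Hive's class).
 * Values are softly referenced so the GC can reclaim them on a busy system.
 */
class BucketHashTableCache<T> {
    private final ConcurrentHashMap<Integer, SoftReference<T>> cache = new ConcurrentHashMap<>();

    /** Returns the table for bucketId, loading it only if absent or reclaimed by the GC. */
    T getOrLoad(int bucketId, IntFunction<T> loader) {
        // Fast path: lock-free read when another task already loaded this bucket.
        SoftReference<T> ref = cache.get(bucketId);
        T table = (ref == null) ? null : ref.get();
        if (table != null) {
            return table;
        }

        // Slow path: compute() runs atomically per key, so concurrent tasks
        // for the same bucket load the table only once.
        final Object[] pinned = new Object[1];
        cache.compute(bucketId, (id, existing) -> {
            T current = (existing == null) ? null : existing.get();
            if (current != null) {
                pinned[0] = current; // strong ref keeps it alive until we return
                return existing;
            }
            T fresh = loader.apply(id);
            pinned[0] = fresh;
            return new SoftReference<>(fresh);
        });

        @SuppressWarnings("unchecked")
        T result = (T) pinned[0];
        return result;
    }
}
```

Soft references fit this use case because a cached table is an optimization, not a correctness requirement: if memory gets tight, the GC drops it and the next task pays the load cost again instead of the JVM failing.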
Attachments
Issue Links
- links to