Details
Description
When a table has a very large number of regions, MR jobs have been observed to consume a lot of memory in the client. This is because we keep region-level information in memory, and the memory-heavy object is TableSplit, because each split carries a Scan object.
However, it looks like TableInputFormat for a single table does not need to store the Scan object in the TableSplit: we do not use it there, and all splits are expected to have exactly the same Scan object. In TableInputFormat we read the Scan object directly from the MR conf.
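To illustrate the proposal, here is a minimal, self-contained sketch in plain Java. It does not use the HBase classes: `Split`, `SCAN_KEY`, `scanFor` and `totalBytes` are hypothetical stand-ins, the scan is modeled as an opaque string, and a `Map` plays the role of the MR conf. It only shows the idea that a split can omit the scan and fall back to the copy stored once in the job configuration.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

public class SplitSketch {
    // Hypothetical configuration key; not taken from the HBase code base.
    static final String SCAN_KEY = "sketch.mapreduce.scan";

    /** Hypothetical stand-in for a table split covering one key range. */
    static final class Split {
        final String table, startRow, endRow, location;
        final String scan; // null when the split does not carry its own scan

        Split(String table, String startRow, String endRow, String location, String scan) {
            this.table = table;
            this.startRow = startRow;
            this.endRow = endRow;
            this.location = location;
            this.scan = scan;
        }

        /** Serializes the split; a missing scan is written as an empty string. */
        byte[] serialize() throws IOException {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(bytes)) {
                out.writeUTF(table);
                out.writeUTF(startRow);
                out.writeUTF(endRow);
                out.writeUTF(location);
                out.writeUTF(scan == null ? "" : scan);
            }
            return bytes.toByteArray();
        }
    }

    /** Returns the split's own scan if present, otherwise the shared one from the conf. */
    static String scanFor(Split split, Map<String, String> conf) {
        return split.scan != null ? split.scan : conf.get(SCAN_KEY);
    }

    /** Total serialized size of one split per region, with or without an embedded scan. */
    static long totalBytes(int regions, String scan) throws IOException {
        long total = 0;
        for (int i = 0; i < regions; i++) {
            Split s = new Split("t1", "row" + i, "row" + (i + 1), "host" + (i % 10), scan);
            total += s.serialize().length;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Build a placeholder "serialized scan" of 2000 characters.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 2000; i++) {
            sb.append('x');
        }
        String scan = sb.toString();

        // The scan is stored once in the (simulated) job configuration.
        Map<String, String> conf = new HashMap<>();
        conf.put(SCAN_KEY, scan);

        System.out.println("with scan in every split: " + totalBytes(10000, scan) + " bytes");
        System.out.println("scan only in the conf:    " + totalBytes(10000, null) + " bytes");

        // A scan-less split still resolves to the shared scan.
        Split light = new Split("t1", "a", "b", "host1", null);
        System.out.println("resolved scan length: " + scanFor(light, conf).length());
    }
}
```

The sketch measures serialized size rather than heap usage, so it only approximates the memory effect described above. The point is that the per-split cost of the scan is paid once per region, while the conf copy is paid once per job.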
Attachments
Issue Links
- is a parent of: HBASE-25226 Optimize in-memory representation for HBase map reduce table splits for MultiTableInputFormat (Open)