Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- Hadoop Flags: Reviewed
Description
We calculate region size in the mapreduce package by getting the size in MB first and then multiplying back to bytes: https://github.com/apache/hbase/blob/39a20c528e2bf27cedf12734dbdb1b7b1e538076/hbase-mapreduce/src/main/java/org/apache/hadoop/hbase/mapreduce/RegionSizeCalculator.java#L87
This reports a size of 0 until a region reaches at least 1MB, and it introduces an unwanted rounding effect for larger regions as well.
Spark, for example, can be tuned as a performance optimization to skip regions whose reported size is 0, which ends up eliminating small regions that are not actually empty. The Hadoop interface states that the size is returned in bytes, and while the multiplication technically makes that true, the result stays at 0 until a region reaches 1MB. I'm not sure why we get the size in MB units and not in bytes straight up.
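For illustration, here is a minimal, self-contained sketch of the arithmetic described above (it does not call the real HBase API; the helper name and sample sizes are hypothetical). Truncating to whole megabytes before converting back to bytes reports 0 for any region under 1MB and silently drops the sub-megabyte remainder for larger regions:
{code:java}
public class RegionSizeRounding {
  private static final long MEGABYTE = 1024L * 1024L;

  // Mirrors the pattern described above: take the store file size in whole
  // megabytes (integer truncation, so anything below 1MB becomes 0) and then
  // multiply back to bytes.
  static long sizeViaMegabytes(long storeFileSizeBytes) {
    long sizeInMb = storeFileSizeBytes / MEGABYTE;
    return sizeInMb * MEGABYTE;
  }

  public static void main(String[] args) {
    long smallRegion = 800L * 1024L;                   // 800KB of store files
    long largerRegion = 5L * MEGABYTE + 300L * 1024L;  // 5MB + 300KB

    System.out.println(sizeViaMegabytes(smallRegion));  // 0       -> region looks empty
    System.out.println(sizeViaMegabytes(largerRegion)); // 5242880 -> the 300KB remainder is lost
  }
}
{code}
Reporting the size in bytes directly would avoid both the zero-sized regions and the rounding loss.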
Attachments
Issue Links
- is related to: IMPALA-11278 Cardinality of small HBase regions is overestimated since HBASE-26340 (Open)
- relates to: HBASE-26609 Round the size to MB or KB at the end of calculation in HRegionServer.createRegionLoad (Resolved)
- links to