HBase / HBASE-26340

TableSplit returns false size under 1MB


Details

    • Reviewed

    Description

      In the mapreduce package, we calculate region size by first reading the size in MB and then multiplying it back to bytes: https://github.com/apache/hbase/blob/39a20c528e2bf27cedf12734dbdb1b7b1e538076/hbase-mapreduce/src/main/java/org/apache/hadoop/hbase/mapreduce/RegionSizeCalculator.java#L87

      This gives a size of 0 until the region reaches at least 1MB. (It also has an unwanted rounding effect on larger sizes.)
      Spark, for example, can be tuned to skip 0-sized regions as a performance optimization. With the current calculation, this also skips small regions that are not actually empty. The Hadoop interface states that the size is returned in bytes, and while this is technically true due to the multiplication, we multiply by 0 until 1MB is reached. I'm not sure why we get the size in MB units rather than directly in bytes.
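
      The arithmetic can be illustrated with a minimal standalone sketch. It does not use the HBase API: the class and method names (`RegionSizeSketch`, `viaMegabytes`, `viaBytes`) are hypothetical, and it assumes the megabyte value is truncated to a whole number before being multiplied back to bytes, which is the behavior described above.

```java
public class RegionSizeSketch {
    static final long MEGABYTE = 1024L * 1024L;

    // Assumed pattern from the description: read the size in whole megabytes,
    // then scale back to bytes. Integer division drops any fractional megabyte.
    static long viaMegabytes(long storeFileSizeBytes) {
        long sizeMb = storeFileSizeBytes / MEGABYTE;
        return sizeMb * MEGABYTE;
    }

    // Alternative raised in the description: keep the size in bytes throughout.
    static long viaBytes(long storeFileSizeBytes) {
        return storeFileSizeBytes;
    }

    public static void main(String[] args) {
        long[] samples = {0L, 512L * 1024L, MEGABYTE, MEGABYTE + 512L * 1024L};
        for (long actual : samples) {
            System.out.println(actual + " bytes -> " + viaMegabytes(actual)
                + " (via MB), " + viaBytes(actual) + " (via bytes)");
        }
    }
}
```

      In this sketch a 512KB region is reported as 0 bytes and a 1.5MB region as exactly 1MB, so a consumer that filters out 0-sized splits would drop the first one even though it holds data.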

            People

              Assignee: Norbert Kalmár (nkalmar)
              Reporter: Norbert Kalmár (nkalmar)
              Votes: 0
              Watchers: 4
