HBASE-420

Adjacent small regions should be automatically merged

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Not a Problem
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: master, regionserver
    • Labels:
      None

      Description

      Region merge functionality exists in HBase today, but merges are triggered manually (in theory only, because there is no admin tool for doing so). Instead of relying on an admin to note and merge regions, the Master should detect adjacent undersized regions and automatically merge them.

      Other than the case when a table has exactly one region, region sizes should always be between 1/2x and 1x the split size. For instance, if the max file size is 256MB, then in steady state regions will be between 128MB and 256MB. If we find two adjacent regions whose combined size is less than some threshold, they are candidates for merging. For instance, we could set the threshold to 1/2x the max file size (128MB here), so if one region was 50MB and the other was 16MB (66MB combined), they would be mergeable.

      The only time that regions small enough to merge should exist is when there have been significant deletions. Otherwise, regions will always stay in the 1/2x to 1x range.
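
The merge rule described above can be sketched as a single pass over a table's regions in key order. This is only an illustration of the proposal, not HBase code: `Region`, `find_merge_candidates`, and `threshold_fraction` are hypothetical names, and the rule that a region joins at most one merge per pass is an assumption added to keep the sketch simple.

```python
from dataclasses import dataclass
from typing import List, Tuple

MB = 1024 * 1024


@dataclass(frozen=True)
class Region:
    """Hypothetical stand-in for a region: a name and its on-disk size."""
    name: str
    size_bytes: int


def find_merge_candidates(
    regions: List[Region],
    max_file_size: int,
    threshold_fraction: float = 0.5,
) -> List[Tuple[Region, Region]]:
    """Return adjacent pairs whose combined size is below the threshold.

    `regions` must already be sorted by start key so that neighbours in
    the list are adjacent in the key space.
    """
    threshold = max_file_size * threshold_fraction
    candidates = []
    i = 0
    while i < len(regions) - 1:
        left, right = regions[i], regions[i + 1]
        if left.size_bytes + right.size_bytes < threshold:
            candidates.append((left, right))
            i += 2  # assumption: each region takes part in at most one merge per pass
        else:
            i += 1
    return candidates


# The example from the description: max file size 256MB, threshold 128MB.
example = [
    Region("r1", 200 * MB),
    Region("r2", 50 * MB),
    Region("r3", 16 * MB),
    Region("r4", 180 * MB),
]
pairs = find_merge_candidates(example, max_file_size=256 * MB)
print([(a.name, b.name) for a, b in pairs])  # [('r2', 'r3')]
```

Only the 50MB and 16MB neighbours qualify, since their 66MB total is under the 128MB threshold. A run of several small regions would need repeated passes to collapse fully under this one-merge-per-pass assumption.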

          Activity

          Thibaut added a comment -

          This sounds reasonable.

          I have a table of data awaiting processing, with the current timestamp used as the key. When the table is processed (after 30 minutes), all entries are processed and then deleted.

          When the data in the table grows over two regions, all regions except the last one will never be used again for future data and will stay empty forever, because new timestamps will never be added to those regions.

          Ferdy Galema added a comment -

          Is this not a duplicate of https://issues.apache.org/jira/browse/HBASE-1621 ?

          Jean-Daniel Cryans added a comment -

          1621 is a lesser version of 420: this JIRA is about the "perfect" solution, while 1621 is about just disabling the table without taking down the whole cluster.

          Andrew Purtell added a comment -

          Other issues cover this in various ways. Let's retire this golden oldie.


            People

            • Assignee: Unassigned
            • Reporter: Bryan Duxbury
            • Votes: 5
            • Watchers: 6
