HBASE-4365

Add a decent heuristic for region size

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 0.92.1, 0.94.0
    • Fix Version/s: 0.94.0
    • Component/s: None
    • Labels:
    • Hadoop Flags: Reviewed
    • Release Note:
      Changes the default split policy from ConstantSizeRegionSplitPolicy to IncreasingToUpperBoundRegionSplitPolicy. The new policy splits quickly at first, then more slowly as the number of regions climbs.

      The split size is the square of the number of regions of the same table hosted on this region server, multiplied by the region flush size, or the maximum region split size, whichever is smaller. For example, with a flush size of 128M, a table's first region splits on its first flush, producing two regions that each split when they reach 2 * 2 * 128M = 512M. If one of those regions splits, there are three regions and the split size becomes 3 * 3 * 128M = 1152M, and so on until the computed size reaches the configured maximum file size, which is used from then on.

      Be warned: this new default could cause many splits if you have many tables on your cluster. To avoid that, either go back to the old split policy or raise the lower bound configuration.

      This patch changes the default flush size, and with it the initial split size, from 64M to 128M. It also raises the eventual region split size, hbase.hregion.max.filesize, from 1G to 10G.
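
      The arithmetic in this release note can be sketched in a few lines of Python. This is only an illustration of the rule as worded here, not the code in the patch: the function names split_size and regions_until_cap and their parameters are hypothetical, and the defaults assume the 128M flush size and 10G maximum file size mentioned in the note.

```python
MB = 1024 * 1024
GB = 1024 * MB


def split_size(regions_of_table_on_server: int,
               flush_size: int = 128 * MB,
               max_file_size: int = 10 * GB) -> int:
    """Size (in bytes) a region must reach before it splits.

    Hypothetical helper: regions squared times flush size,
    capped at the configured maximum file size.
    """
    if regions_of_table_on_server < 1:
        raise ValueError("a table needs at least one region on the server")
    uncapped = regions_of_table_on_server ** 2 * flush_size
    return min(max_file_size, uncapped)


def regions_until_cap(flush_size: int = 128 * MB,
                      max_file_size: int = 10 * GB) -> int:
    """Smallest region count at which the maximum file size takes over."""
    n = 1
    while n * n * flush_size < max_file_size:
        n += 1
    return n


if __name__ == "__main__":
    # Print the split threshold for the first few region counts.
    for n in (1, 2, 3, regions_until_cap()):
        print(n, "region(s):", split_size(n) // MB, "MB")
```

      Under these assumed defaults the cap takes over once a server hosts nine regions of the same table, since 9 * 9 * 128M = 10368M exceeds 10G.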

      Description

      A few of us were brainstorming this morning about what the default region size should be. There were a few general points made:

      • in some ways it's better to be too-large than too-small, since you can always split a table further, but you can't merge regions currently
      • with HFile v2 and multithreaded compactions there are fewer reasons to avoid very-large regions (10GB+)
      • for small tables you may want a small region size just so you can distribute load better across a cluster
      • for big tables, multi-GB is probably best

      Attachments

      1. 4365.txt (12 kB, stack)
      2. 4365-v2.txt (12 kB, stack)
      3. 4365-v3.txt (14 kB, stack)
      4. 4365-v4.txt (16 kB, stack)
      5. 4365-v5.txt (16 kB, stack)


            People

            • Assignee: stack
            • Reporter: Todd Lipcon
            • Votes: 0
            • Watchers: 9
