HBase / HBASE-7084

Raise default minimum region split size

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Trivial
    • Resolution: Duplicate
    • Affects Version/s: 0.95.2
    • Fix Version/s: None
    • Component/s: regionserver
    • Labels:
      None

      Description

      Several times I've seen folks recommend raising the region split size from the default 256 MB to something more suitable (most say 1 GB, sometimes 2 GB, more often than other values), in order to control the explosion in the number of regions once they begin hitting the tables heavily.

      Perhaps it makes sense to raise the default itself, since there are ways to bring it down per-table if needed by some use-cases?

      Opening this as a discussion first, since this "too many regions" trouble is quite prevalent among newcomers.

        Activity

        kevin.odell Kevin Odell added a comment -

        +1. With .90 being the exception rather than the rule these days, I think we should use HFilev2 and set the default to something like 2 GB, then have our documentation discuss splitting for smaller workloads. It is much easier to split than it is to merge.

        kevin.odell Kevin Odell added a comment -

        "use HFilev2" sounds weird. I mean that for our region size starting point we don't have to worry about HFilev1. For larger environments running .92+ we usually recommend between 10 and 20 GB to keep the region count down.

        qwertymaniac Harsh J added a comment -

        Thanks Kevin! I wouldn't go that high for the default, though. Some users don't mind a few tens, or even a hundred, regions of a single table per RS, just for parallelism (over fragmented files - better availability in the face of a per-region failure, if any). And of course that also depends on data size.

        kevin.odell Kevin Odell added a comment -

        Do you think 1 GB would be a better starting point? Compared with 256 MB, it would cut the region count by 4x on the runaway tables.
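
A rough sketch of the arithmetic behind that 4x figure, assuming the region count is approximately the table size divided by the split size. The 1 TB table and the `estimated_regions` helper are hypothetical and used only for illustration; real region counts also depend on the split policy and on how data is distributed across keys.

```python
MB = 1024 ** 2
GB = 1024 ** 3


def estimated_regions(table_bytes: int, split_bytes: int) -> int:
    """Approximate region count: one region per split_bytes of data, minimum 1.

    This is a simplification (an assumption for illustration), not HBase's
    actual split logic.
    """
    # Integer ceiling division avoids floating-point rounding.
    return max(1, -(-table_bytes // split_bytes))


table = 1024 * GB  # hypothetical 1 TB table

# Split sizes mentioned in this discussion.
for label, split in [("256 MB", 256 * MB), ("1 GB", GB), ("2 GB", 2 * GB), ("10 GB", 10 * GB)]:
    print(f"{label:>6}: ~{estimated_regions(table, split)} regions")
```

Under this approximation, moving from 256 MB to 1 GB divides the estimated region count by four regardless of table size, which is where the 4x saving comes from.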

        jdcryans Jean-Daniel Cryans added a comment -

        Take a look at HBASE-4365; this is mostly already done.

        qwertymaniac Harsh J added a comment -

        Thanks JD, resolved as dupe.


          People

          • Assignee:
            Unassigned
            Reporter:
            qwertymaniac Harsh J
          • Votes:
            0
            Watchers:
            3
