  1. HBase
  2. HBASE-13959

Region splitting uses a single thread in most common cases

    Details

    • Hadoop Flags:
      Reviewed
    • Release Note:
      The performance of region splitting has been improved by using a thread pool to split the store files concurrently. Prior to this change, the store files were always split sequentially in a single thread, so splitting a region with multiple store files could take several seconds. The thread pool is sized dynamically with the aim of getting maximum concurrency without exceeding the number of cores available to the HBase Java process. A lower limit for the thread pool size can be set explicitly using the property hbase.regionserver.region.split.threads.max.
    • Tags:
      region, split
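
      As a rough illustration of the sizing rule in the release note, the pool size could be computed along these lines. This is a sketch under assumptions, not the code from the patch: the class and method names are hypothetical, the property is read as an upper cap (as its `.max` suffix suggests), and a value of 0 or less is treated as "not set".

```java
// Sketch only: hypothetical names, not the actual HBase implementation.
final class SplitPoolSizing {
    // Property name as given in the release note.
    static final String SPLIT_THREADS_MAX = "hbase.regionserver.region.split.threads.max";

    /**
     * Assumed rule: one thread per store file, never more than the available
     * cores, and never more than the configured maximum when one is set.
     * A configuredMax of 0 or less is treated as "not set" (an assumption).
     */
    static int poolSize(int storeFileCount, int availableCores, int configuredMax) {
        int size = Math.min(storeFileCount, availableCores);
        if (configuredMax > 0) {
            size = Math.min(size, configuredMax);
        }
        return Math.max(1, size); // always keep at least one worker
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("no cap:   " + poolSize(20, cores, 0));
        System.out.println("cap of 4: " + poolSize(20, cores, 4));
    }
}
```

      In a real deployment the cap would come from the HBase configuration under the property name above rather than from a method argument.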

      Description

      When store files need to be split as part of a region split, the current logic uses a thread pool sized to the number of stores. Since the most common table setup involves only a single column family, this translates to a single store, so the thread pool runs with a single thread. However, in a write-heavy workload there could be several tens of store files in a store at the time of splitting, and with a thread pool size of one these files end up being split sequentially.
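
      To make the difference concrete, here is a minimal sketch of submitting one task per store file to a pool whose size is chosen by the caller. Everything in it is a stand-in: `splitStoreFile` only fabricates two names per file (the `.top`/`.bottom` suffixes are illustrative), and none of the names come from the HBase code base.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch only: hypothetical names, not the actual HBase implementation.
final class ConcurrentSplitSketch {
    /** Stand-in for creating the two reference files of one store file. */
    static List<String> splitStoreFile(String storeFile) {
        List<String> refs = new ArrayList<>();
        refs.add(storeFile + ".top");
        refs.add(storeFile + ".bottom");
        return refs;
    }

    /** Splits every store file using a pool of the given size (at least one thread). */
    static List<String> splitAll(List<String> storeFiles, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, threads));
        try {
            List<Callable<List<String>>> tasks = new ArrayList<>();
            for (String f : storeFiles) {
                tasks.add(() -> splitStoreFile(f));
            }
            List<String> refs = new ArrayList<>();
            // invokeAll blocks until all tasks finish and keeps submission order.
            for (Future<List<String>> result : pool.invokeAll(tasks)) {
                refs.addAll(result.get());
            }
            return refs;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while splitting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("split task failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> files = List.of("sf1", "sf2", "sf3");
        // threads = 1 mirrors the single-store case described above;
        // threads = files.size() lets every store file be split in parallel.
        System.out.println(splitAll(files, 1));
        System.out.println(splitAll(files, files.size()));
    }
}
```

      With a pool of one the tasks run back to back, which is the behaviour described above; sizing the pool by store file count instead of store count is what lets them overlap.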

      With a bit of tracing, I noticed that it takes an average of 350 ms to create a single reference file, and splitting each store file involves creating two of these. With a store file count of 20, it takes about 14 s just to get through this phase (20 store files × 2 reference files × 350 ms), pushing the total time the region is offline to 18 s or more. For environments that are set up to fail fast, this makes the client exhaust all retries and fail with NotServingRegionException.
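
      The arithmetic above, and what a larger pool might buy, can be sketched with a small estimate. The 350 ms and two-references-per-file figures are from the tracing above; the model itself is an idealised assumption (files processed in equal waves, no filesystem contention), so real gains would likely be smaller.

```java
// Sketch only: an idealised back-of-the-envelope model, not a measurement.
final class SplitTimeEstimate {
    static final long REFERENCE_FILE_MILLIS = 350;   // average observed in tracing
    static final int REFERENCES_PER_STORE_FILE = 2;  // two reference files per store file

    /**
     * Estimated duration of the reference-file phase, assuming store files are
     * processed in waves of `threads` files with no contention. Requires threads >= 1.
     */
    static long estimateMillis(int storeFiles, int threads) {
        int waves = (storeFiles + threads - 1) / threads; // ceiling division
        return (long) waves * REFERENCES_PER_STORE_FILE * REFERENCE_FILE_MILLIS;
    }

    public static void main(String[] args) {
        System.out.println(estimateMillis(20, 1));  // 14000
        System.out.println(estimateMillis(20, 8));  // 2100
        System.out.println(estimateMillis(20, 20)); // 700
    }
}
```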

      The fix should increase the concurrency of this operation.

        Attachments

        1. 13959-0.98.txt
          4 kB
          Lars Hofhansl
        2. HBASE-13959-5.patch
          4 kB
          Hari Krishna Dara
        3. 13959-suggest.txt
          4 kB
          Lars Hofhansl
        4. region-split-durations-compared.png
          20 kB
          Hari Krishna Dara
        5. HBASE-13959-4.patch
          4 kB
          Hari Krishna Dara
        6. HBASE-13959-3.patch
          4 kB
          Hari Krishna Dara
        7. HBASE-13959-2.patch
          4 kB
          Hari Krishna Dara
        8. HBASE-13959.patch
          5 kB
          Hari Krishna Dara


            People

            • Assignee:
              Hari Krishna Dara (haridsv)
            • Reporter:
              Hari Krishna Dara (haridsv)
            • Votes:
              0
            • Watchers:
              13
