Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.1.4, 1.0.4
    • Component/s: Performance
    • Labels: None
    • Hadoop Flags: Reviewed
    • Release Note: Fixes performance regression in increment/append and checkAnd* operations in hbase-1.0.x and hbase-1.1.x. This fix is not needed in hbase-1.2 and later. They have HBASE-12751, which does effectively the same thing.

    Description

      This is an attempt to fix the increment performance regression caused by HBASE-8763 on branch-1.0.

      I'm aware that hbase.increment.fast.but.narrow.consistency was added to branch-1.0 (HBASE-15031) to address the issue and that separate work is ongoing on the master branch, but this is my take on the problem anyway.

      I read through HBASE-14460 and HBASE-8763, and it wasn't clear to me what caused the slowdown, but I could indeed reproduce the performance regression.

      Test setup:

      • Server: 4-core Xeon 2.4GHz Linux server running mini cluster (100 handlers, JDK 1.7)
      • Client: Another box of the same spec
      • Increments on random 10k records on a single-region table, recreated every time

      Increment throughput (TPS):

      Num threads   Before HBASE-8763 (d6cc2fb)   branch-1.0   branch-1.0 (narrow-consistency)
      1             2661                          2486         2359
      2             5048                          5064         4867
      4             7503                          8071         8690
      8             10471                         10886        13980
      16            15515                         9418         18601
      32            17699                         5421         20540
      64            20601                         4038         25591
      96            19177                         3891         26017

      We can clearly observe that the throughput on branch-1.0 degrades as we increase the number of concurrent requests beyond 8 threads, dropping from 10886 TPS at 8 threads to 3891 TPS at 96. This led me to believe that there is severe context-switching overhead, and I could indirectly confirm that suspicion with the cs column in the vmstat output: branch-1.0 shows a much higher number of context switches even with much lower throughput.

      Here are the observations:

      • A WriteEntry in the writeQueue can only be removed by the very handler that put it there, and only when it is at the front of the queue and marked complete.
      • Since a WriteEntry is marked complete after the wait-loop, only one entry can be removed at a time.
      • This stringent condition causes O(N^2) context switches, where N is the number of concurrent handlers processing requests.
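
      As a rough illustration of the third observation, the toy model below counts wake-ups under two assumptions that are not stated in this issue: every removal from the front of the queue wakes all handlers that are still waiting (a notifyAll-style broadcast), and only the owner of the front entry can then make progress. WakeupModel and countWakeups are hypothetical names, and this is a single-threaded counting sketch, not HBase code.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy, single-threaded model of the wait-loop described above; not HBase code. */
public class WakeupModel {

    /**
     * Counts handler wake-ups needed to drain n queued entries when each entry
     * may only be removed by its owner and every removal wakes all waiters
     * (both are assumptions of this model).
     */
    public static long countWakeups(int n) {
        Deque<Integer> writeQueue = new ArrayDeque<>();
        for (int handler = 0; handler < n; handler++) {
            writeQueue.addLast(handler);
        }
        long wakeups = 0;
        while (!writeQueue.isEmpty()) {
            // Broadcast: every handler that still has an entry queued wakes up...
            wakeups += writeQueue.size();
            // ...but only the owner of the front entry can remove it.
            writeQueue.removeFirst();
        }
        return wakeups;
    }

    public static void main(String[] args) {
        for (int n : new int[] {1, 8, 32, 96}) {
            System.out.println(n + " handlers -> " + countWakeups(n) + " wake-ups");
        }
    }
}
```

      Under these assumptions the count is N(N+1)/2, which grows quadratically with the number of handlers: 36 wake-ups for 8 handlers, 4656 for 96.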

      So what I tried here is to mark the WriteEntry complete before we go into the wait-loop. With this change, multiple WriteEntries can be shifted at a time without context switches. I changed writeQueue to a LinkedHashSet since a fast containment check is needed now that a WriteEntry can be removed by any handler.
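
      The idea can be sketched roughly as follows. This is a minimal illustration, not the attached patch: WriteQueueSketch and its method names are hypothetical, and the real class has more state than shown here. The sketch marks an entry complete first, lets whichever handler holds the lock remove the whole completed prefix of the queue, and uses LinkedHashSet.contains as the wait condition.

```java
import java.util.Iterator;
import java.util.LinkedHashSet;

/** Sketch of the approach described above; names are hypothetical, not the HBase classes. */
public class WriteQueueSketch {

    public static final class WriteEntry {
        final long writeNumber;
        boolean completed;
        WriteEntry(long writeNumber) { this.writeNumber = writeNumber; }
    }

    // Insertion-ordered like a queue, with O(1) containment checks.
    private final LinkedHashSet<WriteEntry> writeQueue = new LinkedHashSet<>();
    private long nextWriteNumber = 1;
    private long readPoint = 0;

    /** Registers a new write and returns its entry. */
    public synchronized WriteEntry begin() {
        WriteEntry e = new WriteEntry(nextWriteNumber++);
        writeQueue.add(e);
        return e;
    }

    /** Marks e complete, then removes every completed entry at the head, whoever owns it. */
    public synchronized void completeAndAdvance(WriteEntry e) {
        e.completed = true;
        Iterator<WriteEntry> it = writeQueue.iterator();
        while (it.hasNext()) {
            WriteEntry head = it.next();
            if (!head.completed) {
                break; // an earlier write is still in flight
            }
            readPoint = head.writeNumber;
            it.remove();
        }
        notifyAll();
    }

    /** Blocks until e has been shifted out of the queue by any handler. */
    public synchronized void waitFor(WriteEntry e) throws InterruptedException {
        while (writeQueue.contains(e)) {
            wait();
        }
    }

    public synchronized long getReadPoint() { return readPoint; }

    public synchronized int pending() { return writeQueue.size(); }
}
```

      In this sketch, if three handlers begin entries a, b and c and then complete b and c first, nothing is removed until a completes; at that point a single call shifts all three entries and advances the read point past c.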

      The numbers look good; they are virtually identical to those from the pre-HBASE-8763 era.

      Num threads   branch-1.0 with fix
      1             2459
      2             4976
      4             8033
      8             12292
      16            15234
      32            16601
      64            19994
      96            20052

      So what do you think about it? Please let me know if I'm missing anything.

      Attachments

        1. HBASE-15213.branch-1.0.patch
          4 kB
          Junegunn Choi
        2. HBASE-15213-increment.png
          79 kB
          Junegunn Choi
        3. HBASE-15213.v1.branch-1.0.patch
          4 kB
          Junegunn Choi
        4. 15157v3.branch-1.1.patch
          4 kB
          Michael Stack

            People

              junegunn Junegunn Choi