HBase / HBASE-9410

Concurrent coprocessor endpoint executions slow down exponentially

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.94.11
    • Fix Version/s: None
    • Component/s: Coprocessors
    • Labels: None
    • Environment: Amazon EC2

    Description

    Multiple concurrent executions of coprocessor endpoints slow down drastically. It is compounded further when more HTable connection setups are happening.
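    For readers without the attachments, the shape of the endpoint involved is roughly the following. This is a minimal sketch against the 0.94 coprocessor API; the SearchLikeProtocol/SearchLikeEndpoint names are illustrative stand-ins, not the attached SearchProtocol.java/SearchEndpoint.java.

        // Minimal 0.94-style coprocessor endpoint (illustrative names, not the attached files).
        import java.io.IOException;

        import org.apache.hadoop.hbase.coprocessor.BaseEndpointCoprocessor;
        import org.apache.hadoop.hbase.ipc.CoprocessorProtocol;

        // Client-visible protocol: one method exposed per region.
        interface SearchLikeProtocol extends CoprocessorProtocol {
          int search(String query) throws IOException;
        }

        // Region-side implementation, loaded as a coprocessor endpoint on each region server.
        public class SearchLikeEndpoint extends BaseEndpointCoprocessor implements SearchLikeProtocol {
          @Override
          public int search(String query) throws IOException {
            // A trivial body is enough for this report: the slowdown is in the
            // concurrent invocation path, not in the work done inside the endpoint.
            return 0;
          }
        }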

      Attachments

      1. SearchProtocol.java
        0.3 kB
        Kirubakaran Pakkirisamy
      2. SearchEndpoint.java
        0.5 kB
        Kirubakaran Pakkirisamy
      3. Search.java
        2 kB
        Kirubakaran Pakkirisamy
      4. jstack3.log
        51 kB
        Kirubakaran Pakkirisamy
      5. jstack2.log
        51 kB
        Kirubakaran Pakkirisamy
      6. jstack1.log
        51 kB
        Kirubakaran Pakkirisamy
      7. jstack.log
        7.11 MB
        Kirubakaran Pakkirisamy

        Activity

        Kirubakaran Pakkirisamy created issue -
        Kirubakaran Pakkirisamy added a comment -

        Attached files which demonstrate the problem. The Thread.sleep in the client allows the clients to have created the HTable first. The client then loops, say, 50 times. What is usually 10-20 msec suddenly jumps to a few hundred msec, and some calls take thousands. This is with 32 concurrent connections to a 4-node EC2 cluster with 32 cores in total.
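        A minimal sketch of the client loop described here, using the 0.94 HTable.coprocessorExec() API (the table name, query string, and the SearchLikeProtocol interface from the sketch above are assumptions, not the attached Search.java):

          import java.io.IOException;
          import java.util.Map;

          import org.apache.hadoop.hbase.HBaseConfiguration;
          import org.apache.hadoop.hbase.client.HTable;
          import org.apache.hadoop.hbase.client.coprocessor.Batch;

          public class EndpointTimingClient {
            public static void main(String[] args) throws Throwable {
              HTable table = new HTable(HBaseConfiguration.create(), "testtable"); // assumed table name

              // Give the other concurrently launched clients time to finish HTable setup,
              // so the timed loop measures endpoint latency rather than connection setup.
              Thread.sleep(5000);

              for (int i = 0; i < 50; i++) {
                long start = System.currentTimeMillis();
                // Invoke the endpoint on every region of the table (null start/end keys).
                Map<byte[], Integer> results = table.coprocessorExec(
                    SearchLikeProtocol.class, null, null,
                    new Batch.Call<SearchLikeProtocol, Integer>() {
                      public Integer call(SearchLikeProtocol instance) throws IOException {
                        return instance.search("query");
                      }
                    });
                System.out.println("iteration " + i + ": "
                    + (System.currentTimeMillis() - start) + " ms across "
                    + results.size() + " regions");
              }
              table.close();
            }
          }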

        Kirubakaran Pakkirisamy made changes -
        Field Original Value New Value
        Attachment Search.java [ 12600990 ]
        Attachment SearchEndpoint.java [ 12600991 ]
        Attachment SearchProtocol.java [ 12600992 ]
        Kirubakaran Pakkirisamy added a comment -

        Attaching jstack output from a run done specifically to capture the jstack output.

        Kirubakaran Pakkirisamy made changes -
        Attachment jstack.log [ 12600995 ]
        Andrew Purtell added a comment -

        Looks like you configured 1000 IPC handlers? This should be set to approximately the number of cores and spindles of the server hardware.

        Scrolling through some of that 7 MB log (can you attach only one jstack you think is relevant?) I don't see any IPC handlers doing any work, but it's too big to look at in total.

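        For reference, the handler count Andrew mentions is controlled by hbase.regionserver.handler.count in hbase-site.xml on each region server; 10 is the 0.94 default:

          <property>
            <name>hbase.regionserver.handler.count</name>
            <value>10</value>
          </property>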
        Kirubakaran Pakkirisamy added a comment -

        Andrew, I have set the RPC handler count to 10 (the default) and re-ran the test case. I have attached 3 jstacks taken during the run.

        Kirubakaran Pakkirisamy made changes -
        Attachment jstack3.log [ 12601010 ]
        Attachment jstack2.log [ 12601011 ]
        Attachment jstack1.log [ 12601012 ]
        Kirubakaran Pakkirisamy added a comment -

        Of course the latency is affected by the number of regions in the table and the total number of regions being served on the server. I expect some degradation, but not an exponential one.

        Kirubakaran Pakkirisamy added a comment -

        Andrew et al., were you able to recreate the problem with the attached test case? It just needs a table with at least as many regions as the total number of cores in the cluster. It could be any table, and I don't think the region size matters. It seems like some unnecessary/aggressive locking overhead.

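        To reproduce under those conditions, the test table can be pre-split so it has one region per core. A sketch using the 0.94 HBaseAdmin API (table and family names are assumptions):

          import org.apache.hadoop.hbase.HBaseConfiguration;
          import org.apache.hadoop.hbase.HColumnDescriptor;
          import org.apache.hadoop.hbase.HTableDescriptor;
          import org.apache.hadoop.hbase.client.HBaseAdmin;
          import org.apache.hadoop.hbase.util.Bytes;

          public class CreatePresplitTable {
            public static void main(String[] args) throws Exception {
              HBaseAdmin admin = new HBaseAdmin(HBaseConfiguration.create());

              HTableDescriptor desc = new HTableDescriptor("testtable"); // assumed table name
              desc.addFamily(new HColumnDescriptor("f"));                // assumed family name

              // 31 split keys yield 32 regions, one per core of the 4-node / 32-core cluster.
              byte[][] splits = new byte[31][];
              for (int i = 0; i < splits.length; i++) {
                splits[i] = Bytes.toBytes(String.format("%02d", i + 1));
              }
              admin.createTable(desc, splits);
              admin.close();
            }
          }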
        Kirubakaran Pakkirisamy added a comment -

        I do not see this issue in 0.95.2

        Lars Hofhansl added a comment -

        Nothing jumps out there. Can you do a jstack on the client as well while the test is running?

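        A client-side thread dump of the kind Lars asks for can be captured with the standard JDK tools while the timed loop is running (how the client PID is located is an assumption about the test setup):

          jps -l                                      # find the PID of the test client JVM
          jstack -l <client-pid> > client-jstack.log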

          People

          • Assignee: Unassigned
          • Reporter: Kirubakaran Pakkirisamy
          • Votes: 0
          • Watchers: 9