Uploaded image for project: 'HBase'
  1. HBase
  2. HBASE-14926

Hung ThriftServer; no timeout on read from client; if client crashes, worker thread gets stuck reading

    XMLWordPrintableJSON

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.2.0, 1.1.2, 1.3.0, 1.0.3, 0.98.16, 2.0.0
    • Fix Version/s: 1.2.0, 1.3.0, 0.98.17, 2.0.0
    • Component/s: Thrift
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      Adds a timeout to server read from clients. Adds new configs hbase.thrift.server.socket.read.timeout for setting read timeout on server socket in milliseconds. Default is 60000;

      Description

      Thrift server is hung. All worker threads are doing this:

      "thrift-worker-0" daemon prio=10 tid=0x00007f0bb95c2800 nid=0xf6a7 runnable [0x00007f0b956e0000]
         java.lang.Thread.State: RUNNABLE
              at java.net.SocketInputStream.socketRead0(Native Method)
              at java.net.SocketInputStream.read(SocketInputStream.java:152)
              at java.net.SocketInputStream.read(SocketInputStream.java:122)
              at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
              at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
              at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
              - locked <0x000000066d859490> (a java.io.BufferedInputStream)
              at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)
              at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
              at org.apache.thrift.transport.TFramedTransport.readFrame(TFramedTransport.java:129)
              at org.apache.thrift.transport.TFramedTransport.read(TFramedTransport.java:101)
              at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
              at org.apache.thrift.protocol.TCompactProtocol.readByte(TCompactProtocol.java:601)
              at org.apache.thrift.protocol.TCompactProtocol.readMessageBegin(TCompactProtocol.java:470)
              at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:27)
              at org.apache.hadoop.hbase.thrift.TBoundedThreadPoolServer$ClientConnnection.run(TBoundedThreadPoolServer.java:289)
              at org.apache.hadoop.hbase.thrift.CallQueue$Call.run(CallQueue.java:64)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
              at java.lang.Thread.run(Thread.java:745)
      

      They never recover.

      I don't have client side logs.

      We've been here before: HBASE-4967 "connected client thrift sockets should have a server side read timeout" but this patch only got applied to fb branch (and thrift has changed since then).

        Attachments

        1. 14926.patch
          11 kB
          stack
        2. 14926v2.txt
          12 kB
          stack

          Activity

            People

            • Assignee:
              stack stack
              Reporter:
              stack stack
            • Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: