  1. HBase
  2. HBASE-17798

RpcServer.Listener.Reader can abort due to CancelledKeyException

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.3.0, 1.2.4, 0.98.24, 2.0.0
    • Fix Version/s: 1.4.0, 1.3.3, 2.0.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      In our production cluster (0.98), some requests could not be accepted because an RpcServer.Listener.Reader thread had aborted.
      getReader() returns the next reader that will handle an incoming request.
      The implementation of getReader() is as follows:

      RpcServer.java
          // The method that will return the next reader to work with
          // Simplistic implementation of round robin for now
          Reader getReader() {
            currentReader = (currentReader + 1) % readers.length;
            return readers[currentReader];
          }
      

      If one of the readers aborts, the requests assigned to that reader will never be handled.
      Why does RpcServer.Listener.Reader abort? We added debug logging to find out.
      After a while, we got the following exception:
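
      The effect of round-robin selection with one dead reader can be sketched as follows. This is a minimal simulation, not HBase code: the RoundRobinDemo class, its Reader stand-in, the alive flag, and countStranded() are all hypothetical names introduced for illustration. Only the body of getReader() mirrors the snippet above.

```java
import java.util.ArrayList;
import java.util.List;

public class RoundRobinDemo {
    // Hypothetical stand-in for RpcServer.Listener.Reader:
    // it only tracks liveness and the connections assigned to it.
    static final class Reader {
        boolean alive = true;
        final List<Integer> assigned = new ArrayList<>();
    }

    private final Reader[] readers;
    private int currentReader = 0;

    RoundRobinDemo(int readerCount) {
        readers = new Reader[readerCount];
        for (int i = 0; i < readerCount; i++) {
            readers[i] = new Reader();
        }
    }

    // Same round-robin selection as getReader() in the issue:
    // the next reader is chosen without checking whether it is still running.
    Reader getReader() {
        currentReader = (currentReader + 1) % readers.length;
        return readers[currentReader];
    }

    // Assigns `connections` connections and returns how many of them
    // landed on the reader at `deadIndex`, which is marked as aborted.
    static int countStranded(int readerCount, int deadIndex, int connections) {
        RoundRobinDemo listener = new RoundRobinDemo(readerCount);
        listener.readers[deadIndex].alive = false;
        int stranded = 0;
        for (int c = 0; c < connections; c++) {
            Reader r = listener.getReader();
            r.assigned.add(c);
            if (!r.alive) {
                stranded++;
            }
        }
        return stranded;
    }

    public static void main(String[] args) {
        // The reader count of 10 is an illustrative value, not taken from the issue.
        System.out.println(countStranded(10, 3, 1000) + " of 1000 connections stranded");
    }
}
```

      With N readers and one of them dead, roughly 1/N of new connections are handed to a reader that will never service them, which matches the symptom of only some requests hanging.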

      2017-03-10 08:05:13,247 ERROR [RpcServer.reader=3,port=60020] ipc.RpcServer: RpcServer.listener,port=60020: unexpectedly error in Reader(Throwable)
      java.nio.channels.CancelledKeyException
              at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:73)
              at sun.nio.ch.SelectionKeyImpl.readyOps(SelectionKeyImpl.java:87)
              at java.nio.channels.SelectionKey.isReadable(SelectionKey.java:289)
              at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:592)
              at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:566)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
              at java.lang.Thread.run(Thread.java:745)
      

      So, when handling requests in the reader, we should handle CancelledKeyException.
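
      One way to handle it is to catch the exception per key so that the loop keeps running. The sketch below is not the attached patch and not RpcServer code: CancelledKeyDemo, processSelectedKeys() and runDemo() are hypothetical names, and a java.nio Pipe stands in for a client connection. It reproduces the failure from the stack trace (isReadable() called on a key that was cancelled after select() returned) and shows the loop surviving it.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.CancelledKeyException;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.util.Iterator;

public class CancelledKeyDemo {
    // Processes one batch of selected keys and returns how many were skipped
    // because they had been cancelled between select() and the readiness check.
    static int processSelectedKeys(Selector selector) {
        int skipped = 0;
        Iterator<SelectionKey> iter = selector.selectedKeys().iterator();
        while (iter.hasNext()) {
            SelectionKey key = iter.next();
            iter.remove();
            try {
                if (key.isReadable()) {
                    // A real reader would read the request from key.channel() here.
                }
            } catch (CancelledKeyException e) {
                // The connection was closed concurrently: skip this key
                // instead of letting the exception end the reader thread.
                skipped++;
            }
        }
        return skipped;
    }

    // Builds a readable key, cancels it after select(), and runs the loop once.
    static int runDemo() {
        try (Selector selector = Selector.open()) {
            Pipe pipe = Pipe.open();
            pipe.source().configureBlocking(false);
            SelectionKey key = pipe.source().register(selector, SelectionKey.OP_READ);
            pipe.sink().write(ByteBuffer.wrap(new byte[] {1})); // make the source readable
            selector.select(1000);
            key.cancel(); // simulates another thread closing the connection
            int skipped = processSelectedKeys(selector);
            pipe.sink().close();
            pipe.source().close();
            return skipped;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("cancelled keys skipped: " + runDemo());
    }
}
```

      Catching the exception inside the per-key loop, rather than around the whole loop, means that one closed connection does not affect the other keys in the same batch.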

      ----------
      Since HBASE-10521, versions 1.x and 2.0 log and return when handling InterruptedException in Reader#doRunLoop, which leads to the same problem.

        Attachments

        1. 17798-master-v2.patch
          0.9 kB
          Ted Yu
        2. connections.png
          20 kB
          Guangxu Cheng
        3. HBASE-17798-master-v2.patch
          0.9 kB
          Guangxu Cheng
        4. HBASE-17798-branch-1-v2.patch
          0.8 kB
          Guangxu Cheng
        5. HBASE-17798-0.98-v2.patch
          0.8 kB
          Guangxu Cheng
        6. HBASE-17798-master-v1.patch
          0.8 kB
          Guangxu Cheng
        7. HBASE-17798-branch-1-v1.patch
          0.8 kB
          Guangxu Cheng
        8. HBASE-17798-0.98-v1.patch
          0.8 kB
          Guangxu Cheng

            People

            • Assignee:
              andrewcheng Guangxu Cheng
              Reporter:
              andrewcheng Guangxu Cheng