HBase / HBASE-27768

Race conditions in BlockingRpcConnection


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.6.0, 2.5.5, 2.4.18
    • None

    Description

      We've been experiencing strange timeouts since upgrading to the HBase 2 client. For now we use BlockingRpcConnection, until we migrate our auth stack to native TLS. While diagnosing the timeouts, I noticed a few issues in this class:

      1. Most importantly, there is a race condition that can leave a single BlockingRpcConnection instance with two reader threads running. When this happens, both threads compete for the socket. This causes unexplained timeouts and, in some cases, corrupted responses (e.g. InvalidProtocolBufferException).
      2. The waitForWork loop does not handle interruption properly. If the thread is interrupted when the race condition above occurs, waitForWork gets stuck in a permanent tight loop. The wait() call immediately throws InterruptedException, we restore the interrupted status, and the loop restarts. As a result, no actual waiting takes place.

      The race condition is fairly rare and has only occurred in certain failure scenarios on our highest-volume clients. When it does happen, however, the affected server connection keeps throwing a low rate of errors indefinitely, until the client is restarted.
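      The following minimal sketch shows one way to address both problems under stated assumptions. It is not the HBase implementation or the committed fix. The class `SingleReaderConnection` and its methods (`startReaderIfNeeded`, `addCall`, `waitForWork`, `close`) are hypothetical. The class models a connection whose reader thread drains an in-memory call queue instead of a socket. It checks for an existing reader and starts a new one under a single lock, so two readers cannot run at once. It also treats an interrupt in `waitForWork` as a signal to exit, instead of re-entering `wait()` with the interrupt flag still set.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical sketch: not HBase's BlockingRpcConnection or its actual fix.
// It models one connection whose reader thread drains an in-memory call queue.
class SingleReaderConnection {
  private final Queue<String> pendingCalls = new ArrayDeque<>(); // guarded by this
  private Thread reader;  // current reader thread, or null; guarded by this
  private boolean closed; // guarded by this
  private int processed;  // calls handled so far; guarded by this

  /** Starts a reader only if none exists; the check and the start share one lock. */
  public synchronized boolean startReaderIfNeeded() {
    if (closed || reader != null) {
      return false; // a second reader would compete with the first for the socket
    }
    reader = new Thread(this::readLoop, "rpc-reader-sketch");
    reader.setDaemon(true);
    reader.start();
    return true;
  }

  public synchronized void addCall(String call) {
    pendingCalls.add(call);
    notifyAll(); // wake the reader blocked in waitForWork()
  }

  public synchronized void close() {
    closed = true;
    notifyAll();
  }

  public synchronized Thread readerThread() {
    return reader;
  }

  public synchronized int processedCount() {
    return processed;
  }

  /** Blocks until a call is queued; returns null when the reader should exit. */
  private synchronized String waitForWork() {
    while (!closed && pendingCalls.isEmpty()) {
      try {
        wait();
      } catch (InterruptedException e) {
        // Restore the flag and exit. Looping back into wait() with the flag set
        // would throw again immediately and spin without ever waiting.
        Thread.currentThread().interrupt();
        return null;
      }
    }
    return closed ? null : pendingCalls.poll();
  }

  private void readLoop() {
    try {
      while (waitForWork() != null) {
        // A real connection would read this call's response from the socket here.
        synchronized (this) {
          processed++;
        }
      }
    } finally {
      // Clear the slot so a later startReaderIfNeeded() can start a fresh reader.
      synchronized (this) {
        if (reader == Thread.currentThread()) {
          reader = null;
        }
      }
    }
  }
}
```

      In this sketch, callers invoke `startReaderIfNeeded()` whenever they send a call. Repeated invocations are no-ops while a reader exists. Exiting on interrupt replaces a spinning thread with a stopped one. A real connection would also need to fail or re-route its pending calls in that case, which this sketch omits.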


            People

              Assignee: Bryan Beaudreault (bbeaudreault)
              Reporter: Bryan Beaudreault (bbeaudreault)
              Votes: 0
              Watchers: 3
