HBASE-6920

On timeout connecting to master, client can get stuck and never make progress

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 0.94.2
    • Fix Version/s: 0.94.2
    • Component/s: None
    • Labels:
      None

      Description

      HBASE-5058 appears to have introduced an issue where a timeout in HConnection.getMaster() can cause the client to never be able to connect to the master. So, for example, an HBaseAdmin object can never successfully be initialized.

      The issue is here:

      if (tryMaster.isMasterRunning()) {
        this.master = tryMaster;
        this.masterLock.notifyAll();
        break;
      }
      

      If isMasterRunning times out, it throws an UndeclaredThrowableException, which is already not ideal because the exception can propagate to the application.
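
      For illustration, one way an UndeclaredThrowableException can arise is a java.lang.reflect.Proxy whose handler throws a checked exception that the interface method does not declare. The sketch below assumes that mechanism. MasterProbe and UndeclaredThrowableDemo are hypothetical names, not HBase types, and the timeout is simulated.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.lang.reflect.UndeclaredThrowableException;
import java.net.SocketTimeoutException;

// Hypothetical stand-in for a master RPC interface; not the real HBase type.
interface MasterProbe {
    boolean isMasterRunning(); // declares no checked exceptions
}

class UndeclaredThrowableDemo {
    // Builds a dynamic proxy whose handler always fails with a checked timeout.
    static MasterProbe timingOutProxy() {
        InvocationHandler handler = (proxy, method, args) -> {
            throw new SocketTimeoutException("simulated RPC timeout");
        };
        return (MasterProbe) Proxy.newProxyInstance(
                MasterProbe.class.getClassLoader(),
                new Class<?>[] {MasterProbe.class},
                handler);
    }

    // Describes the exception the caller actually observes.
    static String observedException() {
        try {
            timingOutProxy().isMasterRunning();
            return "none";
        } catch (UndeclaredThrowableException e) {
            // The proxy wraps the undeclared checked exception; the
            // original timeout is preserved as the cause.
            return e.getClass().getSimpleName() + " caused by "
                    + e.getCause().getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(observedException());
    }
}
```

      A caller that only catches the declared exception types would miss this unchecked wrapper, which is how it can reach application code.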

      But if the first call to getMaster succeeds, it sets masterChecked = true, after which we never try to reconnect: we set this.master = null and just throw MasterNotRunningException, without even trying to connect.
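
      The stuck state described above can be reproduced in a toy model. This is a simplified sketch, not the actual HConnection code and not the attached patch. MasterCacheSketch, MasterNotRunning, networkUp, and the resetOnFailure switch are all hypothetical names introduced for illustration. With resetOnFailure = false the model stays stuck after a single timeout. With resetOnFailure = true it shows one possible remedy, which is to clear the flag whenever the cached master is dropped.

```java
// Hypothetical, simplified model of the caching logic; not the real HBase code.
class MasterCacheSketch {
    // Local stand-in for HBase's MasterNotRunningException.
    static class MasterNotRunning extends Exception {
        MasterNotRunning(String message) { super(message); }
    }

    private final boolean resetOnFailure; // false = stuck behavior, true = possible remedy
    boolean networkUp = true;             // toggled by the caller to simulate an outage
    int connectAttempts = 0;
    private Object master;
    private boolean masterChecked;

    MasterCacheSketch(boolean resetOnFailure) {
        this.resetOnFailure = resetOnFailure;
    }

    // Stand-in for isMasterRunning(): a timeout surfaces as an unchecked exception.
    private boolean isMasterRunning() {
        if (!networkUp) throw new IllegalStateException("simulated timeout");
        return true;
    }

    Object getMaster() throws MasterNotRunning {
        if (master != null) {
            try {
                if (isMasterRunning()) return master;
            } catch (RuntimeException timeout) {
                // Fall through and treat the cached master as stale.
            }
            master = null;
            if (resetOnFailure) masterChecked = false;
        }
        if (masterChecked) {
            // The stuck path: no connection is even attempted.
            throw new MasterNotRunning("not retrying: masterChecked is still true");
        }
        connectAttempts++;
        try {
            if (isMasterRunning()) {
                master = new Object();
                masterChecked = true;
                return master;
            }
        } catch (RuntimeException timeout) {
            // Report the failed attempt below.
        }
        throw new MasterNotRunning("connect attempt failed");
    }

    // Scenario: connect, lose the network, recover, then try again.
    static boolean recoversAfterOutage(boolean resetOnFailure) {
        MasterCacheSketch conn = new MasterCacheSketch(resetOnFailure);
        try {
            conn.getMaster();          // first call succeeds
            conn.networkUp = false;
            try {
                conn.getMaster();      // fails during the outage
            } catch (MasterNotRunning expected) {
                // Expected while the network is down.
            }
            conn.networkUp = true;
            conn.getMaster();          // should succeed once the network is back
            return true;
        } catch (MasterNotRunning stuck) {
            return false;
        }
    }
}
```

      In this model, recoversAfterOutage(false) returns false even though the network has recovered, while recoversAfterOutage(true) returns true.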

      I tried out a 0.94 client (actually a 0.92 client with some 0.94 patches) on a cluster with some network issues, and it repeatedly got stuck as described above.

        Attachments

        1. 6920-addendum.txt
          2 kB
          Lars Hofhansl
        2. HBASE-6920.patch
          13 kB
          Gregory Chanan
        3. HBASE-6920-v2.patch
          13 kB
          Gregory Chanan

              People

              • Assignee:
                gchanan Gregory Chanan
                Reporter:
                gchanan Gregory Chanan