HBase / HBASE-2971

On cluster startup, master/rs connect to ZK before it's fully ready causing a ConnectionLossException


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 0.90.0
    • Fix Version/s: 0.90.0
    • Component/s: Zookeeper
    • Labels: None

    Description

      There is a long-standing race condition that has been glossed over until now because of our "loose" ZK usage.

      The ZK server process can be in a state where it accepts the socket connection from our client in the master or RS, but any operation against the server fails with a ConnectionLossException. The ZK client handles this automatically and reconnects properly, as long as we do not abort when we get this exception.

      So this works on the last 0.89 and even with the master rewrite, but as we move towards strict usage of ZK, we should wait for ZK availability before proceeding with startup.

      I already have a patch in a local branch and it's working. Will put up a patch soon against new master.
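
      The "wait for ZK availability before proceeding with startup" idea can be sketched as a bounded retry loop. This is only an illustration, not the patch mentioned above: the class StartupWait, the method waitUntilAvailable, and its parameters are hypothetical names, and the probe is a plain Callable so the sketch needs no ZooKeeper dependency. It also retries on any exception, whereas a real version would presumably retry only on connection loss.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical helper (not part of HBase or ZooKeeper): retries a probe
 * until it succeeds or a deadline passes, instead of aborting on the
 * first failure.
 */
final class StartupWait {
    private StartupWait() {}

    /**
     * Calls the probe repeatedly until it returns without throwing.
     *
     * @return the number of attempts it took to succeed
     * @throws IllegalStateException if the deadline passes or the thread is interrupted
     */
    static int waitUntilAvailable(Callable<?> probe, long timeoutMs, long retryIntervalMs) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        int attempts = 0;
        while (true) {
            attempts++;
            try {
                probe.call();       // e.g. a cheap read against the server
                return attempts;    // the server answered, so startup can proceed
            } catch (Exception e) {
                // Assumption: every failure is treated as "not ready yet".
                if (System.nanoTime() - deadline >= 0) {
                    throw new IllegalStateException(
                        "service not available after " + attempts + " attempts", e);
                }
            }
            try {
                Thread.sleep(retryIntervalMs);
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();  // preserve the interrupt flag
                throw new IllegalStateException("interrupted while waiting", ie);
            }
        }
    }
}
```

      In the master or RS, the probe might be a lightweight call such as the ZooKeeper client's exists("/", false), with the loop run once before the rest of startup; that wiring is an assumption and is not taken from the actual patch.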

          People

            Assignee: Jonathan Gray (streamy)
            Reporter: Jonathan Gray (streamy)
            Votes: 0
            Watchers: 1

            Dates

              Created:
              Updated:
              Resolved: