  1. Oozie
  2. OOZIE-2654

Zookeeper dependent services should not depend on Connectionstate to be valid before cleaning up

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 4.2.0
    • Fix Version/s: 5.0.0b1, 4.3.1
    • Component/s: HA
    • Labels:
      None

      Description

      Currently, the ZKUtils, ZKLocks and ZKJobsConcurrency services do not properly tear down their ZooKeeper connections if no callback was received from ZooKeeper to change the connection state.

      We can get into this situation if, for example, ZooKeeper closes the session before any callback is received to update the connection state. This can prevent an Oozie server in HA mode from terminating, leaving one or more sockets in the CLOSE_WAIT state.

      Here is an instance of this issue.

      In the network connections, one connection is still in CLOSE_WAIT and waits indefinitely:

      tcp6 143 0 x.x.x.1:46710 x.x.x.2:2181 CLOSE_WAIT 4688/java off (0.00/0/0)

      From the zookeeper logs,

      2016-08-18 20:45:29,921 - INFO NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@868 - Client attempting to establish new session at /x.x.x.1:46710
      2016-08-18 20:45:29,926 - INFO CommitProcessor:1:ZooKeeperServer@617 - Established session 0x1569f576843000e with negotiated timeout 40000 for client /x.x.x.1:46710

      and later

      2016-08-18 20:46:34,008 - INFO CommitProcessor:1:NIOServerCnxn@1007 - Closed socket connection for client /x.x.x.1:46710 which had sessionid 0x1569f576843000e

      The fix is to tear down the ZooKeeper connections during service destroy without first checking the connection state.
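
The difference between the two destroy behaviours can be sketched as follows. This is a minimal illustration, not the Oozie code or the attached patch: `TeardownSketch`, `FakeClient`, `State`, `destroyGuarded` and `destroyUnconditional` are all hypothetical names, and the exact form of the guard in the affected services is an assumption. A `null` state stands for the case where no callback ever updated the connection state.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-ins only; none of these names come from Oozie or Curator.
final class TeardownSketch {

    /** Connection state as last reported by a listener callback. */
    enum State { CONNECTED, SUSPENDED, LOST }

    /** Minimal stand-in for a ZooKeeper client handle that owns a socket. */
    static final class FakeClient implements AutoCloseable {
        private final AtomicBoolean closed = new AtomicBoolean(false);

        @Override
        public void close() {
            closed.set(true);
        }

        boolean isClosed() {
            return closed.get();
        }
    }

    /**
     * Assumed shape of the problematic pattern: cleanup only runs when the
     * last known state looks valid. A null state (no callback received)
     * or a stale state means close() is skipped.
     */
    static void destroyGuarded(FakeClient client, State lastKnownState) {
        if (lastKnownState == State.CONNECTED) {
            client.close();
        }
    }

    /** Pattern described by the fix: always release the client on destroy. */
    static void destroyUnconditional(FakeClient client, State lastKnownState) {
        // lastKnownState is deliberately ignored here.
        client.close();
    }

    public static void main(String[] args) {
        FakeClient guarded = new FakeClient();
        destroyGuarded(guarded, null); // no callback was ever received
        System.out.println("guarded closed: " + guarded.isClosed());

        FakeClient unconditional = new FakeClient();
        destroyUnconditional(unconditional, null);
        System.out.println("unconditional closed: " + unconditional.isClosed());
    }
}
```

In this sketch the guarded variant leaves the client open when the state was never reported, which is analogous to the lingering CLOSE_WAIT socket above, while the unconditional variant closes it in every case.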

        Attachments

        1. OOZIE-2654.diff
          2 kB
          Venkat Ranganathan

            People

            • Assignee:
              venkatnrangan Venkat Ranganathan
              Reporter:
              venkatnrangan Venkat Ranganathan