Uploaded image for project: 'Apache Curator'
  1. Apache Curator
  2. CURATOR-367

Curator may deliver RECONNECTED before LOST in case of session expiry

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • 2.11.1
    • 2.12.0
    • Client
    • None

    Description

      Behaviour:

      We saw our code blocked at client.blockUntilConnected() after reconnected on session expiry.

      Possible reason:

      After receiving a session expired event ConnectionState first resets the connection then notifies the parent watchers, where the CuratorEvent is created. In this case it seems the execution of the first zookeeper event thread was delayed before calling the parent watchers. Meanwhile a new zookeeper event thread was created due to calling reset and this new thread sent the SyncConnected event earlier than SessionExpired was sent to parent watchers in the first thread. This resulted ConnectionStateListener instances seeing the RECONNECTED before the LOST.

      Logs:

      2016-11-17T20:23:28.527Z [Thread-0-SendThread(] INFO ClientCnxn: Opening socket connection to server _
      2016-11-17T20:23:28.535Z [Thread-0-SendThread(] INFO ClientCnxn: Socket connection established to _, initiating session
      2016-11-17T20:23:28.576Z [Thread-0-SendThread(] INFO ClientCnxn: Unable to reconnect to ZooKeeper service, session 0xc585ba1e7b6adc2 has expired, closing socket connection
      2016-11-17T20:23:28.576Z [Thread-0-EventThread] WARN ConnectionState: Session expired event received
      2016-11-17T20:23:28.673Z [Thread-0-EventThread] INFO ZooKeeper: Initiating client connection, connectString=_ sessionTimeout=30000 watcher=org.apache.curator.ConnectionState@6ddf3f9d
      2016-11-17T20:23:28.691Z [Thread-0-SendThread(] INFO ClientCnxn: Opening socket connection to server _
      2016-11-17T20:23:28.693Z [Thread-0-SendThread(] INFO ClientCnxn: Socket connection established to _, initiating session
      2016-11-17T20:23:28.701Z [Thread-0-SendThread(] INFO ClientCnxn: Session establishment complete on server _, sessionid = 0x2585ba1e69ffeca, negotiated timeout = 30000
      2016-11-17T20:23:28.701Z [Thread-0-EventThread] INFO ConnectionStateManager: State change: RECONNECTED
      2016-11-17T20:23:28.715Z [Thread-0-EventThread] INFO ConnectionStateManager: State change: LOST
      2016-11-17T20:23:28.715Z [Thread-0-EventThread] INFO ClientCnxn: EventThread shut down

      Reproduction:

      I was only able to reproduce the behaviour by adding artificial Thread.sleep in ConnectionState#process before calling the parent watchers if the event is session expired.

      ConnectionState#process
      @Override
      public void process(WatchedEvent event)
      {
      	if ( LOG_EVENTS )
      	{
      		log.debug("ConnectState watcher: " + event);
      	}
      
      	if ( event.getType() == Watcher.Event.EventType.None )
      	{
      		boolean wasConnected = isConnected.get();
      		boolean newIsConnected = checkState(event.getState(), wasConnected);
      		if ( newIsConnected != wasConnected )
      		{
      			isConnected.set(newIsConnected);
      			connectionStartMs = System.currentTimeMillis();
      		}
      	}
      
      	if (event.getState() == KeeperState.Expired)
      	{
      		System.err.println("::> sleep in ConnectionState#process");
      		try {
      			Thread.sleep(1000);
      		} catch (InterruptedException e) {}
      	}
      
      	for ( Watcher parentWatcher : parentWatchers )
      	{
      		TimeTrace timeTrace = new TimeTrace("connection-state-parent-process", tracer.get());
      		parentWatcher.process(event);
      		timeTrace.commit();
      	}
      }
      

      Some ideas for fix:

      • Add the event handling and calling parent watchers into a synchronized block.
      • Change the order of handling watched event and calling parent watchers (I'm not aware of the behaviour implications of this).
      • Move only calling reset to the end of the method "process".

      Attachments

        Issue Links

          Activity

            People

              randgalt Jordan Zimmerman
              _alexander Zoltan Szekeres
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: