Hadoop YARN / YARN-3242

Asynchrony in ZK-close can lead to ZKRMStateStore watcher receiving events for old client

Details

    • Reviewed

    Description

      A watcher event from an old ZK client session can corrupt the state of a new ZK client session, because ZooKeeper closes client sessions asynchronously.
      A watcher event from the old ZK client session can still be delivered to ZKRMStateStore after that session has been closed.
      This causes a serious problem: ZKRMStateStore gets out of sync with the ZooKeeper session.
      We only have one ZKRMStateStore but we can have multiple ZK client sessions.
      Currently, ZKRMStateStore#processWatchEvent doesn't check whether a watcher event comes from the current session, so a watcher event from an old ZK client session that has just been closed will still be processed.
      For example, if a Disconnected event is received from the old session after the new session is connected, zkClient will be set to null:

              case Disconnected:
                LOG.info("ZKRMStateStore Session disconnected");
                oldZkClient = zkClient;
                zkClient = null;
                break;
      

      ZKRMStateStore then won't receive a SyncConnected event from the new session, because the new session is already in the SyncConnected state and won't send another SyncConnected event until it is disconnected and connected again.
      As a result, all ZKRMStateStore operations fail with IOException "Wait for ZKClient creation timed out" until the RM shuts down.
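
      One way to avoid this failure is to have each watcher remember which client it was registered with, and to drop events whose source is not the currently active client. The following is only a minimal sketch of that idea under stated assumptions, not the code from the attached patches: FakeZkClient, SessionState, GuardedStoreSketch, activeZkClient and startNewSession are hypothetical names that stand in for the real ZooKeeper client and ZKRMStateStore.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a ZooKeeper client handle; only object identity matters here.
final class FakeZkClient {
    final String name;
    FakeZkClient(String name) { this.name = name; }
}

// Subset of the connection states named in the description.
enum SessionState { SYNC_CONNECTED, DISCONNECTED }

// Hypothetical, simplified store: one store instance, several client sessions over time.
final class GuardedStoreSketch {
    private FakeZkClient activeZkClient; // client of the session currently in use
    private FakeZkClient zkClient;       // usable handle; null while disconnected
    final List<String> log = new ArrayList<>();

    // Assumed hook: called whenever the store creates a new client session.
    synchronized void startNewSession(FakeZkClient client) {
        activeZkClient = client;
    }

    synchronized FakeZkClient getZkClient() {
        return zkClient;
    }

    // 'source' is the client whose session produced the event.
    synchronized void processWatchEvent(FakeZkClient source, SessionState state) {
        if (source != activeZkClient) {
            // Event from a session that is no longer current: ignore it.
            log.add("ignored " + state + " from " + source.name);
            return;
        }
        switch (state) {
            case SYNC_CONNECTED:
                zkClient = source;
                log.add("connected " + source.name);
                break;
            case DISCONNECTED:
                zkClient = null;
                log.add("disconnected " + source.name);
                break;
        }
    }

    public static void main(String[] args) {
        GuardedStoreSketch store = new GuardedStoreSketch();
        FakeZkClient oldClient = new FakeZkClient("old");
        FakeZkClient newClient = new FakeZkClient("new");

        store.startNewSession(oldClient);
        store.processWatchEvent(oldClient, SessionState.SYNC_CONNECTED);

        // The old session is closed and a new one takes over.
        store.startNewSession(newClient);
        store.processWatchEvent(newClient, SessionState.SYNC_CONNECTED);

        // A late Disconnected event from the closed session arrives afterwards.
        store.processWatchEvent(oldClient, SessionState.DISCONNECTED);

        System.out.println(store.log);
        System.out.println("zkClient still set: " + (store.getZkClient() == newClient));
    }
}
```

      The guard compares object identity rather than connection state, so it does not depend on the order in which the old and new sessions deliver their events.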

      The following code from ZooKeeper (ClientCnxn#EventThread) shows that even after receiving eventOfDeath, EventThread still processes all events until the waitingEvents queue is empty.

                    while (true) {
                       Object event = waitingEvents.take();
                       if (event == eventOfDeath) {
                          wasKilled = true;
                       } else {
                          processEvent(event);
                       }
                       if (wasKilled)
                          synchronized (waitingEvents) {
                             if (waitingEvents.isEmpty()) {
                                isRunning = false;
                                break;
                             }
                          }
                    }
      
            private void processEvent(Object event) {
                try {
                    if (event instanceof WatcherSetEventPair) {
                        // each watcher will process the event
                        WatcherSetEventPair pair = (WatcherSetEventPair) event;
                        for (Watcher watcher : pair.watchers) {
                            try {
                                watcher.process(pair.event);
                            } catch (Throwable t) {
                                LOG.error("Error while calling watcher ", t);
                            }
                        }
                    } else {
      
          public void disconnect() {
              if (LOG.isDebugEnabled()) {
                  LOG.debug("Disconnecting client for session: 0x"
                            + Long.toHexString(getSessionId()));
              }
      
              sendThread.close();
              eventThread.queueEventOfDeath();
          }
      
          public void close() throws IOException {
              if (LOG.isDebugEnabled()) {
                  LOG.debug("Closing client for session: 0x"
                            + Long.toHexString(getSessionId()));
              }
      
              try {
                  RequestHeader h = new RequestHeader();
                  h.setType(ZooDefs.OpCode.closeSession);
      
                  submitRequest(h, null, null, null);
              } catch (InterruptedException e) {
                  // ignore, close the send/event threads
              } finally {
                  disconnect();
              }
          }
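
      The effect of that loop can be reproduced without ZooKeeper. The sketch below is a simplified, single-threaded imitation of the EventThread loop quoted above; DrainAfterDeathSketch, EVENT_OF_DEATH and drain are hypothetical names, and plain strings stand in for watcher events. It illustrates that events already in the queue when the death marker is queued, as well as events queued after it, are still delivered. Since disconnect() as quoted only queues the marker and does not wait for the event thread, such deliveries can happen after close() has returned.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.LinkedBlockingQueue;

final class DrainAfterDeathSketch {
    // Hypothetical marker object playing the role of eventOfDeath.
    static final Object EVENT_OF_DEATH = new Object();

    // Same loop shape as the quoted EventThread code: stop only once the
    // marker has been seen AND the queue is empty. Returns what was delivered.
    static List<Object> drain(LinkedBlockingQueue<Object> waitingEvents) {
        List<Object> delivered = new ArrayList<>();
        boolean wasKilled = false;
        try {
            while (true) {
                Object event = waitingEvents.take();
                if (event == EVENT_OF_DEATH) {
                    wasKilled = true;
                } else {
                    delivered.add(event); // stands in for watcher.process(event)
                }
                if (wasKilled) {
                    synchronized (waitingEvents) {
                        if (waitingEvents.isEmpty()) {
                            break;
                        }
                    }
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while draining", e);
        }
        return delivered;
    }

    public static void main(String[] args) {
        LinkedBlockingQueue<Object> queue = new LinkedBlockingQueue<>();
        queue.add("Disconnected");  // queued before the marker
        queue.add(EVENT_OF_DEATH);  // what the quoted disconnect() queues
        queue.add("late event");    // queued after the marker

        // Both ordinary events are delivered even though the marker was seen.
        System.out.println(drain(queue));
    }
}
```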
      

      Attachments

        1. YARN-3242.004.patch
          10 kB
          Zhihai Xu
        2. YARN-3242.003.patch
          10 kB
          Zhihai Xu
        3. YARN-3242.002.patch
          9 kB
          Zhihai Xu
        4. YARN-3242.001.patch
          7 kB
          Zhihai Xu
        5. YARN-3242.000.patch
          5 kB
          Zhihai Xu

          People

            zxu Zhihai Xu
            Votes: 0
            Watchers: 11