ZOOKEEPER-1109 (project: ZooKeeper)

ZooKeeper service goes down when SyncRequestProcessor encounters any exception.


Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 3.3.0, 3.3.1, 3.3.2, 3.3.3
    • Fix Version/s: 3.4.0
    • Component/s: quorum
    • Labels: None
    • Hadoop Flags: Reviewed
    • Tags: quorum, leader, disk full, shutdown

    Description

      Problem
      ZooKeeper does not shut down completely when the dataDir disk is full, and the ZK cluster goes into an unserviceable state.

      Scenario
      If the leader's dataDir disk becomes full, the ZooKeeper server tries to shut down, but it waits indefinitely while shutting down the SyncRequestProcessor thread.

      Root Cause
      SyncRequestProcessor.shutdown() calls this.join(), waiting for the very thread that triggered System.exit(11).
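The wait cycle can be reproduced without killing the JVM. The following is a simulation, not ZooKeeper code: `Worker` is a hypothetical stand-in for SyncRequestProcessor, and `simulatedExit()` is a hypothetical stand-in for System.exit(11) that, like the real call, runs the shutdown hook on a separate thread and blocks its caller until the hook finishes. The hook's join() is given a 500 ms timeout only so the demo terminates; the real this.join() has no timeout and waits forever.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class SelfJoinDeadlockDemo {

    /** Hypothetical stand-in for SyncRequestProcessor: a thread that "exits" on failure. */
    static class Worker extends Thread {
        final AtomicBoolean joinTimedOut = new AtomicBoolean(false);

        @Override
        public void run() {
            // Pretend a disk write just failed with "No space left on device".
            simulatedExit(this::shutdown);
        }

        /** Mirrors the shutdown path: the hook thread waits for this thread to terminate. */
        void shutdown() {
            try {
                join(500); // timeout only so the demo ends; the real code calls join()
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            // Still alive after the timeout: an untimed join() would never return.
            joinTimedOut.set(isAlive());
        }
    }

    /**
     * Hypothetical stand-in for System.exit(11): runs the shutdown hook on its own
     * thread and blocks the calling thread until the hook completes.
     */
    static void simulatedExit(Runnable hook) {
        Thread hookThread = new Thread(hook, "shutdown-hook");
        hookThread.start();
        try {
            hookThread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Returns true if the hook's join() on the exiting thread had to time out. */
    static boolean reproducesHang() {
        Worker worker = new Worker();
        worker.start();
        try {
            worker.join(); // returns only because the hook's join() is timed
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return worker.joinTimedOut.get();
    }

    public static void main(String[] args) {
        System.out.println("hook blocked on exiting thread: " + reproducesHang());
    }
}
```

Running main prints `hook blocked on exiting thread: true` after roughly half a second: the worker cannot finish while it waits on the hook, and the hook cannot finish while it waits on the worker.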

      When the disk became full, the SyncRequestProcessor thread got a 'No space left on device' exception and invoked System.exit(11) (the following logs show this). Before the JVM exits, ZooKeeper runs the QuorumPeerMain shutdown hook, and the flow reaches SyncRequestProcessor.shutdown(). There, this.join() waits for the same thread that invoked System.exit(11). Because System.exit() blocks its calling thread while the shutdown hooks run, and the hook in turn waits for that thread to terminate, neither can make progress and the JVM never exits.
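One way to break the cycle is sketched below under stated assumptions; it is not taken from the attached patches. The worker clears a hypothetical volatile `running` flag before it triggers the exit, and shutdown() calls join() only while that flag is still set. A shutdown initiated from another thread still waits for the worker, while the exit path skips the wait so the hook can return. `Worker` and `simulatedExit()` are hypothetical stand-ins for SyncRequestProcessor and System.exit(11).

```java
public class GuardedShutdownSketch {

    /** Hypothetical stand-in for SyncRequestProcessor with a "running" guard. */
    static class Worker extends Thread {
        // Hypothetical flag: cleared before this thread triggers the exit.
        volatile boolean running = true;
        volatile boolean hookFinished = false;

        @Override
        public void run() {
            // Pretend a disk write failed: mark the thread as exiting, then "exit".
            running = false;
            simulatedExit(this::shutdown);
        }

        void shutdown() {
            try {
                if (running) {
                    join(); // shutdown from another thread: wait for the worker to stop
                }
                // Exit path: skip join(), because this thread is blocked on the hook.
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            hookFinished = true;
        }
    }

    /**
     * Hypothetical stand-in for System.exit(11): runs the shutdown hook on its own
     * thread and blocks the calling thread until the hook completes.
     */
    static void simulatedExit(Runnable hook) {
        Thread hookThread = new Thread(hook, "shutdown-hook");
        hookThread.start();
        try {
            hookThread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Returns true if the worker and its hook both finish within the given time. */
    static boolean completesWithin(long millis) {
        Worker worker = new Worker();
        worker.start();
        try {
            worker.join(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !worker.isAlive() && worker.hookFinished;
    }

    public static void main(String[] args) {
        System.out.println("shutdown completed: " + completesWithin(2_000));
    }
}
```

The ordering matters: the flag must be cleared before the exit is triggered, otherwise the hook can still observe `running == true` and block in join().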

Attachments

    1. ZOOKEEPER-1109.1.patch (1 kB, Laxman)
    2. ZOOKEEPER-1109.patch (1 kB, Laxman)


People

    • Assignee: Laxman (lakshman)
    • Reporter: Laxman (lakshman)
    • Votes: 0
    • Watchers: 4


Time Tracking

    • Original Estimate: 72h
    • Remaining Estimate: 72h
    • Time Spent: Not Specified
