ZooKeeper / ZOOKEEPER-1061

Zookeeper stop fails if start called twice

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.3.2
    • Fix Version/s: 3.4.0
    • Component/s: scripts
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      The zkServer.sh script doesn't properly check whether a previously started
      server is still running. If you call start twice, the second invocation
      overwrites the PID file with the PID of a process that then fails because
      the port is already in use.

      As a result, a subsequent stop fails: it signals the PID recorded by the
      second start, which no longer exists, and the original server keeps running.

      Here is a reference that describes how init scripts should normally work:

      http://refspecs.freestandards.org/LSB_3.1.0/LSB-Core-generic/LSB-Core-generic/iniscrptact.html
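
The double-start problem described above can be avoided by checking the PID file before launching anything. The following is a minimal sketch of that idea, not the code from the attached patch: the function names `is_running` and `guarded_start` are hypothetical, and killing the child when the PID file cannot be written is an illustrative choice.

```shell
#!/bin/sh
# Illustrative sketch only; is_running and guarded_start are hypothetical
# helpers, not functions from zkServer.sh.

# is_running PIDFILE: succeed only if PIDFILE names a live process.
is_running() {
    pidfile=$1
    [ -f "$pidfile" ] || return 1
    pid=$(cat "$pidfile")
    [ -n "$pid" ] || return 1
    # Signal 0 delivers nothing; it only tests whether the PID can be signalled.
    kill -0 "$pid" 2>/dev/null
}

# guarded_start PIDFILE COMMAND [ARGS...]: start COMMAND in the background
# unless PIDFILE already points at a live process.
guarded_start() {
    pidfile=$1
    shift
    if is_running "$pidfile"; then
        echo "already running as process $(cat "$pidfile")."
        # The LSB reference counts start on a running service as success.
        return 0
    fi
    "$@" >/dev/null 2>&1 </dev/null &
    child=$!
    if echo "$child" > "$pidfile"; then
        echo "STARTED"
    else
        echo "FAILED TO WRITE PID"
        # Assumption: do not leave behind a process that was never recorded.
        kill "$child" 2>/dev/null
        return 1
    fi
}
```

Because the existing PID file is only replaced when no live process owns it, a second start leaves the recorded PID untouched and a later stop still finds the right process. Note that `kill -0` also fails when the caller lacks permission to signal the process, so the check is only reliable when start and stop run as the same user.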

        Activity

        Ted Dunning added a comment -

        Here is a patch that handles the double start and fixes up some exit values.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12478734/ZOOKEEPER-1061.patch
        against trunk revision 1099329.

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/256//testReport/
        Findbugs warnings: https://builds.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/256//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Console output: https://builds.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/256//console

        This message is automatically generated.

        Ted Dunning added a comment -

        No unit tests are reasonable for these script-only changes. Here is a manual
        test. Without the fix, we see this misbehavior:

        tdunning@ted-desk:~/Apache/zookeeper$ sudo bin/zkServer.sh start
        JMX enabled by default
        Using config: /etc/zookeeper/zoo.cfg
        Starting zookeeper ... 
        STARTED
        tdunning@ted-desk:~/Apache/zookeeper$ sudo jps
        17610 QuorumPeerMain
        17646 Jps
        tdunning@ted-desk:~/Apache/zookeeper$ sudo bin/zkServer.sh start
        JMX enabled by default
        Using config: /etc/zookeeper/zoo.cfg
        Starting zookeeper ... 
        STARTED
        tdunning@ted-desk:~/Apache/zookeeper$ sudo jps
        17685 Jps
        17610 QuorumPeerMain
        tdunning@ted-desk:~/Apache/zookeeper$ sudo bin/zkServer.sh stop
        JMX enabled by default
        Using config: /etc/zookeeper/zoo.cfg
        Stopping zookeeper ... 
        kill: 160: No such process
        
        STOPPED
        tdunning@ted-desk:~/Apache/zookeeper$ sudo jps
        17730 Jps
        17610 QuorumPeerMain
        

        With the fix, I get this.

        tdunning@ted-desk:~/Apache/zookeeper$ patch < ZOOKEEPER-1061.patch 
        patching file zkServer.sh
        
        # first start works
        tdunning@ted-desk:~/Apache/zookeeper$ sudo bin/zkServer.sh start
        JMX enabled by default
        Using config: /etc/zookeeper/zoo.cfg
        Starting zookeeper ... STARTED
        
        # second start fails with good message
        tdunning@ted-desk:~/Apache/zookeeper$ sudo bin/zkServer.sh start
        JMX enabled by default
        Using config: /etc/zookeeper/zoo.cfg
        Starting zookeeper ... already running as process 17928.
        
        # and this is persistent behavior
        tdunning@ted-desk:~/Apache/zookeeper$ sudo bin/zkServer.sh start
        JMX enabled by default
        Using config: /etc/zookeeper/zoo.cfg
        Starting zookeeper ... already running as process 17928.
        
        # stop now works
        tdunning@ted-desk:~/Apache/zookeeper$ sudo bin/zkServer.sh stop
        JMX enabled by default
        Using config: /etc/zookeeper/zoo.cfg
        Stopping zookeeper ... STOPPED
        
        # repeated stop works correctly
        tdunning@ted-desk:~/Apache/zookeeper$ sudo bin/zkServer.sh stop
        JMX enabled by default
        Using config: /etc/zookeeper/zoo.cfg
        Stopping zookeeper ... error: could not find file /var/zookeeper/zookeeper_server.pid
        
        # and start works again
        tdunning@ted-desk:~/Apache/zookeeper$ sudo bin/zkServer.sh start
        JMX enabled by default
        Using config: /etc/zookeeper/zoo.cfg
        Starting zookeeper ... STARTED
        
        # but can't be repeated
        tdunning@ted-desk:~/Apache/zookeeper$ sudo bin/zkServer.sh start
        JMX enabled by default
        Using config: /etc/zookeeper/zoo.cfg
        Starting zookeeper ... already running as process 18155.
        
        # running without proper permissions gives a different error
        tdunning@ted-desk:~/Apache/zookeeper$ bin/zkServer.sh start
        JMX enabled by default
        Using config: /etc/zookeeper/zoo.cfg
        Starting zookeeper ... bin/zkServer.sh: 169: cannot create /var/zookeeper/zookeeper_server.pid: Permission denied
        FAILED TO WRITE PID
        tdunning@ted-desk:~/Apache/zookeeper$ sudo bin/zkServer.sh stop
        JMX enabled by default
        Using config: /etc/zookeeper/zoo.cfg
        Stopping zookeeper ... STOPPED
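
The stop behavior exercised in the transcript above (a clean stop, then an error when the PID file is already gone) can be sketched the same way. This is an assumption-laden illustration rather than the patched script: `guarded_stop` is a hypothetical name, the stale-file message is invented for the example, and the exit statuses are a guess. It returns 0 when the PID file is missing because the LSB reference linked in the description treats stop on an already stopped service as success; whether the patch does the same is not shown in this thread.

```shell
#!/bin/sh
# Illustrative sketch only; guarded_stop is a hypothetical helper,
# not a function from zkServer.sh.

# guarded_stop PIDFILE: stop the process recorded in PIDFILE, if any.
guarded_stop() {
    pidfile=$1
    if [ ! -f "$pidfile" ]; then
        echo "error: could not find file $pidfile"
        # Assumption: nothing to stop counts as success, per the LSB reference.
        return 0
    fi
    pid=$(cat "$pidfile")
    if kill "$pid" 2>/dev/null; then
        rm -f "$pidfile"
        echo "STOPPED"
    else
        # The recorded process is gone; drop the stale file so start can proceed.
        rm -f "$pidfile"
        echo "no such process $pid; removed stale PID file"
        return 1
    fi
}
```

Removing the PID file on both paths means a later start never sees a stale entry, which matches the "start works again" step in the manual test.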
        
        Mahadev konar added a comment -

        That's good to have!

        Love this change though:

        -dataDir=/export/crawlspace/mahadev/zookeeper/server1/data
        +dataDir=/home/tdunning/tmp
        

        Other than that, +1 for the change!

        Ted Dunning added a comment -

        Ouch. Our internal review caught that after I pushed out the patch.

        I was kind of hoping nobody else would notice.

        Ted Dunning added a comment -

        Any hope for a commit on this?

        Mahadev konar added a comment -

        Ted, I'll commit this tonight (over the weekend at the latest). I need to run some tests manually and will also fix the dataDir conf change to point to just /tmp/zookeeper.

        Mahadev konar added a comment -

        I just pushed this with minor changes to conf/zoo_sample.cfg.
        Thanks Ted.

        Hudson added a comment -

        Integrated in ZooKeeper-trunk #1185 (See https://builds.apache.org/hudson/job/ZooKeeper-trunk/1185/)
        ZOOKEEPER-1061. Zookeeper stop fails if start called twice. (Ted Dunning via mahadev)


          People

          • Assignee: Ted Dunning
          • Reporter: Ted Dunning
          • Votes: 0
          • Watchers: 2
