Commons Daemon / DAEMON-377

Race in PID file handling in jsvc resulting in Tomcat running without a pidfile


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.1.0
    • Component/s: None
    • Labels: None

    Description

      This issue is reproducible on FreeBSD under moderate load when Tomcat 8, running via jsvc, is restarted.

      The old jsvc controller process exits only after the new process has started, and on its way out it deletes the pid file that the new process has just written. The new jsvc starter process, which waits for that pid file to appear, therefore fails with a timeout. Meanwhile the new Tomcat process keeps running without a pid file.
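The starter's wait described above amounts to polling for the pid file until a deadline. A minimal sketch in C, assuming the pid file holds a single decimal PID; wait_for_pidfile() and its one-second poll interval are hypothetical, and jsvc's real start-up wait (e.g. the -wait option visible in the ps output) may be implemented differently.

```c
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper, not jsvc's own code: poll `path` once per second
 * until it exists and holds a positive PID, or until `timeout_sec`
 * seconds have passed. Returns the PID read from the file, or -1 on timeout. */
pid_t wait_for_pidfile(const char *path, unsigned timeout_sec)
{
    for (unsigned waited = 0; ; waited++) {
        FILE *f = fopen(path, "r");
        if (f != NULL) {
            long pid = 0;
            int ok = fscanf(f, "%ld", &pid);
            fclose(f);
            if (ok == 1 && pid > 0)
                return (pid_t)pid;   /* pid file exists and names a process */
        }
        if (waited >= timeout_sec)
            return -1;               /* gave up: the "failed to start" outcome */
        sleep(1);                    /* not there yet: wait and retry */
    }
}
```

In the failure reported here the pid file exists only for a few milliseconds before the old controller removes it, so a poller like this may never observe it and ends up returning -1.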

      Example (35659/35660 are the old jsvc processes, 56362/56363 are the new ones):

      2017-11-08 09:36:59 35660 jsvc debug: Daemon destroyed successfully
      2017-11-08 09:36:59 35660 jsvc debug: Calling System.exit(0)
      2017-11-08 09:36:59 56362 jsvc debug: Switching umask back to 022 from 077
      (((/var/run/tomcat8.pid is written by 56362 here)))
      2017-11-08 09:36:59 56363 jsvc debug: Using specific JVM in /usr/local/openjdk8/jre/lib/amd64/server/libjvm.so
      2017-11-08 09:36:59 56363 jsvc debug: Attemtping to load library /usr/local/openjdk8/jre/lib/amd64/server/libjvm.so
      (((/var/run/tomcat8.pid is deleted by 35659 here)))
      2017-11-08 09:36:59 35659 jsvc debug: Service shut down
      2017-11-08 09:36:59 56363 jsvc debug: JVM library /usr/local/openjdk8/jre/lib/amd64/server/libjvm.so loaded
      2017-11-08 09:36:59 56363 jsvc debug: JVM library entry point found (0x019DE640)
      

      Restart script eventually times out:

      >/usr/local/etc/rc.d/tomcat8 restart
      Stopping tomcat8.
      Waiting for PIDS: 35660.
      Starting tomcat8.
      /usr/local/etc/rc.d/tomcat8: WARNING: failed to start tomcat8
      

      No PID file:

      >ls -l /var/run/tomcat8.pid
      ls: /var/run/tomcat8.pid: No such file or directory
      

      Yet Tomcat is running:

      >ps ax|grep java|grep -v grep
      56362  -  Is      0:00.00 /usr/local/bin/jsvc -java-home /usr/local/openjdk8 -server -user www -pidfile /var/run/tomcat8.pid -wait 300 -outfile /u
      56363  -  I       0:57.25 /usr/local/bin/jsvc -java-home /usr/local/openjdk8 -server -user www -pidfile /var/run/tomcat8.pid -wait 300 -outfile /u
      

      The root cause is that the pid file contains the PID of the child, but it is deleted by the parent (controller) process in the run_controller function:

      static int run_controller(arg_data *args, home_data *data, uid_t uid,
                                gid_t gid)
      {
          . . .
          waitpid(pid, &status, 0);   /* block until the child exits */
          unlink(args->pidf);         /* remove the pid file unconditionally */
          . . .
      }
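One way to narrow the race is to have the controller delete the pid file only if it still names the child it has just reaped. A minimal sketch, assuming the pid file holds a single decimal PID as in this report; unlink_pidfile_if_ours() is a hypothetical name, and this is not necessarily what the attached daemon-377.patch does.

```c
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: remove `pidfile` only if it still records `child`,
 * the process the controller just reaped with waitpid(). If a new instance
 * has already rewritten the file with its own PID, leave it alone.
 * Returns 1 if removed, 0 if left in place, -1 if unlink() failed. */
int unlink_pidfile_if_ours(const char *pidfile, pid_t child)
{
    FILE *f = fopen(pidfile, "r");
    if (f == NULL)
        return 0;                       /* already gone: nothing to do */

    long recorded = 0;
    int ok = fscanf(f, "%ld", &recorded);
    fclose(f);

    if (ok != 1 || (pid_t)recorded != child)
        return 0;                       /* not our child's PID: keep the file */

    return unlink(pidfile) == 0 ? 1 : -1;
}
```

In run_controller this would replace `unlink(args->pidf);` with `unlink_pidfile_if_ours(args->pidf, pid);`. It is still check-then-act, so a tiny window remains between the read and the unlink; closing it completely would need something like an advisory lock held by the owning process.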
      

      If the controller process is paged out (which happens often, because it sits dormant inside waitpid), a considerable amount of time can pass between the child's termination and the call to unlink(args->pidf), long enough for a new instance to start and write its own pid file.

      The issue can be reproduced reliably by adding sleep(1); before unlink(args->pidf).
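The same interleaving can be simulated outside jsvc. A sketch under stated assumptions: simulate_pidfile_race() is hypothetical, a forked process stands in for the old controller (reap child, delay, unconditional unlink), and the caller stands in for the new instance writing its pid file; a pipe only ensures the new pid file is written after the old child has exited.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Sketch of the DAEMON-377 interleaving, not jsvc code. Returns 1 if the
 * new pid file survived, 0 if the old controller deleted it, -1 on error. */
int simulate_pidfile_race(const char *pidfile, unsigned delay_sec)
{
    int ready[2];
    if (pipe(ready) != 0)
        return -1;

    pid_t old_controller = fork();
    if (old_controller < 0) {
        close(ready[0]);
        close(ready[1]);
        return -1;
    }

    if (old_controller == 0) {                 /* old controller process */
        close(ready[0]);
        pid_t child = fork();
        if (child < 0)
            _exit(1);
        if (child == 0)
            _exit(0);                          /* old child shuts down */
        int status;
        waitpid(child, &status, 0);            /* as in run_controller() */
        if (write(ready[1], "x", 1) != 1)      /* signal: old child is gone */
            _exit(1);
        close(ready[1]);
        sleep(delay_sec);                      /* slow to resume (paged out) */
        unlink(pidfile);                       /* deletes whatever is there now */
        _exit(0);
    }

    /* new instance: wait until the old child has exited */
    close(ready[1]);
    char byte;
    ssize_t n = read(ready[0], &byte, 1);
    close(ready[0]);

    int result = -1;
    if (n == 1) {
        FILE *f = fopen(pidfile, "w");         /* write the new pid file */
        if (f != NULL) {
            fprintf(f, "%ld\n", (long)getpid());
            fclose(f);
            result = 0;
        }
    }

    int status;
    waitpid(old_controller, &status, 0);       /* let the old controller finish */

    if (result == 0) {
        FILE *f = fopen(pidfile, "r");
        if (f != NULL) {                       /* pid file survived */
            fclose(f);
            remove(pidfile);                   /* clean up */
            result = 1;
        }
    }
    return result;
}
```

With delay_sec set to 1 (the sleep(1) trick), the new pid file is written during the old controller's delay and then deleted, so the function returns 0.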

      Attachments

        1. daemon-377.patch (3 kB, Rustam Abdullaev)


          People

            Assignee: Unassigned
            Reporter: Rustam Abdullaev (rustamabd)
            Votes: 0
            Watchers: 1
