Commons Daemon / DAEMON-183

Abnormal shutdown leaves the pidfile, which prevents subsequent startup

Details

    • Type: Bug
    • Status: Closed
    • Priority: Trivial
    • Resolution: Duplicate
    • Affects Version/s: 1.0.3
    • Fix Version/s: None
    • Component/s: Procrun
    • Labels: None

    Description

      This is really a trivial issue, so you may want to just close it as WONTFIX, but it does represent an inconsistency that I don't feel I can release into production, so I'm documenting it here.

      When using the pidfile with procrun, if the pidfile isn't deleted, the next startup fails, reporting that a pidfile exists. Because I had configured the service incorrectly (my stopmode was not set, so my main thread never returned and the stop timed out), the pidfile was almost always left behind after the service came down. This in itself may be an issue.

      Nonetheless, on the subsequent startup it failed, reporting that a pidfile existed, but it then deleted the existing pidfile, so a second attempt to start succeeded. It felt a little strange that it would fail the first time and then work the second time. I don't really know if it's wrong, but I know that my customers would find this fragile and weird. For now, I am simply not using the pidfile.

      So a few thoughts:

      1) Should the pidfile check go further and query for a running process with the expected image (servicename.exe) and process ID? If no such process exists, it could assume the pidfile is orphaned, delete it, and continue startup.
      2) Obviously, if the SCM or an external user kills the process, the file can't be deleted. However, I think the timeout I experienced came from the SCM, not from the timeout in serviceStop (e.g., I don't think I saw a "Worker was killed" message). Are you aware of a problem with the timeout logic where the SCM forces the process down instead of waiting for procrun to time out?
      3) Today, if the process aborts startup because the pidfile already exists, the gPidfileName global has already been set, so the pidfile gets deleted on the way out (which is why the second attempt to start succeeds). What happens if this pidfile belongs to a real, already running process? Does the other process hold a lock on it so that the delete fails? Or would the delete succeed, allowing multiple concurrent instances to run?
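
      Points 1 and 3 could plausibly be addressed together by changing when ownership of the pidfile is recorded. The following is a minimal C sketch (procrun itself is written in C) under stated assumptions. The names `acquire_pidfile`, `release_pidfile`, `owned_pidfile`, and `process_probe_fn` are hypothetical and are not procrun's actual code. The process check is an injected callback; on Windows it might be built on OpenProcess and QueryFullProcessImageNameW. The two stub probes exist only to make the sketch testable. The pidfile is created exclusively. An existing file whose PID does not match a running image is treated as orphaned and removed. The "delete on exit" global is set only after a successful create.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical liveness probe: returns true only if a process with this PID is
 * running AND its executable image matches `image`. On Windows this could be
 * built on OpenProcess and QueryFullProcessImageNameW; it is injected here so
 * the decision logic stays portable and testable. */
typedef bool (*process_probe_fn)(unsigned long pid, const char *image);

typedef enum {
    PIDFILE_CREATED,       /* this process now owns the pidfile */
    PIDFILE_LIVE_INSTANCE, /* another instance really is running; abort startup */
    PIDFILE_ERROR          /* could not create or clean up the pidfile */
} pidfile_result;

/* Plays the role of gPidfileName, but is set only AFTER this process has
 * created the pidfile, so an aborted startup never deletes another
 * instance's file (point 3). */
static const char *owned_pidfile = NULL;

/* Reads the PID recorded in an existing pidfile; false if unreadable. */
static bool read_pid(const char *path, unsigned long *pid) {
    FILE *f = fopen(path, "r");
    if (f == NULL) return false;
    bool ok = fscanf(f, "%lu", pid) == 1;
    fclose(f);
    return ok;
}

pidfile_result acquire_pidfile(const char *path, unsigned long my_pid,
                               const char *image, process_probe_fn probe) {
    /* At most two attempts: the second follows removal of a stale file. */
    for (int attempt = 0; attempt < 2; attempt++) {
        /* C11 "x" mode: exclusive create, fails if the file already exists.
         * For simplicity, any failure here is treated as "file exists". */
        FILE *f = fopen(path, "wx");
        if (f != NULL) {
            fprintf(f, "%lu\n", my_pid);
            fclose(f);
            owned_pidfile = path;
            return PIDFILE_CREATED;
        }
        unsigned long old_pid;
        if (read_pid(path, &old_pid) && probe(old_pid, image))
            return PIDFILE_LIVE_INSTANCE; /* leave the live instance's file alone */
        /* Orphaned, unreadable, or garbage pidfile (point 1): delete and retry.
         * Check-then-remove is not atomic; a racing instance could slip in. */
        if (remove(path) != 0)
            return PIDFILE_ERROR;
    }
    return PIDFILE_ERROR;
}

/* Called on shutdown: deletes the pidfile only if this process created it. */
void release_pidfile(void) {
    if (owned_pidfile != NULL) {
        remove(owned_pidfile);
        owned_pidfile = NULL;
    }
}

/* Stub probes for illustration only; a real probe would query the OS. */
static bool stub_probe_running(unsigned long pid, const char *image) {
    (void)pid; (void)image;
    return true;
}

static bool stub_probe_not_running(unsigned long pid, const char *image) {
    (void)pid; (void)image;
    return false;
}
```

      Checking the image name as well as the PID is meant to guard against PID reuse after a crash. The sketch still leaves a window between the liveness check and remove() in which two instances starting at the same moment could race. A more robust variant might hold an exclusive lock on the file for the service's lifetime.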

      Just a few minor things. If you feel any of these are worth implementing or changing, I would be happy to work on them and submit a patch. If not, no worries.


          People

            Assignee: Unassigned
            Reporter: Steve Ash (steve.ash)
            Votes: 0
            Watchers: 1
