Solr / SOLR-15558

Solr stop doesn't handle zombie processes

Details

    • Type: Bug
    • Status: Closed
    • Priority: Trivial
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 9.0, 8.11.2
    • Component/s: None
    • Labels: None

    Description

      When solr stop is called on Linux, the script uses this command to check whether the process is still running:
      CHECK_PID=`ps auxww | awk '{print $2}' | grep -w $SOLR_PID | sort -r | tr -d ' '`
      https://github.com/apache/solr/blob/122c88a0748769432ef62cc3fb94c2226dd67aa7/solr/bin/solr#L871
       
      If Solr has stopped but remains as a zombie process, its entry stays in the process table, so ps auxww continues to show the PID even after kill -9. The result is output like the following, with 3 minutes wasted waiting for a dead process to exit:
       
      [2021-07-21T09:15:12.365Z] Sending stop command to Solr running on port 8983 ... waiting up to 180 seconds to allow Jetty process 12622 to stop gracefully.
      [2021-07-21T09:18:13.551Z]  [|] Solr process 12622 is still running; jstacking it now.
      [2021-07-21T09:18:21.806Z] 12622: Unable to open socket file /proc/12622/root/tmp/.java_pid12622: target process 12622 doesn't respond within 10500ms or HotSpot VM not loaded
      [2021-07-21T09:18:21.806Z] Solr process 12622 is still running; forcefully killing it now.
      [2021-07-21T09:18:21.806Z] Killed process 12622
      [2021-07-21T09:18:31.678Z] ERROR: Failed to kill previous Solr Java process 12622 ... script fails.
       
      However, the output of ps auxww does identify zombie processes in the STAT column:
      USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
      root     12622  1.4  0.0      0     0 pts/1    Z    10:42   0:26 [java] <defunct>
       
      So the CHECK_PID check could filter out zombies.
      The bigger issue is why the process ended up as a zombie. In this case it was the PID 1 zombie reaping problem described at https://blog.phusion.nl/2015/01/20/docker-and-the-pid-1-zombie-reaping-problem/, caused by not specifying "--init" when running Solr inside a Docker container. A message warning that the process is a zombie may therefore be worth having, so that the user has an opportunity to do something about it.
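One way the filtering suggested above might look is sketched here. It is not the change that was committed for this issue. The helper names live_pid_from_ps and zombie_pid_from_ps are hypothetical, and the sketch assumes STAT is the 8th whitespace-separated field of ps auxww output, as in the listing above.

```shell
# Hypothetical helper: read `ps auxww`-style lines on stdin and print the PID
# only if it is present and its STAT field (assumed to be field 8) does not
# start with Z, i.e. the process is not a zombie.
live_pid_from_ps() {
  awk -v pid="$1" '$2 == pid && $8 !~ /^Z/ { print $2 }'
}

# Hypothetical companion: print the PID only if it is present as a zombie,
# so the caller can warn the user instead of waiting for it to exit.
zombie_pid_from_ps() {
  awk -v pid="$1" '$2 == pid && $8 ~ /^Z/ { print $2 }'
}

# Possible replacement for the CHECK_PID pipeline (illustrative only).
solr_check_pid() {
  ps auxww | live_pid_from_ps "$1"
}
```

With the zombie entry shown above, solr_check_pid 12622 would print nothing, so a wait loop based on it would stop waiting, while zombie_pid_from_ps could drive a warning message.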
       
      Note from mdrob

      That seems like a reasonable check to add. The only caution I would advise
      is that a lot of developers use Macs for local testing, so make sure that
      whatever flags you invoke are generally cross-platform compatible or
      hidden behind appropriate conditions.
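As a rough illustration of the portability concern, the sketch below queries a single PID instead of parsing the full listing. It assumes the local ps accepts -o stat= and -p, which the procps (Linux) and BSD/macOS implementations are understood to do; other ps variants may not, so a real change would need checking on each target platform. The function names classify_stat and pid_state are hypothetical.

```shell
# Hypothetical helper: map a ps STAT string to "gone", "zombie", or "running".
classify_stat() {
  case "$1" in
    '') echo gone ;;      # ps printed nothing: no such process
    Z*) echo zombie ;;    # state starts with Z: defunct, waiting to be reaped
    *)  echo running ;;   # any other state counts as still alive
  esac
}

# Hypothetical wrapper: ask ps for the state of one PID only.
# Assumption: `ps -o stat= -p PID` is supported by the local ps.
pid_state() {
  classify_stat "$(ps -o stat= -p "$1" 2>/dev/null | tr -d ' ')"
}
```

A stop script could call pid_state "$SOLR_PID" in its wait loop and break out early, with a warning, when the result is zombie.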

            People

              Assignee: mdrob Mike Drob
              Reporter: colvinco Colvin Cowie

                Time Tracking

                  Original Estimate: Not Specified
                  Remaining Estimate: 0h
                  Time Spent: 1h 40m