Description
When calling solr stop on Linux, this command is used to check whether the Solr process is still running:
CHECK_PID=`ps auxww | awk '{print $2}' | grep -w $SOLR_PID | sort -r | tr -d ' '`
https://github.com/apache/solr/blob/122c88a0748769432ef62cc3fb94c2226dd67aa7/solr/bin/solr#L871
If Solr has stopped but remains as a zombie process, its entry stays in the process table, so ps auxww continues to show the PID even after kill -9. The result looks something like this, with 3 minutes wasted waiting for a dead process to exit:
[2021-07-21T09:15:12.365Z] Sending stop command to Solr running on port 8983 ... waiting up to 180 seconds to allow Jetty process 12622 to stop gracefully.
[2021-07-21T09:18:13.551Z] [|] Solr process 12622 is still running; jstacking it now.
[2021-07-21T09:18:21.806Z] 12622: Unable to open socket file /proc/12622/root/tmp/.java_pid12622: target process 12622 doesn't respond within 10500ms or HotSpot VM not loaded
[2021-07-21T09:18:21.806Z] Solr process 12622 is still running; forcefully killing it now.
[2021-07-21T09:18:21.806Z] Killed process 12622
[2021-07-21T09:18:31.678Z] ERROR: Failed to kill previous Solr Java process 12622 ... script fails.
However, the output of ps auxww does identify zombie processes with a Z in the STAT column:
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 12622 1.4 0.0 0 0 pts/1 Z 10:42 0:26 [java] <defunct>
So the CHECK_PID command could filter out zombies.
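A minimal sketch of such a filter, assuming the STAT value is the eighth whitespace-separated field of `ps auxww` output (as in the sample above) and that zombie states begin with `Z`. The helper names `live_pid_from_ps` and `is_zombie_in_ps` are hypothetical and not part of `bin/solr`; they read `ps` output on stdin so the same `ps auxww` invocation the script already relies on can be reused.

```shell
#!/usr/bin/env bash

# Hypothetical helper: reads `ps auxww`-style output on stdin and prints the
# given PID only if it is listed AND its STAT field does not start with Z.
# Assumption: PID is field 2 and STAT is field 8, as in the sample output.
live_pid_from_ps() {
  awk -v pid="$1" '$2 == pid && $8 !~ /^Z/ { print $2 }'
}

# Hypothetical helper: exits 0 if the given PID is listed as a zombie,
# non-zero otherwise, so the caller can print a warning.
is_zombie_in_ps() {
  awk -v pid="$1" '$2 == pid && $8 ~ /^Z/ { found = 1 } END { exit !found }'
}

# Possible use in place of the existing check (sketch only, left commented out):
#   CHECK_PID=$(ps auxww | live_pid_from_ps "$SOLR_PID")
#   if ps auxww | is_zombie_in_ps "$SOLR_PID"; then
#     echo "WARNING: Solr process $SOLR_PID is a zombie (defunct)."
#   fi
```

Matching on exact field values rather than `grep -w` also avoids accidental matches against other columns, though the fixed column position would need checking on each supported platform.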
Obviously the bigger issue is why the process ended up as a zombie. In this case it was because of the problem described at https://blog.phusion.nl/2015/01/20/docker-and-the-pid-1-zombie-reaping-problem/ and not specifying "--init" when running Solr inside a Docker container. So a message warning that the process is a zombie may be worth having, so that the user has an opportunity to do something about it.
Note from mdrob
That seems like a reasonable check to add. The only caution I would advise
is that a lot of developers use Macs for local testing, so make sure that
whatever flags you invoke are generally cross-platform compatible, or
hidden behind appropriate conditions.
Issue Links
- relates to SOLR-16191 Sanity-check assumptions about Linux `ps` command in `solr/bin/solr` script (Closed)