Description
The bin/solr script uses this lsof invocation to check if the Solr port is being listened on:
running=`lsof -PniTCP:$SOLR_PORT -sTCP:LISTEN`
if [ -z "$running" ]; then
The code is here.
There are a few issues with this.
1. False negatives when port is occupied by different user
When lsof runs as non-root, it only shows sockets for processes with your effective uid.
For example:
$ id -u && nc -l 7788 &
[1] 26576
1000
#### works: nc ran as my user
$ lsof -PniTCP:7788 -sTCP:LISTEN
COMMAND   PID USER   FD   TYPE  DEVICE SIZE/OFF NODE NAME
nc      26580  mak    3u  IPv4 2818104      0t0  TCP *:7788 (LISTEN)
#### fails: ssh is running as root
$ lsof -PniTCP:22 -sTCP:LISTEN
#### works if we are root
$ sudo lsof -PniTCP:22 -sTCP:LISTEN
COMMAND  PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
sshd    2524 root    3u  IPv4  18426      0t0  TCP *:22 (LISTEN)
sshd    2524 root    4u  IPv6  18428      0t0  TCP *:22 (LISTEN)
Solr runs as non-root.
So if a process owned by a different user occupies that port, you get a false negative: the script concludes that the port is free (and that Solr is not running) even though something is listening on it.
I can't think of a good way to fix or work around that (short of not using lsof in the first place).
Perhaps an uncommon scenario we need not worry too much about.
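For illustration, here is a minimal sketch of one lsof-free check. It assumes the script runs under bash, whose /dev/tcp redirection can attempt a TCP connection. A successful connect means something is listening, whichever user owns the process. The function name port_accepts_connections and the choice of 127.0.0.1 are hypothetical, and this is not what bin/solr does today. It would also miss a listener bound only to a non-loopback address.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of bin/solr.
# Returns 0 if something accepts TCP connections on host:port, non-zero otherwise.
# Relies on bash's /dev/tcp redirection, so it needs bash rather than plain sh.
port_accepts_connections() {
  local host="$1" port="$2"
  # The subshell opens fd 3 to host:port; the fd is closed when the subshell exits.
  (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null
}
```

It could be called as: port_accepts_connections 127.0.0.1 "$SOLR_PORT" && echo "port in use". Unlike lsof, this tells you nothing about which process holds the port.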
2. lsof can complain about lack of /etc/passwd entries
If lsof runs without the current effective user having an entry in /etc/passwd,
it produces a warning on stderr:
$ docker run -d -u 0 solr:7.6.0 bash -c "chown -R 8888 /opt/; gosu 8888 solr-foreground"
4397c3f51d4a1cfca7e5815e5b047f75fb144265d4582745a584f0dba51480c6
$ docker exec -it -u 8888 4397c3f51d4a1cfca7e5815e5b047f75fb144265d4582745a584f0dba51480c6 bash
I have no name!@4397c3f51d4a:/opt/solr$ lsof -PniTCP:8983 -sTCP:LISTEN
lsof: no pwd entry for UID 8888
COMMAND PID USER   FD   TYPE  DEVICE SIZE/OFF NODE NAME
lsof: no pwd entry for UID 8888
java      9 8888  115u  IPv4 2813503      0t0  TCP *:8983 (LISTEN)
I have no name!@4397c3f51d4a:/opt/solr$ lsof -PniTCP:8983 -sTCP:LISTEN>/dev/null
lsof: no pwd entry for UID 8888
lsof: no pwd entry for UID 8888
You can avoid this by using the -t flag, which makes lsof produce terse output with process identifiers only and no header:
I have no name!@4397c3f51d4a:/opt/solr$ lsof -t -PniTCP:8983 -sTCP:LISTEN
9
This is a rare circumstance, but one I encountered and worked around.
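As a sketch of how the check might look with -t, the helpers below also discard stderr as an extra precaution, so a "no pwd entry" warning cannot end up in the captured value. The function names listening_pids and port_has_listener are hypothetical; the lsof options are the ones already used above.

```shell
# Hypothetical helpers, not the current bin/solr code.

# Prints the PIDs listening on the given TCP port, one per line.
# Prints nothing if no listener is visible to the current user.
listening_pids() {
  # -t: terse output (PIDs only, no header); stderr is dropped to hide warnings.
  lsof -t -PniTCP:"$1" -sTCP:LISTEN 2>/dev/null
}

# Succeeds only when at least one listening PID is visible.
port_has_listener() {
  [ -n "$(listening_pids "$1")" ]
}
```

With these, the existing test would read: if ! port_has_listener "$SOLR_PORT"; then ... fi.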
3. On Alpine, lsof is implemented by busybox, but with incompatible arguments
On Alpine, busybox implements lsof, but does not support the arguments, so you get:
$ docker run -it alpine sh
/ # lsof -t -PniTCP:8983 -sTCP:LISTEN
1 /bin/busybox /dev/pts/0
1 /bin/busybox /dev/pts/0
1 /bin/busybox /dev/pts/0
1 /bin/busybox /dev/tty
So if you ran Solr in the background and it failed to start, this code would produce a false positive.
For example:
docker volume create mysol
docker run -v mysol:/mysol bash bash -c "chown 8983:8983 /mysol"
docker run -it -v mysol:/mysol -w /mysol -v $HOME/Downloads/solr-7.6.0.tgz:/solr-7.6.0.tgz openjdk:8-alpine sh
apk add procps bash
tar xvzf /solr-7.6.0.tgz
chown -R 8983:8983 .
then in a separate terminal:
$ docker exec -it -u 8983 serene_saha sh
/mysol $ SOLR_OPTS=--invalid ./solr-7.6.0/bin/solr start
whoami: unknown uid 8983
Waiting up to 180 seconds to see Solr running on port 8983 [|]
Started Solr server on port 8983 (pid=101). Happy searching!
/mysol $
and in another separate terminal:
$ docker exec -it thirsty_liskov bash
bash-4.4$ cat server/logs/solr-8983-console.log
Unrecognized option: --invalid
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.
So the script says Solr is running when it isn't.
Now, all this can be avoided by installing the real lsof with apk add lsof, which works properly. So should we detect the busybox lsof and warn? Or even refuse to run, rather than invoke a tool that does not implement the contract we expect?
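One possible way to detect this, sketched under assumptions: on Alpine the lsof on the PATH is a symlink to /bin/busybox (as the output above suggests), and readlink -f is available to resolve it. The function names resolves_to_busybox and warn_if_busybox_lsof and the warning text are hypothetical. Whether to warn or to refuse to run is left open.

```shell
# Hypothetical helpers, not part of bin/solr.

# Returns 0 if the given executable path resolves to a file named "busybox".
# Assumes `readlink -f` is available (GNU coreutils and busybox provide it).
resolves_to_busybox() {
  local target
  target=$(readlink -f "$1" 2>/dev/null) || return 1
  [ "$(basename "$target")" = "busybox" ]
}

# Prints a warning and returns 1 if the lsof on PATH is the busybox applet.
# Returns 0 otherwise, including when no lsof is installed at all.
warn_if_busybox_lsof() {
  local lsof_path
  lsof_path=$(command -v lsof) || return 0
  if resolves_to_busybox "$lsof_path"; then
    echo "WARNING: $lsof_path is provided by busybox and does not support the arguments used here; install the real lsof (apk add lsof)" >&2
    return 1
  fi
}
```

A caller that prefers to refuse to run could use: warn_if_busybox_lsof || exit 1.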
4. Shellcheck dislikes backticks
Shellcheck says SC2006: Use $(..) instead of legacy `..`.
Now, shellcheck complains about 130 other issues too, so it's a drop in the bucket, but if we're changing things, we might as well fix that.