  1. Solr
  2. SOLR-13089

bin/solr's use of lsof has some issues

Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 8.5
    • Component/s: SolrCLI
    • Labels: None

    Description

      The bin/solr script uses this lsof invocation to check if the Solr port is being listened on:

      running=`lsof -PniTCP:$SOLR_PORT -sTCP:LISTEN`
      if [ -z "$running" ]; then
      

      The code lives in bin/solr.

      There are a few issues with this.

      1. False negatives when port is occupied by different user

      When lsof runs as non-root, it only shows sockets for processes with your effective uid.
      For example:

      $ id -u && nc -l 7788 &
      [1] 26576
      1000
      
      #### works: nc ran as my user
      $ lsof -PniTCP:7788 -sTCP:LISTEN
      COMMAND   PID USER   FD   TYPE  DEVICE SIZE/OFF NODE NAME
      nc      26580  mak    3u  IPv4 2818104      0t0  TCP *:7788 (LISTEN)
      
      #### fails: ssh is running as root
      $ lsof -PniTCP:22 -sTCP:LISTEN
      
      #### works if we are root
      $ sudo lsof -PniTCP:22 -sTCP:LISTEN
      COMMAND  PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
      sshd    2524 root    3u  IPv4  18426      0t0  TCP *:22 (LISTEN)
      sshd    2524 root    4u  IPv6  18428      0t0  TCP *:22 (LISTEN)
      

      Solr runs as non-root.
      So if a process owned by a different user occupies that port, you get a false negative: the check concludes that nothing is listening and that Solr is not running, even though the port is occupied.
      I can't think of a good way to fix or work around that (short of not using lsof in the first place).
      Perhaps this is an uncommon scenario that we need not worry too much about.
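
One way to avoid lsof for this check, offered as a sketch and not as what the attached patches do, is to probe the port by connecting to it, which works no matter which user owns the listener. The example relies on bash's /dev/tcp redirection (a bash feature that some builds disable). The function name port_is_listening and the choice of 127.0.0.1 are assumptions made for illustration, and a listener bound only to another interface would go undetected.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of bin/solr): succeeds if something accepts
# TCP connections on the given port on the loopback interface, regardless of
# which user owns the listening process.
port_is_listening() {
  local port=$1
  # Reject anything that is not a plain decimal number.
  case $port in
    ''|*[!0-9]*) return 2 ;;
  esac
  # The subshell opens fd 3 to the port; the fd is closed again when the
  # subshell exits. A refused connection makes the subshell exit non-zero.
  (exec 3<>"/dev/tcp/127.0.0.1/$port") 2>/dev/null
}
```

A caller could then write `if port_is_listening "$SOLR_PORT"; then ...` in place of the lsof test. Unlike lsof, this opens a real connection to the port, so the listening process will see a short-lived client.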

      2. lsof can complain about missing /etc/passwd entries

      If lsof runs without the current effective user having an entry in /etc/passwd,
      it produces a warning on stderr:

      $ docker run -d -u 0 solr:7.6.0  bash -c "chown -R 8888 /opt/; gosu 8888 solr-foreground"
      4397c3f51d4a1cfca7e5815e5b047f75fb144265d4582745a584f0dba51480c6
      
      $ docker exec -it -u 8888 4397c3f51d4a1cfca7e5815e5b047f75fb144265d4582745a584f0dba51480c6 bash
      I have no name!@4397c3f51d4a:/opt/solr$ lsof -PniTCP:8983 -sTCP:LISTEN
      lsof: no pwd entry for UID 8888
      COMMAND PID     USER   FD   TYPE  DEVICE SIZE/OFF NODE NAME
      lsof: no pwd entry for UID 8888
      java      9     8888  115u  IPv4 2813503      0t0  TCP *:8983 (LISTEN)
      I have no name!@4397c3f51d4a:/opt/solr$ lsof -PniTCP:8983 -sTCP:LISTEN>/dev/null
      lsof: no pwd entry for UID 8888
      lsof: no pwd entry for UID 8888
      

      You can avoid this by using the -t option, which tells lsof to produce terse output with process identifiers only and no header:

      I have no name!@4397c3f51d4a:/opt/solr$ lsof -t -PniTCP:8983 -sTCP:LISTEN
      9
      

      This is a rare circumstance, but one I encountered and worked around.
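
Applied to the check quoted at the top, the -t variant might look like the sketch below. The function name listening_pids is hypothetical. Discarding stderr is an extra precaution that this report does not require, and it assumes the caller only cares whether any PID is printed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the PIDs listening on a TCP port, one per line.
# -t asks lsof for terse, header-free output; stderr is dropped so stray
# warnings cannot leak into the caller's terminal.
listening_pids() {
  lsof -t -PniTCP:"$1" -sTCP:LISTEN 2>/dev/null
}

# lsof exits non-zero when nothing matches, so ignore its exit status and
# look at the output only, as the original check does.
running=$(listening_pids "${SOLR_PORT:-8983}") || true
if [ -z "$running" ]; then
  echo "Nothing is listening on port ${SOLR_PORT:-8983}"
fi
```

Wrapping the call in a function keeps the lsof flags in one place if the script performs this check more than once.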

      3. On Alpine, lsof is implemented by busybox, but with incompatible arguments

      On Alpine, lsof is provided by BusyBox, which does not support these arguments, so you get:

      $ docker run -it alpine sh
      / # lsof -t -PniTCP:8983 -sTCP:LISTEN
      1	/bin/busybox	/dev/pts/0
      1	/bin/busybox	/dev/pts/0
      1	/bin/busybox	/dev/pts/0
      1	/bin/busybox	/dev/tty
      

      So if you ran Solr in the background and it failed to start, this code would produce a false positive.
      For example:

      docker volume create mysol
      docker run -v mysol:/mysol bash bash -c "chown 8983:8983 /mysol"
      docker run -it -v mysol:/mysol -w /mysol -v $HOME/Downloads/solr-7.6.0.tgz:/solr-7.6.0.tgz openjdk:8-alpine sh
      apk add procps bash
      tar xvzf /solr-7.6.0.tgz
      chown -R 8983:8983 .
      

      then in a separate terminal:

      $ docker exec -it -u 8983 serene_saha  sh
      /mysol $ SOLR_OPTS=--invalid ./solr-7.6.0/bin/solr start
      whoami: unknown uid 8983
      Waiting up to 180 seconds to see Solr running on port 8983 [|]  
      Started Solr server on port 8983 (pid=101). Happy searching!
      
      /mysol $ 
      

      and in another separate terminal:

      $ docker exec -it thirsty_liskov bash
      
      bash-4.4$ cat server/logs/solr-8983-console.log 
      Unrecognized option: --invalid
      Error: Could not create the Java Virtual Machine.
      Error: A fatal exception has occurred. Program will exit.
      

      So the script says Solr is running when it isn't.

      Now, all this can be avoided by installing the real lsof with apk add lsof, which works properly. So should we detect and warn? Or even refuse to run rather than invoke a tool that does not implement the contract we expect?
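
If detection is the route taken, one possible approach is to resolve the lsof found on PATH and check whether it ends up at a busybox binary. This is only a sketch under assumptions: it presumes the applet is installed as a symlink to a file named busybox, that readlink -f is available on the platform, and the function name lsof_is_busybox is made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if the given lsof path (default: the lsof
# found on PATH) resolves to a binary named "busybox".
lsof_is_busybox() {
  local path resolved
  path=${1:-$(command -v lsof 2>/dev/null)}
  [ -n "$path" ] || return 1
  # Follow symlinks to the real file; give up if readlink cannot resolve it.
  resolved=$(readlink -f "$path" 2>/dev/null) || return 1
  [ "$(basename "$resolved")" = "busybox" ]
}

if lsof_is_busybox; then
  echo "WARNING: lsof is provided by BusyBox and does not support the options bin/solr uses; install the real lsof" >&2
fi
```

Whether to warn or to refuse to run is the open question above; the sketch only warns.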

      4. Shellcheck dislikes backticks

      Shellcheck says SC2006: Use $(..) instead of legacy `..`.
      Now, shellcheck complains about 130 other issues too, so this one is a drop in the bucket, but if we're changing things, we might as well fix it.

      Attachments

        1. 0001-SOLR-13089-lsof-fixes.patch
          2 kB
          Martijn Koster
        2. SOLR-13089.patch
          1 kB
          Jan Høydahl

          People

            Assignee: Jan Høydahl (janhoy)
            Reporter: Martijn Koster (makuk66)