Uploaded image for project: 'Solr'
  1. Solr
  2. SOLR-9360

Solr script not properly checking SOLR_PID

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 6.1
    • Fix Version/s: 6.4, 7.0
    • Component/s: scripts and tools
    • Security Level: Public (Default Security Level. Issues are Public)
    • Labels:
      None
    • Flags:
      Patch

      Description

      In the solr script we see in 3-4 areas this check :

      SOLR_PID=`ps auxww | grep start\.jar | grep -w $SOLR_PORT | grep -v grep | awk '

      {print $2}

      ' | sort -r`

      This can potentially prevent a solr instance to start in the case another process by any chance contains the port int the command itself ( not necessarily actually using the port) .

      e.g.
      java -server -Djetty.port=10504 -DSTOP.PORT=9504 -DSTOP.KEY=solrrocks -DMASTER_CORE_URL=external-server:10500/solr -jar start.jar --module=http

      A solr is running on 10504.
      A new Solr will not be able to start on 10500 ( but actually the port is free).
      This should be replaced by a real check if the port is used ( like netstat or similar) .

      1. SOLR_9360.patch
        3 kB
        Alessandro Benedetti
      2. SOLR-9360.patch
        4 kB
        Erick Erickson
      3. SOLR-9360.patch
        3 kB
        Erick Erickson

        Activity

        Hide
        alessandro.benedetti Alessandro Benedetti added a comment -

        ps replaced with netstat

        Show
        alessandro.benedetti Alessandro Benedetti added a comment - ps replaced with netstat
        Hide
        alessandro.benedetti Alessandro Benedetti added a comment -

        The patch is tentative, the modification worked for me on Centos, not sure how netstat is a good solution, please people with more sys admin experience let me know.
        Cheers

        Show
        alessandro.benedetti Alessandro Benedetti added a comment - The patch is tentative, the modification worked for me on Centos, not sure how netstat is a good solution, please people with more sys admin experience let me know. Cheers
        Hide
        alessandro.benedetti Alessandro Benedetti added a comment -

        Erick Erickson can you take a look ?
        It is a sall change, but it was a production bug for us, quite annoying to be fair

        Cheers

        Show
        alessandro.benedetti Alessandro Benedetti added a comment - Erick Erickson can you take a look ? It is a sall change, but it was a production bug for us, quite annoying to be fair Cheers
        Hide
        erickerickson Erick Erickson added a comment -

        Assigned to myself, but I'm not really that up on whether netstat is a good way to fix this or not so anyone else who wants to grab it feel free.....

        Or render an opinion, I'll be happy to commit.

        Show
        erickerickson Erick Erickson added a comment - Assigned to myself, but I'm not really that up on whether netstat is a good way to fix this or not so anyone else who wants to grab it feel free..... Or render an opinion, I'll be happy to commit.
        Hide
        alessandro.benedetti Alessandro Benedetti added a comment -

        Thanks Erick, the only reason I tagged you is because I saw you working on some "script" related Jiras
        To be fair, I hope someone illuminates us as I have limited experience in sys admin tasks, but at least it worked

        Cheers

        Show
        alessandro.benedetti Alessandro Benedetti added a comment - Thanks Erick, the only reason I tagged you is because I saw you working on some "script" related Jiras To be fair, I hope someone illuminates us as I have limited experience in sys admin tasks, but at least it worked Cheers
        Hide
        alessandro.benedetti Alessandro Benedetti added a comment -

        Thanks Erick, the only reason I tagged you is because I saw you working on some "script" related Jiras
        To be fair, I hope someone illuminates us as I have limited experience in sys admin tasks, but at least it worked

        Cheers

        Show
        alessandro.benedetti Alessandro Benedetti added a comment - Thanks Erick, the only reason I tagged you is because I saw you working on some "script" related Jiras To be fair, I hope someone illuminates us as I have limited experience in sys admin tasks, but at least it worked Cheers
        Hide
        shalinmangar Shalin Shekhar Mangar added a comment -

        According to the description – the current script prevents starting a solr on port 10500 if another solr on 10504 is running. Why would that happen? are we not matching whole words?

        I think a better fix would be to match the full -Djetty.port=$SOLR_PORT string.

        Many OS do not have netstat by default e.g. centos minimal. Docker-solr which uses a stripped down version of debian also does not have netstat by default. Also, the -p switch used in the patch is not supported on MacOS. (thanks to Martijn Koster for these tips).

        Show
        shalinmangar Shalin Shekhar Mangar added a comment - According to the description – the current script prevents starting a solr on port 10500 if another solr on 10504 is running. Why would that happen? are we not matching whole words? I think a better fix would be to match the full -Djetty.port=$SOLR_PORT string. Many OS do not have netstat by default e.g. centos minimal. Docker-solr which uses a stripped down version of debian also does not have netstat by default. Also, the -p switch used in the patch is not supported on MacOS. (thanks to Martijn Koster for these tips).
        Hide
        alessandro.benedetti Alessandro Benedetti added a comment -

        Hi Shalin,
        thank you very much for the feedback!
        Currently is checking only the port number :

        SOLR_PID=`ps auxww | grep start\.jar | grep -w $SOLR_PORT | grep -v grep | awk '

        {print $2}

        ' | sort -r`

        This can potentially be prone to errors.
        To me simply using the grep , even with the exact phrase ( "-Djetty.port=$SOLR_PORT"), could drive to errors.
        Is there no standard way, supported also by minimal OS, to effectively check if the port is available or not ?

        Cheers

        Show
        alessandro.benedetti Alessandro Benedetti added a comment - Hi Shalin, thank you very much for the feedback! Currently is checking only the port number : SOLR_PID=`ps auxww | grep start\.jar | grep -w $SOLR_PORT | grep -v grep | awk ' {print $2} ' | sort -r` This can potentially be prone to errors. To me simply using the grep , even with the exact phrase ( "-Djetty.port=$SOLR_PORT"), could drive to errors. Is there no standard way, supported also by minimal OS, to effectively check if the port is available or not ? Cheers
        Hide
        shalinmangar Shalin Shekhar Mangar added a comment - - edited

        Is there no standard way, supported also by minimal OS, to effectively check if the port is available or not ?

        Typically, startup scripts write the pid of the started process to a file and use that pid for figuring out if the process is still running and for killing the process. Solr doesn't do that which is why we have this problem.

        To me simply using the grep , even with the exact phrase ( "-Djetty.port=$SOLR_PORT"), could drive to errors.

        Short of writing/using pid files, I think this is the best available workaround?

        <edit> - I wasn't clear earlier but the intention is to match the whole word i.e. grepping for -Djetty.port=1050 should not match -Djetty.port=10500

        Show
        shalinmangar Shalin Shekhar Mangar added a comment - - edited Is there no standard way, supported also by minimal OS, to effectively check if the port is available or not ? Typically, startup scripts write the pid of the started process to a file and use that pid for figuring out if the process is still running and for killing the process. Solr doesn't do that which is why we have this problem. To me simply using the grep , even with the exact phrase ( "-Djetty.port=$SOLR_PORT"), could drive to errors. Short of writing/using pid files, I think this is the best available workaround? <edit> - I wasn't clear earlier but the intention is to match the whole word i.e. grepping for -Djetty.port=1050 should not match -Djetty.port=10500
        Hide
        erickerickson Erick Erickson added a comment -

        OK, the grep -w option does match whole words only, so this works.

        -D is a flag for grep (how universal I'm not sure) and at least one other place greps (incorrectly) just for jetty.port rather than jetty\.port so I'll just use the jetty\.port=$WOLR_PORT version

        There's a couple of greps for "solr.solr.home" as well that I'll fix on the way by.

        I took a brief look at the windows script and it uses another mechanism, so I'm not going to touch that one especially as I have no way to test it.

        Show
        erickerickson Erick Erickson added a comment - OK, the grep -w option does match whole words only, so this works. -D is a flag for grep (how universal I'm not sure) and at least one other place greps (incorrectly) just for jetty.port rather than jetty\.port so I'll just use the jetty\.port=$WOLR_PORT version There's a couple of greps for "solr.solr.home" as well that I'll fix on the way by. I took a brief look at the windows script and it uses another mechanism, so I'm not going to touch that one especially as I have no way to test it.
        Hide
        erickerickson Erick Erickson added a comment - - edited

        Oh, and the script does use a pid file(s), this is a bit of extra logic using these patterns.

        Show
        erickerickson Erick Erickson added a comment - - edited Oh, and the script does use a pid file(s), this is a bit of extra logic using these patterns.
        Hide
        erickerickson Erick Erickson added a comment -

        I think this does it, I'll check it in probably tomorrow unless people object.

        Show
        erickerickson Erick Erickson added a comment - I think this does it, I'll check it in probably tomorrow unless people object.
        Hide
        alessandro.benedetti Alessandro Benedetti added a comment -

        Erick Erickson I think it is fair enough

        +1

        Show
        alessandro.benedetti Alessandro Benedetti added a comment - Erick Erickson I think it is fair enough +1
        Hide
        erickerickson Erick Erickson added a comment -

        Final patch, same as yesterday's but with CHANGES.txt updated.

        Show
        erickerickson Erick Erickson added a comment - Final patch, same as yesterday's but with CHANGES.txt updated.
        Hide
        jira-bot ASF subversion and git services added a comment -

        Commit b2bf87dee7ea1b98eed62ccc3a921d387db39a79 in lucene-solr's branch refs/heads/master from Erick Erickson
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=b2bf87d ]

        SOLR-9360: Solr script not properly checking SOLR_PID

        Show
        jira-bot ASF subversion and git services added a comment - Commit b2bf87dee7ea1b98eed62ccc3a921d387db39a79 in lucene-solr's branch refs/heads/master from Erick Erickson [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=b2bf87d ] SOLR-9360 : Solr script not properly checking SOLR_PID
        Hide
        jira-bot ASF subversion and git services added a comment -

        Commit 758aacc7bb6fac04bbc49262fa710f1e496e1f59 in lucene-solr's branch refs/heads/branch_6x from Erick Erickson
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=758aacc ]

        SOLR-9360: Solr script not properly checking SOLR_PID
        (cherry picked from commit b2bf87d)

        Show
        jira-bot ASF subversion and git services added a comment - Commit 758aacc7bb6fac04bbc49262fa710f1e496e1f59 in lucene-solr's branch refs/heads/branch_6x from Erick Erickson [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=758aacc ] SOLR-9360 : Solr script not properly checking SOLR_PID (cherry picked from commit b2bf87d)
        Hide
        erickerickson Erick Erickson added a comment -

        Thanks Allessandro!

        Show
        erickerickson Erick Erickson added a comment - Thanks Allessandro!
        Hide
        jira-bot ASF subversion and git services added a comment -

        Commit b2bf87dee7ea1b98eed62ccc3a921d387db39a79 in lucene-solr's branch refs/heads/apiv2 from Erick Erickson
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=b2bf87d ]

        SOLR-9360: Solr script not properly checking SOLR_PID

        Show
        jira-bot ASF subversion and git services added a comment - Commit b2bf87dee7ea1b98eed62ccc3a921d387db39a79 in lucene-solr's branch refs/heads/apiv2 from Erick Erickson [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=b2bf87d ] SOLR-9360 : Solr script not properly checking SOLR_PID

          People

          • Assignee:
            erickerickson Erick Erickson
            Reporter:
            alessandro.benedetti Alessandro Benedetti
          • Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development