SOLR-9371: Fix bin/solr script calculations - start/stop wait time and RMI_PORT

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 6.1
    • Fix Version/s: 6.3, master (7.0)
    • Component/s: scripts and tools
    • Security Level: Public (Default Security Level. Issues are Public)
    • Labels:
      None
    • Flags:
      Patch

      Description

      The bin/solr script doesn't wait long enough for Solr to stop before it sends the KILL signal to the process. The start could use a longer wait too.

      Also, the RMI_PORT is calculated by simply prefixing the port number with a "1" instead of adding 10000. If the Solr port has five digits, the RMI port will be invalid because it will be greater than 65535.
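
The difference between the two calculations can be sketched in shell. This is a minimal illustration rather than the code from the patch: the function names are hypothetical, and the only outside fact assumed is that a TCP port number must be between 1 and 65535.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration; not the actual bin/solr code.

# Behavior described in the issue: prefix the Solr port with "1" (string concatenation).
rmi_port_by_prefix() {
  echo "1$1"
}

# Proposed behavior: add 10000 to the Solr port numerically.
rmi_port_by_addition() {
  echo $(( $1 + 10000 ))
}

# A TCP port must be in the range 1..65535.
is_valid_port() {
  [ "$1" -ge 1 ] && [ "$1" -le 65535 ]
}
```

With a four-digit port such as 8983, both approaches give 18983. With a five-digit port such as 10000, prefixing gives 110000, which is out of range, while adding gives 20000. Addition still exceeds 65535 for Solr ports above 55535, so a range check like `is_valid_port` may remain useful.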

      1. SOLR-9371.patch
        4 kB
        Erick Erickson
      2. SOLR-9371.patch
        4 kB
        Shawn Heisey
      3. SOLR-9371.patch
        4 kB
        Shawn Heisey

          Activity

          elyograg Shawn Heisey added a comment -

          Patch.

          elyograg Shawn Heisey added a comment -

          The wait time is configurable in solr.in.sh ... I should probably put a commented example in there.

          elyograg Shawn Heisey added a comment -

          Updated patch. Commented config section in solr.in.sh for configuring WAIT_TIME.

          erickerickson Erick Erickson added a comment -

          Oh please commit this!

          May I suggest the variable be renamed to SOLR_WAIT_TIME? Or maybe SOLR_STOP_WAIT? At any rate prefix it with SOLR I think...

          Second, whatever the name, it should also be added to the solr.in.cmd for Windows users...

          elyograg Shawn Heisey added a comment - - edited

          I looked into the Windows start script. It just uses a plain "wait" sort of timeout. The "wait briefly and check PID" loop might be adaptable to the "stop" action in Windows by somebody who really knows the batch syntax.

          Question for the peanut gallery: What's the earliest Windows version we will support? One command that I found for doing silent pauses won't work on XP, and probably doesn't work on 2003 either. Both of these versions are end of life ... so do we need to support them?

          elyograg Shawn Heisey added a comment -

          Yeah, that occurred to me... but the name I came up with at first (SOLR_START_STOP_WAIT_TIME) was so long that I changed it back. SOLR_WAIT_TIME could work.

          Windows is going to be a bit harder, because I'm not as familiar with batch methods as I am with shell scripting. Any ideas are welcome.

          varunthacker Varun Thacker added a comment -

          +1 for the change.

          I once saw a case with a client running Solr on HDFS, where Solr didn't shut down in 5 seconds and the start script killed the process. Restart won't work because of the write.lock issue on HDFS.

          erickerickson Erick Erickson added a comment - - edited

          WDYT about just committing this and raising a separate JIRA for Windows? This has bitten us in the field repeatedly because it can force leader election, recovery and the like.

          I'd have a different opinion if it was new functionality or the like...

          BTW, how does this relate to SOLR-8065 if at all?

          elyograg Shawn Heisey added a comment - - edited

          I didn't notice SOLR-8065. If I had, I would have put the patch there.

          Mark Miller makes a very good point in his last comment on that issue. Perhaps the default timeout should be something really long. Initial bikeshed: maybe 3 minutes, same as Collections API? I did increase it in my patch, to 60 seconds.

          A non-windows system will exit earlier than the timeout if shutdown happens faster, and with my proposed patch, the user has the option of changing the timeout to make the hard kill happen faster if they want.

          Opening a separate issue for Windows seems prudent. It needs some work before we can increase the default timeout.
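
The early-exit behavior described in this comment can be sketched as a polling loop, assuming a POSIX-like system. This is a minimal sketch, not the patch itself: the function names and the `stop_wait_seconds` variable are hypothetical, SIGTERM stands in for whatever graceful stop request the real script sends, and the 60-second default simply mirrors the value mentioned above.

```shell
#!/usr/bin/env bash
# Hypothetical sketch; names are illustrative and not taken from bin/solr.

# Default timeout in seconds (assumed value, overridable by the caller).
stop_wait_seconds="${stop_wait_seconds:-60}"

# Poll once per second until process $1 is gone or $2 seconds have passed.
# Returns 0 if the process exited, 1 if the timeout was reached.
wait_for_exit() {
  local pid="$1" timeout="$2" waited=0
  while kill -0 "$pid" 2>/dev/null; do
    if [ "$waited" -ge "$timeout" ]; then
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}

# Request a graceful stop, and send KILL only if the process outlives the timeout.
stop_gracefully() {
  local pid="$1" timeout="${2:-$stop_wait_seconds}"
  kill -TERM "$pid" 2>/dev/null
  if ! wait_for_exit "$pid" "$timeout"; then
    kill -KILL "$pid" 2>/dev/null
  fi
}
```

Because the loop checks the PID every second, a process that stops quickly returns control almost immediately, so a long timeout only costs time when shutdown is genuinely slow.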

          erickerickson Erick Erickson added a comment -

          I just happened to run across 8065 during a search and said to myself "Hey, I remember something more recent about this." So I thought I'd add it in solely in the interests of closing JIRAs when possible....

          +1 to making it 3 minutes. Since (if I'm reading it correctly), the loop exits much more quickly if the process actually stops, there's no real problem there IMO. Especially for those of us who start/stop Solr about a zillion times a day when testing stuff. Plus, the pain of leader election/full sync/whatever can be high enough that waiting a bit more is a small price to pay for real-live production systems.

          janhoy Jan Høydahl added a comment -

          Perhaps you can use the new AssertTool from the windows script?

          call :run_assert -e -S http://localhost:8983/solr/
          IF errorlevel 1 ....
          
          erickerickson Erick Erickson added a comment -

          What do people think about committing this sometime soon and maybe opening a separate issue for Windows? I really think this is a bad trap and I'd really like to get it in place before anyone thinks about cutting the next version of Solr. I also wonder if we should put it on 6_2 as well in case there's another release.

          I've got a couple of other things on my plate, but may try to get to the *nix version next week.

          emaijala Ere Maijala added a comment -

          Please do both. I've been hoping to get this fixed for over a year (see the linked issue SOLR-8065).

          erickerickson Erick Erickson added a comment -

          Can you help with the Windows scripting? If so, please attach a patch.

          emaijala Ere Maijala added a comment -

          Oops, sorry for being so vague. With "both" I meant "committing this" and "put it on 6_2 as well". I can't help with Windows, and I think that's why SOLR-8065 got stalled, but if you agree on doing this separately for Windows, that would be great. It's not easy to maintain your own version of the solr script when it's being enhanced all the time, and this issue has existed way too long already without anyone stepping up to do something about the Windows version.

          erickerickson Erick Erickson added a comment -

          OK, I'm going to commit this over the weekend and raise a different JIRA for Windows unless
          1> some kind person makes it work with windows
          or
          2> there are howls of protest.

          Progress, not perfection and all that.

          Jan Høydahl: Not sure I want to go into the assert tool this close to a release since I'm not very familiar with it. Perhaps a separate JIRA?

          erickerickson Erick Erickson added a comment -

          WDYT about naming this param scarily? I.e. SOLR_KILL_WAIT? I think that's more descriptive...

          janhoy Jan Høydahl added a comment -

          Hi, you can do something like this

          bin/solr assert --not-started http://localhost:8983/solr/ --timeout 30000 -e
          

          But it will only check if Solr is listening on the port; it will not check if the process or PID file is gone, which I guess is what we need. I started on a UtilsTool that will check PIDs and optionally kill, but that is not committed yet. I'm fine with opening another JIRA for Windows and tackling it in the next release.

          erickerickson Erick Erickson added a comment -

          Trivial changes
          1> renamed variable to SOLR_STOP_WAIT
          2> moved CHANGES to 6.3

          jira-bot ASF subversion and git services added a comment -

          Commit 1344d895f96644a4d541acd5a9fbe9fe4d1969a5 in lucene-solr's branch refs/heads/master from Erick Erickson
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=1344d89 ]

          SOLR-9371: Fix bin/solr script calculations - start/stop wait time and RMI_PORT

          jira-bot ASF subversion and git services added a comment -

          Commit 23591ff9b8850baae8edce571590f1d091d2be86 in lucene-solr's branch refs/heads/branch_6x from Erick Erickson
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=23591ff ]

          SOLR-9371: Fix bin/solr script calculations - start/stop wait time and RMI_PORT
          (cherry picked from commit 1344d89)

          jira-bot ASF subversion and git services added a comment -

          Commit d4797a16765ce6f451a149688c8e134864aaf90d in lucene-solr's branch refs/heads/branch_6_3 from Erick Erickson
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=d4797a1 ]

          SOLR-9371: Fix bin/solr script calculations - start/stop wait time and RMI_PORT
          (cherry picked from commit 1344d89)
          (cherry picked from commit 23591ff)

          erickerickson Erick Erickson added a comment -

          Thanks Shawn!

          shalinmangar Shalin Shekhar Mangar added a comment -

          Closing after 6.3.0 release.


            People

            • Assignee:
              elyograg Shawn Heisey
            • Reporter:
              elyograg Shawn Heisey
            • Votes:
              0
            • Watchers:
              9
