SOLR-8145

bin/solr script oom_killer arg incorrect

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 5.2.1
    • Fix Version/s: 5.5.1, 6.0
    • Component/s: scripts and tools
    • Labels:
      None

      Description

      I noticed the oom_killer script wasn't working in our 5.2 deployment.

      In the bin/solr script, the OnOutOfMemoryError option is being passed as an arg to the jar rather than to the JVM. I moved it ahead of -jar and verified it shows up in the JVM args in the UI.

          # run Solr in the background
          nohup "$JAVA" "${SOLR_START_OPTS[@]}" $SOLR_ADDL_ARGS -jar start.jar \
            "-XX:OnOutOfMemoryError=$SOLR_TIP/bin/oom_solr.sh $SOLR_PORT $SOLR_LOGS_DIR" "${SOLR_JETTY_CONFIG[@]}" \
      

      Also, I'm not sure what the SOLR_PORT and SOLR_LOGS_DIR args are doing--they don't appear to be positional arguments to the jar.
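To illustrate why the option has no effect where it sits: the `java` launcher treats what comes before `-jar <file>` as its own options and hands everything after the jar name to the application. The sketch below is a simplified classifier, not part of bin/solr. `classify_java_args` is a hypothetical helper, the paths and heap setting are made up, and it does not model every launcher option.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of bin/solr): label each argument of a
# "java ... -jar <file> ..." command line by who receives it.
classify_java_args() {
  local seen_jar=0 expect_jar=0 arg
  for arg in "$@"; do
    if (( expect_jar )); then
      echo "jar: $arg"
      expect_jar=0
      seen_jar=1
    elif (( seen_jar )); then
      echo "app-arg: $arg"      # passed to the jar's main(), not the JVM
    elif [[ "$arg" == "-jar" ]]; then
      expect_jar=1
    else
      echo "jvm-opt: $arg"      # interpreted by the JVM launcher
    fi
  done
}

# Order from the report: the -XX option lands after start.jar.
classify_java_args -Xmx512m -jar start.jar \
  "-XX:OnOutOfMemoryError=/opt/solr/bin/oom_solr.sh 8983 /var/solr/logs"

# Order after moving the option ahead of -jar.
classify_java_args -Xmx512m \
  "-XX:OnOutOfMemoryError=/opt/solr/bin/oom_solr.sh 8983 /var/solr/logs" \
  -jar start.jar
```

In the first call the `-XX:` string is labeled `app-arg`, which matches the symptom in the report: the option never shows up in the JVM args.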

      Attaching a patch against 5.2.

      1. SOLR-8145.patch
        0.8 kB
        Jurian Broertjes
      2. SOLR-8145.patch
        0.8 kB
        Jurian Broertjes
      3. SOLR-8145.patch
        0.8 kB
        Nate Dire

          Activity

          jurian Jurian Broertjes added a comment -

          SOLR_PORT and SOLR_LOGS_DIR are arguments for the oom_solr.sh script and are required for proper OOM handling. I've updated your patch and verified that it's working now.
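For readers wondering how a handler might use those two arguments, here is a minimal sketch. It is not the shipped oom_solr.sh. It assumes the Solr JVM was started with `-Djetty.port=<port>` on its command line, and the names `find_solr_pids`, `handle_oom`, and the log file name are invented for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical OOM handler sketch; NOT the shipped oom_solr.sh.
# Usage: oom_handler.sh <solr_port> <solr_logs_dir>

# Print the PIDs of processes whose command line carries -Djetty.port=<port>.
# Reads "PID COMMAND..." lines on stdin so it can be exercised without ps.
find_solr_pids() {
  local port="$1"
  awk -v opt="-Djetty.port=${port}" \
    '{ for (i = 2; i <= NF; i++) if ($i == opt) { print $1; break } }'
}

# Kill every matching process and record the action in the logs directory.
handle_oom() {
  local port="$1" logs_dir="$2" pid
  local log="${logs_dir}/oom_handler-${port}.log"
  ps -A -o pid= -o args= | find_solr_pids "$port" | while read -r pid; do
    echo "$(date) killing Solr PID ${pid} on port ${port} after OOM" >> "$log"
    kill -9 "$pid"
  done
}

# Only act when invoked directly with both arguments.
if [[ "${BASH_SOURCE[0]}" == "$0" && $# -eq 2 ]]; then
  handle_oom "$1" "$2"
fi
```

The port identifies which Solr process to kill when several run on one host, and the logs directory is where the handler leaves a record of what it did.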

          jurian Jurian Broertjes added a comment -

          Updated patch with proper svn diff instead of just diff

          jmlucjav jmlucjav added a comment - edited

          I was going to submit a PR when I found this.

          Something this trivial, and this important for a robust production system, should have been merged already. Please get it into 6.0 at least.

          I verified it works on Linux/Windows with jdk8.

          Also, if you are worried about the submitter's comment about SOLR_PORT and SOLR_LOGS_DIR: the patch handles them correctly. They are just parameters for the OOM killer script.

          elyograg Shawn Heisey added a comment -

          Just noticed this because of the comment today. Don't know how I missed it before.

          This could explain SOLR-8539 and other similar problems.

          dragonsinth Scott Blum added a comment -

          I've always just given this special handling in my own installation. My bin/solr script is edited to use a specific env var:

          diff --git a/solr/bin/solr b/solr/bin/solr
          index 85cd550..0e383b5 100755
          --- a/solr/bin/solr
          +++ b/solr/bin/solr
          @@ -1315,7 +1315,8 @@ function launch_solr() {
           
             if [ "$run_in_foreground" == "true" ]; then
               echo -e "\nStarting Solr$IN_CLOUD_MODE on port $SOLR_PORT from $SOLR_SERVER_DIR\n"
          -    exec "$JAVA" "${SOLR_START_OPTS[@]}" $SOLR_ADDL_ARGS -jar start.jar "${SOLR_JETTY_CONFIG[@]}"
          +    # scottb: OOMKILLER needs super special handling
          +    exec "$JAVA" "${SOLR_START_OPTS[@]}" "${OOMKILLER:=-Dno.oom.killer}" $SOLR_ADDL_ARGS -jar start.jar "${SOLR_JETTY_CONFIG[@]}"
             else
               # run Solr in the background
               nohup "$JAVA" "${SOLR_START_OPTS[@]}" $SOLR_ADDL_ARGS -jar start.jar \
          

          Then I just set up OOMKILLER in the calling script to be something like

          '-XX:OnOutOfMemoryError=exec ' + os.path.abspath('killjava.sh')
          

          At that point, you can get by with a super simple killjava.sh:

          #!/bin/bash
          kill -9 $PPID
          

          The current scheme seems super complicated.
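The `"${OOMKILLER:=-Dno.oom.killer}"` expansion in the diff above does two things: it falls back to a harmless system property when the variable is unset, and the surrounding quotes keep a value containing a space as a single JVM argument. A small sketch of that behavior follows; `oom_flag`, `count_args`, and the `/opt/killjava.sh` path are made up for illustration.

```shell
#!/usr/bin/env bash
# Expand OOMKILLER the way the patched exec line does.
oom_flag() {
  # ${VAR:=default} assigns the default when VAR is unset or empty,
  # then expands to the (possibly new) value.
  printf '%s\n' "${OOMKILLER:=-Dno.oom.killer}"
}

# Count arguments, to show that the quoted expansion stays one word.
count_args() { echo "$#"; }

# Unset: the placeholder property is used.
( unset OOMKILLER; oom_flag )

# Set by the caller: the value passes through as one argument.
( OOMKILLER='-XX:OnOutOfMemoryError=exec /opt/killjava.sh'
  oom_flag
  count_args "${OOMKILLER:=-Dno.oom.killer}" )
```

Without the double quotes, the shell would split the value at the space and the JVM would see `/opt/killjava.sh` as a separate, stray argument.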

          thelabdude Timothy Potter added a comment -

          thanks for the patch, I'll get this committed soon

          jira-bot ASF subversion and git services added a comment -

          Commit 80801a2738c10ace30eca549fe5daadb88989c32 in lucene-solr's branch refs/heads/master from Timothy Potter
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=80801a2 ]

          SOLR-8145: Fix position of OOM killer script when starting Solr in the background

          jira-bot ASF subversion and git services added a comment -

          Commit e1033d965414b34b990070bb87c509364a7f4194 in lucene-solr's branch refs/heads/branch_6x from Timothy Potter
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=e1033d9 ]

          SOLR-8145: Fix position of OOM killer script when starting Solr in the background

          jira-bot ASF subversion and git services added a comment -

          Commit b17c57f072b65106f2689d2f9ea6a5ca14e492e0 in lucene-solr's branch refs/heads/master from Timothy Potter
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=b17c57f ]

          SOLR-8145: mention fix in solr/CHANGES.txt

          jira-bot ASF subversion and git services added a comment -

          Commit ddd019fac0d9eff352a4a17a62d9a9654f7bdc86 in lucene-solr's branch refs/heads/branch_6x from Timothy Potter
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=ddd019f ]

          SOLR-8145: mention fix in solr/CHANGES.txt

          thelabdude Timothy Potter added a comment -

          Thanks for the patch Jurian Broertjes!

          dragonsinth Scott Blum added a comment -

          BTW: why is there no oomkiller when run in foreground?

          thelabdude Timothy Potter added a comment -

          "BTW: why is there no oomkiller when run in foreground?"

          I don't even think we should support a foreground mode as it doesn't seem to add any value to me and you have to pass a special flag to activate it. Are people running SolrCloud in production in foreground mode? The OOM killer is really so that you don't have a zombie node that OOM'd in your cluster. That said, if you think the OOM killer is needed in fg mode for some reason, then open a new JIRA (this one is done) and post a patch please.

          elyograg Shawn Heisey added a comment -

          "Are people running SolrCloud in production in foreground mode?"

          I think this is commonly used when users create a service for Windows, but there isn't an OOM killer script for Windows.

          I think that the only time somebody would run in foreground mode on *NIX is when they are troubleshooting, mostly to see the console logging. I think that running the OOM killer even in foreground mode makes sense.

          A separate question (probably requiring a new issue): Regardless of foreground/background, should there be an environment variable that can be set to disable the OOM killer? Like foreground mode, this would be for troubleshooting.
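As a rough sketch of what such a switch could look like (nothing like this is in the patch): the variable name `SOLR_OOM_KILLER_DISABLED`, the helper `build_oom_opts`, and the `OOM_OPTS` array are all invented for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical: SOLR_OOM_KILLER_DISABLED is an invented variable name.
# Fill the global OOM_OPTS array with the -XX option, or leave it empty.
build_oom_opts() {
  local tip="$1" port="$2" logs_dir="$3"
  OOM_OPTS=()
  if [[ "${SOLR_OOM_KILLER_DISABLED:-false}" != "true" ]]; then
    # One array element, so the space-separated handler arguments
    # stay inside a single JVM option.
    OOM_OPTS=("-XX:OnOutOfMemoryError=${tip}/bin/oom_solr.sh ${port} ${logs_dir}")
  fi
}

build_oom_opts /opt/solr 8983 /var/solr/logs
echo "options to pass ahead of -jar: ${#OOM_OPTS[@]}"
```

The launch line would then expand `"${OOM_OPTS[@]}"` ahead of `-jar start.jar`, in both the foreground and background branches.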

          dragonsinth Scott Blum added a comment -

          We always run Solr in the foreground, because we handle app process management at a higher level. When we call bin/solr we already have a pidfile set up, stderr and stdout piped, etc., so it's vital we run it in the foreground to get the `exec` and maintain process identity. But we still need the OOM killer, so that Solr dies fast and our process watcher restarts it.
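The "process identity" point can be seen in a few lines of shell: `exec` replaces the current process image instead of forking a child, so the PID that a supervisor recorded keeps pointing at the program that was exec'd. This is a generic sketch, not taken from bin/solr, and `pid_before_and_after_exec` is a made-up name.

```shell
#!/usr/bin/env bash
# Print the PID of a shell, then the PID of the program it exec's.
# Because exec replaces the process in place, both lines show the same PID.
pid_before_and_after_exec() {
  bash -c 'echo "$$"; exec bash -c "echo \$\$"'
}

pid_before_and_after_exec
```

With bin/solr in foreground mode the same thing happens: the script's PID becomes the JVM's PID, which is what a pidfile-based watcher relies on.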

          jira-bot ASF subversion and git services added a comment -

          Commit cc8af0d3e8e9bbd933947dae7307d0b09eb146da in lucene-solr's branch refs/heads/branch_6_0 from Timothy Potter
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=cc8af0d ]

          SOLR-8145: Fix position of OOM killer script when starting Solr in the background
          (cherry picked from commit e1033d9)

          jira-bot ASF subversion and git services added a comment -

          Commit b21927818c30ea2d3defab3d94c19225dc2452c8 in lucene-solr's branch refs/heads/branch_6_0 from Timothy Potter
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=b219278 ]

          SOLR-8145: mention fix in solr/CHANGES.txt
          (cherry picked from commit ddd019f)

          thelabdude Timothy Potter added a comment -

          btw - I was concerned I introduced this bug originally so wanted to track down the change that introduced this, looks like here -> https://github.com/apache/lucene-solr/commit/83969f44a0e9d4f282fa96f94870a039f9307287#diff-eb1341c1a8dc785e1d27d5a9d2c7a2e4

          elyograg Shawn Heisey added a comment -

          There's a bonus to this bugfix. It also removes the following warning logged by Jetty when Solr starts:

          WARNING: System properties and/or JVM args set.  Consider using --dry-run or --exec
          
          nimlhug Nim Lhûg added a comment -

          Any chance this can be fixed in 5.5.1?

          anshumg Anshum Gupta added a comment -

          backport for 5.5.1

          anshumg Anshum Gupta added a comment -

          branch_5x:

          commit 9b00006773120cee43e762e254216ca03eafa75e
          Author: thelabdude <thelabdude@gmail.com>
          Date:   Wed Mar 2 11:22:27 2016 -0700
          
              SOLR-8145: Fix position of OOM killer script when starting Solr in the background
          

          branch_5_5:

          commit 851a6029e889860951fdb480bf2d658c89639862
          Author: thelabdude <thelabdude@gmail.com>
          Date:   Wed Mar 2 11:22:27 2016 -0700
          
              SOLR-8145: Fix position of OOM killer script when starting Solr in the background
          

            People

            • Assignee:
              thelabdude Timothy Potter
              Reporter:
              nated Nate Dire
            • Votes:
              3
              Watchers:
              12
