Details

    • Type: Improvement
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      Solr on Windows does not currently have a script to kill the process on OOM errors.
      The idea is to write a batch script that works like the OOM kill script for Linux: it kills the Solr process on OOM errors and creates an OOM log file like the one on Linux systems.

      1. oom_win.cmd
        1 kB
        Binoy Dalal
      2. SOLR-8803.patch
        2 kB
        Binoy Dalal

        Activity

        binoydalal93@gmail.com Binoy Dalal added a comment -

        Thanks for the prompt response Shawn.

        Yes, the change is a combination of the patch and the cmd script attached to the issue.
        And yes, Binoy Dalal is good. Thanks.

        elyograg Shawn Heisey added a comment - - edited

        I'm assuming that the change is a combination of the patch and the cmd script attached to this issue.

        How do you want to be recognized in the CHANGES.txt file? Is "Binoy Dalal" appropriate?

        I will find some time for the commit, but if somebody else wants to jump on it before I can do that, feel free. If nobody has assigned the issue to themselves by the time I am available, I will.

        binoydalal93@gmail.com Binoy Dalal added a comment -

        I've tested this script on the lucene-solr trunk and it works as expected.
        Is any more action required with regard to this JIRA, or is it good to be committed?

        If it is good, can one of the committers please review and commit it?

        binoydalal93@gmail.com Binoy Dalal added a comment -

        Hi Shawn,
        Do the patch and script look good?

        binoydalal93@gmail.com Binoy Dalal added a comment -

        Finally figured out the problem. jmlucjav, you were right. It was an issue with the path specified to the oom script.
        I've fixed it and uploaded the working patch and oom script.
        I've tested it on Solr 5.3.1, 5.4.1 and 4.10.4.
        Works as expected on all the versions, although the configuration for 4.10.4 is a bit different from 5.x and the uploaded patch will only work for 5.x.
        If needed I'll upload another one for 4.x.

        Please review, test and advise.
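
A minimal sketch of the kind of check that can expose a path problem like this one, assuming the command given to -XX:OnOutOfMemoryError is resolved relative to the JVM's working directory when it is not absolute (an assumption about the failure mode, since the comment does not say exactly what was wrong with the path). The class and method names are hypothetical and are not part of the attached patch.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper: checks whether a script path is reachable from a working directory. */
class ScriptPathCheck {

    /** Resolves the script against workingDir; an absolute script path is returned unchanged. */
    static Path resolveFrom(Path workingDir, String script) {
        return workingDir.resolve(script).toAbsolutePath().normalize();
    }

    /** True only if the resolved path points at an existing regular file. */
    static boolean isReachable(Path workingDir, String script) {
        return Files.isRegularFile(resolveFrom(workingDir, script));
    }

    public static void main(String[] args) {
        // Usage: java ScriptPathCheck <working dir> <script path>
        Path workingDir = Paths.get(args[0]);
        System.out.println(resolveFrom(workingDir, args[1])
                + (isReachable(workingDir, args[1]) ? " exists" : " NOT found"));
    }
}
```

Running it with the directory the JVM is started from and the script path exactly as it appears in the flag shows whether a bare name such as oom_win.bat can be found from there.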

        binoydalal93@gmail.com Binoy Dalal added a comment -

        The flag is being picked up. A simple script that opens up a new command prompt does work.
        So there is something wrong with the script when being called from solr. I'll try and debug to see what I'm missing here.
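
For reference, a small sketch of how "the flag is being picked up" can be confirmed from inside a JVM rather than from the process list. It uses the standard RuntimeMXBean.getInputArguments() call; the class and method names are hypothetical, and the code only reports on the JVM it runs in, so it is an illustration rather than a tool for inspecting a running Solr instance.

```java
import java.lang.management.ManagementFactory;
import java.util.List;
import java.util.Optional;

/** Hypothetical helper: extracts the OnOutOfMemoryError command from JVM arguments. */
class OomFlagCheck {

    static final String PREFIX = "-XX:OnOutOfMemoryError=";

    /** Returns the command following the flag, or an empty Optional if the flag is absent. */
    static Optional<String> findOomCommand(List<String> jvmArgs) {
        return jvmArgs.stream()
                .filter(arg -> arg.startsWith(PREFIX))
                .map(arg -> arg.substring(PREFIX.length()))
                .findFirst();
    }

    public static void main(String[] args) {
        // Arguments passed to this JVM itself (not to the main class).
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println(findOomCommand(jvmArgs).orElse("(flag not set)"));
    }
}
```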

        binoydalal93@gmail.com Binoy Dalal added a comment -

        Tried on Solr 5.3.1, 5.4.1, 4.10.4
        Still facing the same issue, with relative and absolute path both specified in the start script.
        I think I'm doing something wrong since I've worked on 4.10.4 before (on Linux though) and that does call the OOM script on OOMEs.
        I'll keep digging.

        binoydalal93@gmail.com Binoy Dalal added a comment -

        I noticed that very silly mistake and corrected it, but still nothing. I'll put up the updated patch in some time.
        I still have to try on 4.10.4, so I will update the thread with those results as well.

        jmlucjav jmlucjav added a comment -

        set an absolute path to oom_win.bat maybe?

        binoydalal93@gmail.com Binoy Dalal added a comment -

        That's not the issue. Here's the process string from Windows, obtained using the command:

        wmic path win32_process where (Caption='java.exe') get commandline,processid | findstr start.jar | findstr 8983
        

        Here's the result:

        "C:\Program Files\Java\jdk1.8.0_11\bin\java"  -server -Xms512m -Xmx512m -Duser.timezone=UTC -XX:NewRatio=3  -XX:SurvivorRatio=4  -XX:TargetSurvivorRatio=90  -XX:MaxTenuringThreshold=8  -XX:+UseConcMarkSweepGC  -XX:+UseParNewGC  -XX:ConcGCThreads=4 -XX:ParallelGCThreads=4  -XX:+CMSScavengeBeforeRemark  -XX:PretenureSizeThreshold=64m  -XX:+UseCMSInitiatingOccupancyOnly  -XX:CMSInitiatingOccupancyFraction=50  -XX:CMSMaxAbortablePrecleanTime=6000  -XX:+CMSParallelRemarkEnabled  -XX:+ParallelRefProcEnabled -verbose:gc -XX:+PrintHeapAtGC -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps -XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime  -Xss256k -Xloggc:"E:\SOFTWARE\solr-5.4.1\server\logs"/solr_gc.log -XX:OnOutOfMemoryError="oom_win.bat 8983 E:\SOFTWARE\solr-5.4.1\server/logs" -Dlog4j.configuration="file:E:\SOFTWARE\solr-5.4.1\server\resources\log4j.properties" -DSTOP.PORT=7983 -DSTOP.KEY=solrrocks     -Djetty.port=8983 -Dsolr.solr.home="E:\SOFTWARE\solr-5.4.1\server\solr" -Dsolr.install.dir="E:\SOFTWARE\solr-5.4.1" -Djetty.home="E:\SOFTWARE\solr-5.4.1\server" -Djava.io.tmpdir="E:\SOFTWARE\solr-5.4.1\server\tmp" -jar start.jar "--module=http"  4652       
        
        elyograg Shawn Heisey added a comment -

        Binoy Dalal, can you show us your entire Solr command line? Any chance it might be similar to SOLR-8145? I did take a look at your patch for the solr.cmd script, which looked like it put the argument in the right place.

        I also wonder if I broke it when I suggested that you change .bat to .cmd on the script.

        jmlucjav jmlucjav added a comment -

        most probably 5.3.1

        binoydalal93@gmail.com Binoy Dalal added a comment -

        Do you know which version that was?
        According to SOLR-8539, this issue started showing up after the Jetty upgrade to 9.2.3, I think.
        I'm going to try this with the last version on 4.x and with the trunk branch to see if I face the same issue.

        jmlucjav jmlucjav added a comment -

        I meant that I got the flag working with Solr on an OOM. At least at some point it worked.

        binoydalal93@gmail.com Binoy Dalal added a comment -

        I think this has something to do with this: SOLR-8539
        I haven't gone through the whole log but Solr does seem to be swallowing up OOMs because of some sort of a Jetty issue. Will take a better look at this in some time.

        binoydalal93@gmail.com Binoy Dalal added a comment -

        That is true. The flag does work. I've already tested it using a program (separate from Solr) that throws an OOM and calls the script to kill solr using the flag, where it works fine. I just can't seem to get Solr itself to OOM properly to test if it works on Solr itself.

        jmlucjav jmlucjav added a comment -

        For the record, while looking into SOLR-8145 (when I did not know that jira existed), I saw that the -XX:OnOutOfMemoryError flag does work on Windows; I saw it working at least once.

        binoydalal93@gmail.com Binoy Dalal added a comment -

        Tried it. Script wasn't called at all.
        Any ideas?

        elyograg Shawn Heisey added a comment -

        To try and debug, see if you can put something in the script that will write to a text file, just so you can see whether the OOM killer script is even being called. If you can write the date/time, that's even better.

        binoydalal93@gmail.com Binoy Dalal added a comment -

        I tried faceting on a field of 300k+ unique docs, and also a fuzzy query on the same field.
        Both threw OOM (Java heap space) exceptions, but neither called the OOM script. Solr just displayed the message and then continued to work normally.

        I'm using Solr 5.4.1. Does it somehow handle OOM errors, or am I missing something here?

        binoydalal93@gmail.com Binoy Dalal added a comment -

        Updated patch with cmd script instead of earlier bat script

        binoydalal93@gmail.com Binoy Dalal added a comment -

        cmd file also works as expected

        binoydalal93@gmail.com Binoy Dalal added a comment -

        The only difference between *.bat and *.cmd files I could find is the way in which the scripts return.
        For cmd files, certain commands like `set` set the return code while for bat files, the return code is only set on errors.
        So I do not think that changing the file extension will make any difference.
        I'll do that and run a small test to see if everything works fine and upload the updated files.

        Source for the difference info.: http://stackoverflow.com/questions/148968/windows-batch-files-bat-vs-cmd

        elyograg Shawn Heisey added a comment -

        Looks reasonable to me. Would this work if we renamed the script to .cmd instead of .bat? Probably not a big deal either way, just curious. Although .bat will work, .cmd is more "correct" for Windows.

        binoydalal93@gmail.com Binoy Dalal added a comment -

        The oom_win.bat file is the OOM kill script for Windows.
        The patch contains changes made to the solr.cmd script to add the -XX:OnOutOfMemoryError JVM option to the start command.

        binoydalal93@gmail.com Binoy Dalal added a comment - - edited

        I've written a batch script and tested it using a Java program designed to throw an OOM exception, upon which the script to kill the Solr process is called. The script works as expected and is just a port of the shell version.
        I will be uploading the patch shortly.
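
A test program of the kind described above might look roughly like the sketch below. It is not the program used for this issue (that program is not attached); the class name OomTrigger and the allocate method are hypothetical. It keeps allocated arrays reachable so the garbage collector cannot reclaim them, which eventually forces the JVM to throw an OutOfMemoryError.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical test harness: fills the heap until the JVM throws OutOfMemoryError. */
class OomTrigger {

    /**
     * Allocates up to maxChunks arrays of chunkBytes each and keeps them all reachable.
     * Returns the number of chunks allocated if the limit is reached before the heap runs out.
     */
    static int allocate(int chunkBytes, int maxChunks) {
        List<byte[]> retained = new ArrayList<>();
        while (retained.size() < maxChunks) {
            retained.add(new byte[chunkBytes]);
        }
        return retained.size();
    }

    public static void main(String[] args) {
        // With an effectively unbounded chunk count and a small -Xmx,
        // this is expected to end in an OutOfMemoryError (Java heap space).
        allocate(1024 * 1024, Integer.MAX_VALUE);
    }
}
```

Started with a small heap and the flag set, for example `java -Xmx64m "-XX:OnOutOfMemoryError=<path to script> <port> <log dir>" OomTrigger`, the JVM should invoke the given command when the error is first thrown. The placeholders stand for the script location and its two arguments and have to be filled in for the local setup.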


          People

          • Assignee:
            Unassigned
            Reporter:
            binoydalal93@gmail.com Binoy Dalal
          • Votes:
            0
            Watchers:
            3
