Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.3, 6.0
    • Component/s: Server
    • Labels:
      None
    • Flags:
      Patch

      Description

      bin/solr doesn't work on IBM J9 because it sets -Xloggc flag, while J9 supports -Xverbosegclog. This prevents using bin/solr to start it on J9.

      1. solr-7748.patch
        4 kB
        Shai Erera
      2. SOLR-7748.patch
        0.5 kB
        Shai Erera
      3. SOLR-7748.patch
        6 kB
        Shai Erera
      4. SOLR-7748.patch
        6 kB
        Shai Erera
      5. SOLR-7748.patch
        6 kB
        Shai Erera

        Activity

        Hide
        Shai Erera added a comment -

        Patch fixes bin/solr.cmd since I tested it on Windows. If there are no objections to the change, I will change bin/solr too.

        Show
        Shai Erera added a comment - Patch fixes bin/solr.cmd since I tested it on Windows. If there are no objections to the change, I will change bin/solr too.
        Hide
        Shawn Heisey added a comment -

        There are severe bugs that happen in Lucene when using IBM's Java. I seem to recall seeing something indicating that IBM is starting to take them seriously, but that might have been wishful thinking.

        It will be quite a while after IBM fixes the problem (if they ever do) before there is widespread penetration of fixed Java versions, so until that happens, the fact that bin/solr doesn't work right might actually be a good thing.

        As I understand it, the J9 problems occur because IBM is extremely aggressive at turning on optimizations in the JVM. The performance of IBM's Java is legendary as a result of this, but it also causes a lot of problems.

        I know that in at least one case of a bug with J9, there is an optimization that can be turned off to fix the problem ... there may be other bugs fixed by turning off certain optimizations.

        As an initial step, I was thinking about having the startup script detect J9 versions and abort with a message indicating serious JVM bugs (perhaps linking to the JavaBugs page on the Lucene wiki). We already have detection for Java 7u40 through 7u51, which enables the -XX:-UseSuperWord option for Java and prints a warning to the user about upgrading Java.

        As we learn more, we could start with commandline options to work around the problems, and then if IBM ever actually fixes the problems, run normally with those detected versions.

        The java version detection currently happens in the script, which I think may be a little fragile. Perhaps a tiny little Java program could be written to detect a whole range of information about the JVM and return one or more known strings back to the script to tell the script what to do . Those actions might include aborting because the java version is not new enough, issuing a warning because it's not 64-bit, turning on X and/or Y commandline options, etc. We might even be able to set the max heap according to the available memory, but do so less aggressively than Java itself does.

        This comment turned into quite a lot of text! I started off just writing a quick note about J9 bugs.

        Show
        Shawn Heisey added a comment - There are severe bugs that happen in Lucene when using IBM's Java. I seem to recall seeing something indicating that IBM is starting to take them seriously, but that might have been wishful thinking. It will be quite a while after IBM fixes the problem (if they ever do) before there is widespread penetration of fixed Java versions, so until that happens, the fact that bin/solr doesn't work right might actually be a good thing. As I understand it, the J9 problems occur because IBM is extremely aggressive at turning on optimizations in the JVM. The performance of IBM's Java is legendary as a result of this, but it also causes a lot of problems. I know that in at least one case of a bug with J9, there is an optimization that can be turned off to fix the problem ... there may be other bugs fixed by turning off certain optimizations. As an initial step, I was thinking about having the startup script detect J9 versions and abort with a message indicating serious JVM bugs (perhaps linking to the JavaBugs page on the Lucene wiki). We already have detection for Java 7u40 through 7u51, which enables the -XX:-UseSuperWord option for Java and prints a warning to the user about upgrading Java. As we learn more, we could start with commandline options to work around the problems, and then if IBM ever actually fixes the problems, run normally with those detected versions. The java version detection currently happens in the script, which I think may be a little fragile. Perhaps a tiny little Java program could be written to detect a whole range of information about the JVM and return one or more known strings back to the script to tell the script what to do . Those actions might include aborting because the java version is not new enough, issuing a warning because it's not 64-bit, turning on X and/or Y commandline options, etc. We might even be able to set the max heap according to the available memory, but do so less aggressively than Java itself does. This comment turned into quite a lot of text! I started off just writing a quick note about J9 bugs.
        Hide
        Shai Erera added a comment -

        Thanks Shawn. I actually started to work with th j9 team over the past few weeks, on different aspects such as establishing a process for us to report bugs and have them rub Lucene/Solr tests on j9 builds in order to detect JVM issues. The team is also in contact with Uwe and Robert, and it seems like things get in the right direction.

        The issues that are known to cause corruption are suspected to have been fixed, but since they were not reproducible with the latest j9 builds, we can't say it for sure. Therfore the warnings are still found in the JavaBugs page, but I assume that after we have Jenkins passing builds for a while, we will at least document that these issues were not reproduced with version X and onwards.

        As for this particular issue I think that we should have bin/solr successfully start on j9. The problem that I've fixed is just passing the right gc log flag, which is different in j9.

        With this patch, we have a way to also detect particular build versions and block them if we know they are bad. I don't think the blocking j9 entirely is a good solution though, and definitely the script currently doesn't explicitly block j9, it just fails with a wrong flag passed error.

        From a community perspective, I feel that blocking a JVM vendor entirely is wrong, but maybe I'm biased. At least I can confirm that the team gives a high priority to resolving any outstanding Lucene/Solr issues, so blocking j9 in our scripts (and code) feels wrong.

        Show
        Shai Erera added a comment - Thanks Shawn. I actually started to work with th j9 team over the past few weeks, on different aspects such as establishing a process for us to report bugs and have them rub Lucene/Solr tests on j9 builds in order to detect JVM issues. The team is also in contact with Uwe and Robert, and it seems like things get in the right direction. The issues that are known to cause corruption are suspected to have been fixed, but since they were not reproducible with the latest j9 builds, we can't say it for sure. Therfore the warnings are still found in the JavaBugs page, but I assume that after we have Jenkins passing builds for a while, we will at least document that these issues were not reproduced with version X and onwards. As for this particular issue I think that we should have bin/solr successfully start on j9. The problem that I've fixed is just passing the right gc log flag, which is different in j9. With this patch, we have a way to also detect particular build versions and block them if we know they are bad. I don't think the blocking j9 entirely is a good solution though, and definitely the script currently doesn't explicitly block j9, it just fails with a wrong flag passed error. From a community perspective, I feel that blocking a JVM vendor entirely is wrong, but maybe I'm biased. At least I can confirm that the team gives a high priority to resolving any outstanding Lucene/Solr issues, so blocking j9 in our scripts (and code) feels wrong.
        Hide
        Hoss Man added a comment - - edited

        With this patch, we have a way to also detect particular build versions and block them if we know they are bad. I don't think the blocking j9 entirely is a good solution though, and definitely the script currently doesn't explicitly block j9, it just fails with a wrong flag passed error.

        +1

        wether the script should explicitly fail/warn "IBM J9 is known to have problems running Solr" on startup is a question that should be discussed/answered in it's own issue – it certainly makes sense to me to go ahead and fix the script to not fail in a weird and confusing way just because your're (edit) by the coincidence of trying to use J9.

        (by corollary: we IndexWriter tests that give up if you try running on J9 because we know those tests are unreliably on J9, but that doesn't mean the IndexWriter class itself throws a RuntimeException if you try to instantiate it on a J9 JVM)

        Show
        Hoss Man added a comment - - edited With this patch, we have a way to also detect particular build versions and block them if we know they are bad. I don't think the blocking j9 entirely is a good solution though, and definitely the script currently doesn't explicitly block j9, it just fails with a wrong flag passed error. +1 wether the script should explicitly fail/warn "IBM J9 is known to have problems running Solr" on startup is a question that should be discussed/answered in it's own issue – it certainly makes sense to me to go ahead and fix the script to not fail in a weird and confusing way just because your're (edit) by the coincidence of trying to use J9. (by corollary: we IndexWriter tests that give up if you try running on J9 because we know those tests are unreliably on J9, but that doesn't mean the IndexWriter class itself throws a RuntimeException if you try to instantiate it on a J9 JVM)
        Hide
        Shawn Heisey added a comment - - edited

        wether the script should explicitly fail/warn "IBM J9 is known to have problems running Solr" on startup is a question that should be discussed/answered in it's own issue – it certainly makes sense to me to go ahead and fix the script to not fail in a weird and confusing way just by the coincidence of trying to use J9.

        The issues that are known to cause corruption are suspected to have been fixed, but since they were not reproducible with the latest j9 builds, we can't say it for sure

        With this patch, we have a way to also detect particular build versions and block them if we know they are bad.

        Since we think IBM has fixed the problems, allowing J9 makes sense, and we can handle further version detection and resulting actions on another issue.

        Show
        Shawn Heisey added a comment - - edited wether the script should explicitly fail/warn "IBM J9 is known to have problems running Solr" on startup is a question that should be discussed/answered in it's own issue – it certainly makes sense to me to go ahead and fix the script to not fail in a weird and confusing way just by the coincidence of trying to use J9. The issues that are known to cause corruption are suspected to have been fixed, but since they were not reproducible with the latest j9 builds, we can't say it for sure With this patch, we have a way to also detect particular build versions and block them if we know they are bad. Since we think IBM has fixed the problems, allowing J9 makes sense, and we can handle further version detection and resulting actions on another issue.
        Hide
        Shai Erera added a comment -

        Patch fixes bin/solr too.

        Show
        Shai Erera added a comment - Patch fixes bin/solr too.
        Hide
        Shai Erera added a comment -

        Added CHANGES entry.

        I think it's ready, but will appreciate a second look at the fix of the scripts, since I'm no bash/batch expert .

        Show
        Shai Erera added a comment - Added CHANGES entry. I think it's ready, but will appreciate a second look at the fix of the scripts, since I'm no bash/batch expert .
        Hide
        Shawn Heisey added a comment -

        I'm going to assume that the way the vendor and version are detected are correct, since I don't have an IBM JVM to try.

        I did notice in the bash script that the non-IBM option construction included an echo command to include the existing GC_LOG_OPTS variable, but for J9, it sets the variable and doesn't include what's there previously. Should that be modified to match? The removed line (replaced with the if/else construct) uses parentheses to combine the old with the new, perhaps that method should be used in the two new statements. I'm not an expert either.

        Show
        Shawn Heisey added a comment - I'm going to assume that the way the vendor and version are detected are correct, since I don't have an IBM JVM to try. I did notice in the bash script that the non-IBM option construction included an echo command to include the existing GC_LOG_OPTS variable, but for J9, it sets the variable and doesn't include what's there previously. Should that be modified to match? The removed line (replaced with the if/else construct) uses parentheses to combine the old with the new, perhaps that method should be used in the two new statements. I'm not an expert either.
        Hide
        Shai Erera added a comment - - edited

        but for J9, it sets the variable and doesn't include what's there previousl

        Indeed, and as I put in the comment above that line, on the -Xverbosegclog is required/supported by J9, as far as I could tell from the J9 team. Also this page which they pointed me at does not indicate any additional parameters are needed: http://www-01.ibm.com/support/knowledgecenter/SSYKE2_8.0.0/com.ibm.java.win.80.doc/diag/appendixes/cmdline/xverbosegclog.html.

        The removed line (replaced with the if/else construct) uses parentheses to combine the old with the new,

        You're right. At first I thought that the parentheses don't work because when I echoed them using echo $GC_LOG_OPTS it only printed the -verbose:gc. I now tried echo "${GC_LOG_OPTS[@]}" and it works. I'll restore that.

        Show
        Shai Erera added a comment - - edited but for J9, it sets the variable and doesn't include what's there previousl Indeed, and as I put in the comment above that line, on the -Xverbosegclog is required/supported by J9, as far as I could tell from the J9 team. Also this page which they pointed me at does not indicate any additional parameters are needed: http://www-01.ibm.com/support/knowledgecenter/SSYKE2_8.0.0/com.ibm.java.win.80.doc/diag/appendixes/cmdline/xverbosegclog.html . The removed line (replaced with the if/else construct) uses parentheses to combine the old with the new, You're right. At first I thought that the parentheses don't work because when I echoed them using echo $GC_LOG_OPTS it only printed the -verbose:gc. I now tried echo "${GC_LOG_OPTS [@] }" and it works. I'll restore that.
        Hide
        Shai Erera added a comment -

        Patch addresses Shawn's comments. Shawn Heisey, I also verified that the other -XX flags are accepted by J9 (however might be ignored), so I now set all of them, and only modified the GC log filename flag.

        Show
        Shai Erera added a comment - Patch addresses Shawn's comments. Shawn Heisey , I also verified that the other -XX flags are accepted by J9 (however might be ignored), so I now set all of them, and only modified the GC log filename flag.
        Hide
        ASF subversion and git services added a comment -

        Commit 1689847 from Shai Erera in branch 'dev/trunk'
        [ https://svn.apache.org/r1689847 ]

        SOLR-7748: Fix bin/solr to work on IBM J9

        Show
        ASF subversion and git services added a comment - Commit 1689847 from Shai Erera in branch 'dev/trunk' [ https://svn.apache.org/r1689847 ] SOLR-7748 : Fix bin/solr to work on IBM J9
        Hide
        ASF subversion and git services added a comment -

        Commit 1689849 from Shai Erera in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1689849 ]

        SOLR-7748: Fix bin/solr to work on IBM J9

        Show
        ASF subversion and git services added a comment - Commit 1689849 from Shai Erera in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1689849 ] SOLR-7748 : Fix bin/solr to work on IBM J9
        Hide
        Shai Erera added a comment -

        Committed to trunk and 5x. Thanks Shawn Heisey for the review!

        Show
        Shai Erera added a comment - Committed to trunk and 5x. Thanks Shawn Heisey for the review!
        Hide
        Timothy Potter added a comment -

        This code is causing the script to fail for me:

        :resolve_java_vendor
        set "JAVA_VENDOR=Oracle"
        %JAVA% -version 2>&1 | findstr /i "IBM J9" > javares
        set /p JAVA_VENDOR_OUT=<javares
        del javares
        if NOT "%JAVA_VENDOR_OUT%" == "" (
          set "JAVA_VENDOR=IBM J9"
        )
        

        %JAVA% needs to be wrapped in double-quotes:

        :resolve_java_vendor
        set "JAVA_VENDOR=Oracle"
        "%JAVA%" -version 2>&1 | findstr /i "IBM J9" > javares
        set /p JAVA_VENDOR_OUT=<javares
        del javares
        if NOT "%JAVA_VENDOR_OUT%" == "" (
          set "JAVA_VENDOR=IBM J9"
        )
        
        Show
        Timothy Potter added a comment - This code is causing the script to fail for me: :resolve_java_vendor set "JAVA_VENDOR=Oracle" %JAVA% -version 2>&1 | findstr /i "IBM J9" > javares set /p JAVA_VENDOR_OUT=<javares del javares if NOT "%JAVA_VENDOR_OUT%" == "" ( set "JAVA_VENDOR=IBM J9" ) %JAVA% needs to be wrapped in double-quotes: :resolve_java_vendor set "JAVA_VENDOR=Oracle" "%JAVA%" -version 2>&1 | findstr /i "IBM J9" > javares set /p JAVA_VENDOR_OUT=<javares del javares if NOT "%JAVA_VENDOR_OUT%" == "" ( set "JAVA_VENDOR=IBM J9" )
        Hide
        Shai Erera added a comment - - edited

        Thanks Timothy Potter, that seems correct. I will try to upload a patch today or tomorrow, but feel free to upload one if you get to it before me.

        Show
        Shai Erera added a comment - - edited Thanks Timothy Potter , that seems correct. I will try to upload a patch today or tomorrow, but feel free to upload one if you get to it before me.
        Hide
        Shai Erera added a comment -

        I messed up the comments so I decided to take care of the fix now . I plan to commit shortly.

        Show
        Shai Erera added a comment - I messed up the comments so I decided to take care of the fix now . I plan to commit shortly.
        Hide
        ASF subversion and git services added a comment -

        Commit 1692317 from Shai Erera in branch 'dev/trunk'
        [ https://svn.apache.org/r1692317 ]

        SOLR-7748: Fix script to work when %JAVA% has spaces

        Show
        ASF subversion and git services added a comment - Commit 1692317 from Shai Erera in branch 'dev/trunk' [ https://svn.apache.org/r1692317 ] SOLR-7748 : Fix script to work when %JAVA% has spaces
        Hide
        ASF subversion and git services added a comment -

        Commit 1692318 from Shai Erera in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1692318 ]

        SOLR-7748: Fix script to work when %JAVA% has spaces

        Show
        ASF subversion and git services added a comment - Commit 1692318 from Shai Erera in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1692318 ] SOLR-7748 : Fix script to work when %JAVA% has spaces
        Hide
        Shai Erera added a comment -
        Show
        Shai Erera added a comment - Thanks Timothy Potter !
        Hide
        Shalin Shekhar Mangar added a comment -

        Bulk close for 5.3.0 release

        Show
        Shalin Shekhar Mangar added a comment - Bulk close for 5.3.0 release

          People

          • Assignee:
            Shai Erera
            Reporter:
            Shai Erera
          • Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development