Solr / SOLR-6387

Solr-specific workaround for JDK bug #8047340: posix_spawn error with Turkish locale

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.10, 6.0
    • Component/s: None
    • Labels:
    • Environment:

      MacOSX, Solaris, BSD (POSIX in general)
      Running Oracle / OpenJDK prior to Java 8u40 and Java 7u80.

      Description

      Various versions of the Sun/Oracle/OpenJDK JVM have issues executing new processes if the default language of the JVM is "Turkish".

      The root bug reports for this problem, which affects Runtime.exec(), are here...

      On systems running the affected JVMs with a default language of "Turkish", this problem has historically manifested itself in Solr in a few ways:

      • SystemInfoHandler would throw nasty exceptions on these systems due to an attempt at conditionally executing some native process to check system stats
      • RunExecutableListener would fail cryptically
      • Some Solr tests involving either the SystemInfoHandler or the Hadoop MapReduce code would fail if the test framework randomly selected a Turkish-language locale.

      Starting with Solr 4.10, we have worked around this JVM bug in Solr in 3 ways:

      • RunExecutableListener makes it more clear in the logs why it can't be used
      • SystemInfoHandler traps and ignores any Error related to "posix_spawn" in the same way it traps and ignores other errors related to its conditional attempts at exec'ing (i.e., permission problems, executable not found, etc.)
      • Our MapReduce-based tests that depend on exec'ing external processes now skip themselves automatically if a Turkish locale is randomly selected.

      Users affected by this issue who, for whatever reason, cannot upgrade to Solr 4.10 may wish to consider setting the "jdk.lang.Process.launchMechanism" system property explicitly (see below).
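      For users who take the system-property route, here is a minimal sketch of setting jdk.lang.Process.launchMechanism programmatically. It assumes the code runs before the first process is started in the JVM (in the affected JVMs the property is read once, while java.lang.UNIXProcess is initialized; see the stack trace below) and that "fork" is a valid mechanism on the target platform. The class and method names are hypothetical. Passing -Djdk.lang.Process.launchMechanism=fork on the java command line avoids the ordering concern entirely; note the platform- and vendor-specific caveats raised in the discussion below.

```java
import java.io.IOException;

public class LaunchMechanismWorkaround {
    // System property named in the JDK bug report.
    static final String PROPERTY = "jdk.lang.Process.launchMechanism";

    /**
     * Sets the launch mechanism to {@code fallback} unless one was already configured
     * (for example with -D on the command line). Returns the value now in effect.
     */
    static String ensureLaunchMechanism(String fallback) {
        String current = System.getProperty(PROPERTY);
        if (current == null) {
            System.setProperty(PROPERTY, fallback);
            current = fallback;
        }
        return current;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // Must run before the first Process is started in this JVM: in the affected
        // JVMs the property has already been read after that, so a new value is ignored.
        String mechanism = ensureLaunchMechanism("fork"); // "fork" is an assumption
        System.out.println("launch mechanism: " + mechanism);

        Process p = new ProcessBuilder("echo", "hello").inheritIO().start();
        System.out.println("exit code: " + p.waitFor());
    }
}
```

      The helper leaves an explicitly configured value untouched, so a command-line setting always wins.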

      Original issue report

      Jenkins tests occasionally fail with the following cryptic error...

      java.lang.Error: posix_spawn is not a supported process launch mechanism on this platform.
              at __randomizedtesting.SeedInfo.seed([9219CAA3BCAA7365:7F07719937A772E1]:0)
              at java.lang.UNIXProcess$1.run(UNIXProcess.java:104)
              at java.lang.UNIXProcess$1.run(UNIXProcess.java:93)
              at java.security.AccessController.doPrivileged(Native Method)
              at java.lang.UNIXProcess.<clinit>(UNIXProcess.java:91)
              at java.lang.ProcessImpl.start(ProcessImpl.java:130)
              at java.lang.ProcessBuilder.start(ProcessBuilder.java:1028)
              at java.lang.Runtime.exec(Runtime.java:617)
      

      A commonality of most of these failures is that the Turkish locale has been randomly selected, and apparently Runtime.exec is busted when you use Turkish...

      http://bugs.java.com/bugdatabase/view_bug.do?bug_id=8047340
      http://java.thedizzyheights.com/2014/07/java-error-posix_spawn-is-not-a-supported-process-launch-mechanism-on-this-platform-when-trying-to-spawn-a-process/

      We should consider hardcoding the "jdk.lang.Process.launchMechanism" system property mentioned as a workaround in the JDK bug report.

      1. SOLR-6387.patch
        6 kB
        Uwe Schindler
      2. SOLR-6387.patch
        6 kB
        Uwe Schindler

          Activity

          Uwe Schindler added a comment -

          It looks like the JDK itself should use forbiddenapis.

          The JDK bug report is about Java 8, so does this affect Java 7, too? In addition, what happens on Windows?

          Hoss Man added a comment -

          I'm not sure why that issue says it's a regression in 1.8; we've seen it reproduce plenty of times in Java 7 Jenkins builds.

          The blog post seems to suggest the problem is specific to the BSD-based implementations of the JRE (i.e., FreeBSD & MacOSX).

          I don't know exactly what the effects of hardcoding "jdk.lang.Process.launchMechanism" are on other OSes (or, for that matter, which of the 2 suggested values makes the most sense to hardcode it to: "POSIX_SPAWN" vs "fork").

          Uwe Schindler added a comment -

          I think we should dig into the OpenJDK source code to find out why it fails. To use a workaround, you have to understand why the workaround is effective.

          Uwe Schindler added a comment -

          Hi,
          after thinking more about it:

          This seems to only apply to tests, so the simplest approach should be to disable random locales for the affected tests; we have other tests already doing this (add something like assumeFalse("Test disabled with turkish default locale, see bug http://...", Locale.getDefault().equals(Turkish)) to these tests). As we have a good direct line to Oracle, we should also ask them to fix this bug. Setting this system property may have unwanted effects on specific platforms or implementations of different JVM vendors, so I strongly discourage doing this. It also does not help users of Solr; it just hides a real bug.

          The issue will also affect users on the Turkish locale if they use, for example, Tika's ForkParser when parsing files in the extraction module. So it is better to document this as a known issue in specific JVMs and let the end user upgrade or work around it.
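          The check above compares the whole default Locale against a Turkish one, but Locale.equals() also compares country and variant, so a default of tr_TR would not equal a plain "tr" locale. A minimal sketch that compares only the language code instead; the class and helper names are hypothetical, and in a JUnit 4 test the helper would be used as assumeFalse(message, isTurkishLanguage(Locale.getDefault())).

```java
import java.util.Locale;

public class TurkishLocaleCheck {
    /** True if the locale's language is Turkish, regardless of country or variant. */
    static boolean isTurkishLanguage(Locale locale) {
        // getLanguage() returns the lowercase ISO 639 code, e.g. "tr" for both tr and tr_TR.
        return "tr".equals(locale.getLanguage());
    }

    public static void main(String[] args) {
        Locale tr = Locale.forLanguageTag("tr");
        Locale trTR = Locale.forLanguageTag("tr-TR");
        // Locale.equals() compares the country too, so these are not equal...
        System.out.println(tr.equals(trTR));               // false
        // ...while a language-only comparison treats both as Turkish.
        System.out.println(isTurkishLanguage(trTR));       // true
        System.out.println(isTurkishLanguage(Locale.US));  // false
        System.out.println(isTurkishLanguage(Locale.getDefault()));
    }
}
```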

          Uwe Schindler added a comment -

          Hi Hoss,
          the bug is already fixed in OpenJDK, to be released in 8u40: https://bugs.openjdk.java.net/browse/JDK-8047340

          So I think we should wait for the fix and, until then, just disable the tests on Turkish locale with assumeTrue(). Which tests are affected by this (I have no idea where Solr spawns threads - I think only to get system information like free space or number of inodes?)

          Uwe Schindler added a comment -

          I sent a note to Rory.

          Hoss Man added a comment -

          This seems to only apply to tests,...

          Why does it only apply to tests?

          Any user running on a BSD machine with Turkish configured as the default would also have this same type of problem in Solr, wouldn't they?

          Setting this system property may have unwanted effects on specific platforms or implementations of different JVM vendors, so I strongly discourage doing this

          Fair enough.

          Which tests are affected by this (I have no idea where Solr spawns threads - I think only to get system information like free space or number of inodes?)

          SystemInfoHandler uses exec for the reason you mentioned, but we've also seen this error pop up in LineRandomizerMapperReducerTest because some of the map reduce code execs other processes.

          SystemInfoHandler is general enough that lots of tests might trigger it (anyone with a Mac or BSD machine should be able to generate a definitive list of all current tests triggering this via "ant test -Dtests.locale=tr_TR").


          For the MapReduce tests, assumeNotTurkish() seems like an adequate workaround for this bug, but I feel like we should try to do something better in the case of SystemInfoHandler... If we don't explicitly set jdk.lang.Process.launchMechanism, then what about catching the "Error" and, if it matches the "posix_spawn" string, swallowing and ignoring it? (We already have similar logic to account for other Exceptions that might occur when exec'ing a process that might not be available due to permissions, etc.; this seems like it would fall in that boat.)
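          A minimal sketch of the catch-and-ignore approach proposed above, using hypothetical names (runCommand, describeLaunchError); it is not Solr's actual SystemInfoHandler code. An Error whose message mentions "posix_spawn" is treated like the other tolerated exec failures and turned into an explanatory string, while any other Error is still rethrown.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class ExecWithLocaleWorkaround {

    /** Runs a command and returns its output, or an explanatory message if it cannot be run. */
    static String runCommand(String... cmd) {
        String cmdLine = String.join(" ", cmd);
        try {
            Process p = new ProcessBuilder(cmd).redirectErrorStream(true).start();
            try (InputStream in = p.getInputStream()) {
                return readFully(in);
            }
        } catch (IOException | SecurityException e) {
            // Already-tolerated failures: executable not found, permission problems, etc.
            return "(error executing: " + cmdLine + ")";
        } catch (Error e) {
            return describeLaunchError(e, cmdLine);
        }
    }

    /** Swallows the JVM locale bug's Error; rethrows any other Error unchanged. */
    static String describeLaunchError(Error e, String cmdLine) {
        String msg = e.getMessage();
        if (msg != null && msg.contains("posix_spawn")) {
            return "(error executing: " + cmdLine + "; JVM locale bug, see SOLR-6387)";
        }
        throw e;
    }

    /** Reads the whole stream; assumes the command writes UTF-8 output. */
    static String readFully(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
}
```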

          Uwe Schindler added a comment -

          If we don't explicitly set jdk.lang.Process.launchMechanism, then what about catching the "Error" and, if it matches the "posix_spawn" string, swallowing and ignoring it? (We already have similar logic to account for other Exceptions that might occur when exec'ing a process that might not be available due to permissions, etc.; this seems like it would fall in that boat.)

          This looks like a good solution!

          Uwe Schindler added a comment - - edited

          Hi,

          This excerpt from the OpenJDK source code shows the launch mechanisms for each platform (the first one listed is the default):

          LINUX(LaunchMechanism.VFORK, LaunchMechanism.FORK),
          BSD(LaunchMechanism.POSIX_SPAWN, LaunchMechanism.FORK),
          SOLARIS(LaunchMechanism.POSIX_SPAWN, LaunchMechanism.FORK),
          AIX(LaunchMechanism.POSIX_SPAWN, LaunchMechanism.FORK);
          

          As you can see, the default launch mechanism of BSD (Mac & FreeBSD), SOLARIS, and AIX is POSIX_SPAWN, whose enum constant name contains a capital "I", so the bug applies there. Linux is not affected because it uses VFORK.
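          A minimal sketch of why the capital "I" matters. It assumes, for illustration, that an affected JVM upper-cases the lowercase mechanism name with the default locale and then looks it up as an enum constant; the Mechanism enum and parse method are hypothetical stand-ins for JDK internals. Under a Turkish locale, lowercase "i" upper-cases to the dotted capital "İ" (U+0130), so "posix_spawn" no longer matches POSIX_SPAWN, while "fork" and "vfork" contain no "i" and are unaffected.

```java
import java.util.Locale;

public class TurkishCasingDemo {
    // Hypothetical stand-in for the JDK's private launch-mechanism enum.
    enum Mechanism { FORK, VFORK, POSIX_SPAWN }

    /** Upper-cases with the given locale, then looks up the enum constant; null if none matches. */
    static Mechanism parse(String name, Locale locale) {
        try {
            return Mechanism.valueOf(name.toUpperCase(locale));
        } catch (IllegalArgumentException e) {
            // Assumed to be where an affected JVM raises its
            // "... is not a supported process launch mechanism" Error.
            return null;
        }
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr-TR");
        System.out.println("posix_spawn".toUpperCase(turkish));   // POSİX_SPAWN (dotted İ)
        System.out.println(parse("posix_spawn", Locale.ROOT));    // POSIX_SPAWN
        System.out.println(parse("posix_spawn", turkish));        // null
        System.out.println(parse("vfork", turkish));              // VFORK
        System.out.println(parse("fork", turkish));               // FORK
        // Locale-independent casing avoids the problem entirely.
        System.out.println(parse("posix_spawn", Locale.ENGLISH)); // POSIX_SPAWN
    }
}
```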

          Because of that complexity, which is very JVM-specific, I would rather not hack those constants into the source code as system properties. So I would go with swallowing the Error in SystemInfoHandler and excluding Turkish in the MapReduce tests (I think there is already code that is disabled on some platforms, like Windows).

          Uwe Schindler added a comment -

          Here is the patch. I checked the whole Lucene/Solr source code for Runtime#exec(...).

          I also added an assume to the morphlines and Hadoop tests. I have to disable all tests, because Hadoop always calls Runtime#exec() on startup to analyze the system environment. So users with Hadoop should already know the problem.

          Uwe Schindler added a comment -

          The bug for JDK7 is here: https://bugs.openjdk.java.net/browse/JDK-8055301

          Hoss Man added a comment -

          Patch looks pretty good; here are some comments (already discussed with Uwe on IRC, but recording here for posterity)...

          • Instead of ever referring directly to the java.net bug URL anywhere in our code, we should always direct users here for the full context (and a place we can update with more info).
          • SystemInfoHandler shouldn't rethrow the Error if it matches "posix_spawn" - it should just update the return String value to explain why there was an error executing the command, and then swallow the error.

          Uwe Schindler added a comment -

          Updated patch with better error messages. SystemInfoHandler already swallowed the error in the previous patch. It was only rethrown if it was a different error (see the return statement in the if block).

          Uwe Schindler added a comment -

          Changed assumes, too.

          ASF subversion and git services added a comment -

          Commit 1618672 from Uwe Schindler in branch 'dev/trunk'
          [ https://svn.apache.org/r1618672 ]

          SOLR-6387: Add better error messages throughout Solr and supply a work around for Java bug #8047340 to SystemInfoHandler: On Turkish default locale, some JVMs fail to fork on MacOSX, BSD, AIX, and Solaris platforms.

          ASF subversion and git services added a comment -

          Commit 1618676 from Uwe Schindler in branch 'dev/branches/branch_4x'
          [ https://svn.apache.org/r1618676 ]

          Merged revision(s) 1618672 from lucene/dev/trunk:
          SOLR-6387: Add better error messages throughout Solr and supply a work around for Java bug #8047340 to SystemInfoHandler: On Turkish default locale, some JVMs fail to fork on MacOSX, BSD, AIX, and Solaris platforms.

          Hoss Man added a comment -

          Updated issue summary & description to be more helpful to people who follow the links in the new error/assume messages.

          Uwe Schindler added a comment - - edited

          The problem is not completely fixed:
          The first time, this correctly prints the warning, but because the Error occurs in the static initializer of UNIXProcess, the class cannot be loaded. Later references to this class then lead to NoClassDefFoundError. So we should catch both errors and log them.

          See http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-MacOSX/1777/:

             [junit4]   2> 2082174 T4935 oasha.SystemInfoHandler.execute WARN Error forking command due to JVM locale bug (see https://issues.apache.org/jira/browse/SOLR-6387): posix_spawn is not a supported process launch mechanism on this platform.
             [junit4]   2> 2082176 T4935 oas.SolrTestCaseJ4.tearDown ###Ending testOverriddenHandlers
             [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=InfoHandlerTest -Dtests.method=testOverriddenHandlers -Dtests.seed=9F30A6DF04D6D3E8 -Dtests.slow=true -Dtests.locale=tr -Dtests.timezone=America/Danmarkshavn -Dtests.file.encoding=UTF-8
             [junit4] ERROR   0.10s | InfoHandlerTest.testOverriddenHandlers <<<
             [junit4]    > Throwable #1: java.lang.NoClassDefFoundError: Could not initialize class java.lang.UNIXProcess
             [junit4]    > 	at __randomizedtesting.SeedInfo.seed([9F30A6DF04D6D3E8:796BAAF8FD9F1C5C]:0)
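          The failure mode described in the comment above is standard JVM class-initialization behavior and can be reproduced without launching any process: when a static initializer throws an Error, that Error propagates unchanged on first use, and every later use of the class fails with NoClassDefFoundError ("Could not initialize class ..."). A minimal sketch, where FakeUnixProcess is a hypothetical stand-in for java.lang.UNIXProcess and tryLaunch catches both errors, as suggested.

```java
public class StaticInitFailureDemo {
    /** Hypothetical stand-in for java.lang.UNIXProcess: its static initializer always fails. */
    static class FakeUnixProcess {
        static final String LAUNCH_MECHANISM = pickLaunchMechanism();

        static String pickLaunchMechanism() {
            // An Error (unlike a checked or runtime Exception) is not wrapped in
            // ExceptionInInitializerError; it reaches the caller as-is.
            throw new Error("posix_spawn is not a supported process launch mechanism on this platform.");
        }

        static String start() {
            return "started with " + LAUNCH_MECHANISM;
        }
    }

    /** Tries to "launch", reporting which kind of failure occurred instead of throwing. */
    static String tryLaunch() {
        try {
            return FakeUnixProcess.start();
        } catch (NoClassDefFoundError e) {
            // Later attempts: the class is now permanently in an erroneous state.
            return "NoClassDefFoundError";
        } catch (Error e) {
            // First attempt: the original Error from the static initializer.
            return "Error: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(tryLaunch()); // Error: posix_spawn is not a supported ...
        System.out.println(tryLaunch()); // NoClassDefFoundError
        System.out.println(tryLaunch()); // NoClassDefFoundError
    }
}
```

          Note that NoClassDefFoundError is itself a subclass of Error, so it has to be caught first (or checked explicitly); a handler that only matches the "posix_spawn" message would rethrow it.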
          ASF subversion and git services added a comment -

          Commit 1618938 from Uwe Schindler in branch 'dev/trunk'
          [ https://svn.apache.org/r1618938 ]

          SOLR-6387: Try to fix this a second time...

          ASF subversion and git services added a comment -

          Commit 1618939 from Uwe Schindler in branch 'dev/branches/branch_4x'
          [ https://svn.apache.org/r1618939 ]

          Merged revision(s) 1618938 from lucene/dev/trunk:
          SOLR-6387: Try to fix this a second time...

          Uwe Schindler added a comment -

          Next try; hopefully it's fixed this time...

          ASF subversion and git services added a comment -

          Commit 1620707 from hossman@apache.org in branch 'dev/trunk'
          [ https://svn.apache.org/r1620707 ]

          SOLR-6387: additional map-reduce test that does forking and needs 'tr' check

          ASF subversion and git services added a comment -

          Commit 1620709 from hossman@apache.org in branch 'dev/branches/branch_4x'
          [ https://svn.apache.org/r1620709 ]

          SOLR-6387: additional map-reduce test that does forking and needs 'tr' check (merge r1620707)

          ASF subversion and git services added a comment -

          Commit 1620712 from hossman@apache.org in branch 'dev/branches/lucene_solr_4_10'
          [ https://svn.apache.org/r1620712 ]

          SOLR-6387: additional map-reduce test that does forking and needs 'tr' check (merge r1620707)

          ASF subversion and git services added a comment -

          Commit 1653704 from Use account "steve_rowe" instead in branch 'dev/trunk'
          [ https://svn.apache.org/r1653704 ]

          SOLR-6991,SOLR-6387: Under Turkish locale, don't run solr-cell and dataimporthandler-extras tests that use Tika

          ASF subversion and git services added a comment -

          Commit 1653706 from Use account "steve_rowe" instead in branch 'dev/branches/branch_5x'
          [ https://svn.apache.org/r1653706 ]

          SOLR-6991,SOLR-6387: Under Turkish locale, don't run solr-cell and dataimporthandler-extras tests that use Tika (merged trunk r1653704)

          ASF subversion and git services added a comment -

          Commit 1653708 from Use account "steve_rowe" instead in branch 'dev/branches/lucene_solr_5_0'
          [ https://svn.apache.org/r1653708 ]

          SOLR-6991,SOLR-6387: Under Turkish locale, don't run solr-cell and dataimporthandler-extras tests that use Tika (merged trunk r1653704)

          Uwe Schindler added a comment -

          contrib/extraction is also affected by this.

          News about this: According to Oracle, this should be fixed in Java 8u40 and Java 7u80.


            People

            • Assignee: Uwe Schindler
              Reporter: Hoss Man
            • Votes: 0
              Watchers: 1
