Lucene - Core / LUCENE-3335

jrebug causes porter stemmer to sigsegv

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 1.9, 1.9.1, 2.0.0, 2.1, 2.2, 2.3, 2.3.1, 2.3.2, 2.4, 2.4.1, 2.9, 2.9.1, 2.9.2, 2.9.3, 2.9.4, 3.0, 3.0.1, 3.0.2, 3.0.3, 3.1, 3.2, 3.3, 3.4, 4.0-ALPHA
    • Fix Version/s: None
    • Component/s: None
    • Labels:
    • Environment:
      • JDK 7 Preview Release, GA (fixed in JDK 1.7.0_1)
      • JDK 1.6.0_20+ with -XX:+OptimizeStringConcat or -XX:+AggressiveOpts (fixed in JDK 1.6.0_29)
    • Lucene Fields:
      New
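The version ranges in the Environment field can be turned into a quick startup check. The following is only a sketch: the class name `AffectedJvmCheck` and the method `isAffected` are hypothetical, and it assumes the legacy `1.x.0_NN` format of the `java.version` system property. For the 1.6 range it only tells you the JVM is in the window where the flags listed above are dangerous, not that those flags are actually set.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AffectedJvmCheck {
    // Matches legacy version strings such as "1.6.0_24", "1.7.0" or "1.7.0-ea".
    private static final Pattern LEGACY =
        Pattern.compile("^1\\.(\\d+)\\.0(?:_(\\d+))?.*$");

    /** Returns true if the version falls in the ranges listed in the Environment field. */
    public static boolean isAffected(String javaVersion) {
        Matcher m = LEGACY.matcher(javaVersion);
        if (!m.matches()) {
            return false; // not a 1.x.0 style version string
        }
        int major = Integer.parseInt(m.group(1));
        int update = m.group(2) == null ? 0 : Integer.parseInt(m.group(2));
        if (major == 7) {
            return update < 1; // preview and GA builds; reported fixed in 1.7.0_1
        }
        if (major == 6) {
            // Only a problem together with -XX:+OptimizeStringConcat or
            // -XX:+AggressiveOpts; reported fixed in 1.6.0_29.
            return update >= 20 && update < 29;
        }
        return false;
    }

    public static void main(String[] args) {
        String v = System.getProperty("java.version");
        System.out.println(v + " in affected range: " + isAffected(v));
    }
}
```

A check like this could log a warning at startup instead of failing hard, since the 1.6 case depends on the command-line flags as well.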

      Description

      Happens easily on Java 7: ant test -Dtestcase=TestPorterStemFilter -Dtests.iter=100

      It might happen on 1.6.0_26 too; a user has already reported something that looks like the same bug:
      http://www.lucidimagination.com/search/document/3beaa082c4d2fdd4/porterstemfilter_kills_jvm

      1. patch-0uwe.patch
        25 kB
        Uwe Schindler
      2. LUCENE-3335.patch
        0.8 kB
        Robert Muir
      3. LUCENE-3335_slow.patch
        1 kB
        Robert Muir

        Issue Links

          Activity

          Krystian Nowak added a comment -

          Seems like finally done in 7u2: http://www.oracle.com/technetwork/java/javase/2col/7u2bugfixes-1394661.html

          Uwe Schindler added a comment -

          See LUCENE-3537 and also the Lucene/Solr web homepage. A complete report is here:

          • Main article
          • Explanation of the string concat issues (this explains why StringConcat optimizations trigger this)
          • Discussion about the update release

          Matt Ryall added a comment -

          The Java 7u1 release notes report that this issue is fixed in that release:

          JIT and Loop Bugs

          Three bugs reported by various parties, including Apache Lucene developers, have been fixed in JDK 7 Update 1, in addition to a fourth related bug found by Oracle (7070134, 7068051, 7044738, 7077439).

          I haven't yet been able to verify this.

          Robert Muir added a comment -

          I don't think there is any sense in this, who cares?

          We reported this crash to Oracle in plenty of time, and the worse wrong-results bug has been open since May 13: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7044738, but Oracle decided not to fix that, too.

          Uwe Schindler added a comment -

          @Shay: Sorry, I did not want to be too Italian. I just wanted to ensure that such configurations, which lead to bugs in JVMs, are reported to us. It would also help us respond more quickly to such bug reports, like the one we already got 2 months ago (which nobody was able to reproduce, as we did not know that the user used aggressive opts).

          Dawid Weiss added a comment -

          Uwe has an Italian temper. Btw, I really like the recent Yoda discussion on concurrency-interest, Shay...

          Shay Banon added a comment -

          @Uwe I actually forgot about this, and did not think it was because of the porter stemmer at the time, especially since I tried to reproduce it and never managed to (I thought it was a coincidence that it crashed there). From my experience, you get very little help from Sun/Oracle when using unorthodox flags like aggressive opts without a proper reproduction. Well, you get very little help there even when you do produce a reproduction... (see this issue that I opened, for example: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7066129). I am the reason behind the Lucene 1.9.1 release, made because of the major buffering bug introduced in 1.9 way back in the day. Do you really think I would not have contacted you if I thought there really was a problem associated with Lucene?

          Uwe Schindler added a comment -

          The SIGSEGV bug was already reported on the Elastic Search mailing list in January: http://elasticsearch-users.115913.n3.nabble.com/Java-6u23-and-ES-0-14-2-crashing-on-signal-6-SIGABT-td2289578.html

          It would have been nice if Shay Banon had contacted us!

          Bernd Fehling added a comment -

          > I got new information from Vladimir about the Porter bug in Java 1.6: "The code in memnode.cpp was there
          > for long time (before 6u26). But before my changes it was guarded by OptimizeStringConcat flag. So if you
          > use -XX:+OptimizeStringConcat or -XX:+AggressiveOpts flags you will hit the same problem (I reproduced it
          > even with 1.6.0_23)"
          >
          > This might be the reason behind http://www.lucidimagination.com/search/document/3beaa082c4d2fdd4/porterstemfilter_kills_jvm,
          > but we never got a response. If he used aggressive opts he has the same problem.

          @Uwe, sorry for not answering that one or creating an issue as Robert said, but while switching from FAST Search to Lucene/Solr I had (and still have) several problems to solve. One was the UTF-8 Jetty problem, then this PorterStemFilter issue came up, and right after that Solr/Lucene crashed with OOM due to FieldCache problems. And there is still my plan to get FST for synonyms running. Dang, my day only has 24 hours.
          Yes, I used -XX:+AggressiveOpts, and as we now know, that is why the JVM crashed.
          java version "1.6.0_22"
          Java(TM) SE Runtime Environment (build 1.6.0_22-b04)
          Java HotSpot(TM) 64-Bit Server VM (build 17.1-b03, mixed mode)
          After the crashes with PorterStemFilter I removed AggressiveOpts from my JAVA_OPTS.
          Now I'm watching what Lucene's FieldCache is doing and whether it is still doubling its size until OOM.
          So I'm deep inside.

          Well, it is interesting to think that if I had filed an issue, and it had been tracked down a month ago, this might have prevented a buggy release of Java 1.7.

          Robert Muir added a comment -

          I opened a separate issue for the checkindex problem: LUCENE-3346

          it only affects pulsing, and only unreleased trunk.

          Uwe Schindler added a comment - - edited

          I verified the above:

          modules\analysis\common>ant test -Dargs="-XX:+OptimizeStringConcat" -Dtestcase=TestPorterStemFilter -Dtests.iter=100
          
          modules\analysis\common>ant test -Dargs="-XX:+AggressiveOpts" -Dtestcase=TestPorterStemFilter -Dtests.iter=100
          

          Both crash with the same error:

          Testsuite: org.apache.lucene.analysis.en.TestPorterStemFilter
          #
          # A fatal error has been detected by the Java Runtime Environment:
          #
          #  EXCEPTION_ACCESS_VIOLATION (0xc0000005) at pc=0x00000000022171e9, pid=8816, tid=10952
          #
          # JRE version: 6.0_24-b07
          # Java VM: Java HotSpot(TM) 64-Bit Server VM (19.1-b02 mixed mode windows-amd64 compressed oops)
          # Problematic frame:
          # J  org.apache.lucene.analysis.en.PorterStemmer.stem(I)Z
          #
          # An error report file with more information is saved as:
          # C:\Users\Uwe Schindler\Projects\lucene\trunk-lusolr1\modules\analysis\build\common\test\1\hs_err_pid8816.log
          #
          # If you would like to submit a bug report, please visit:
          #   http://java.sun.com/webapps/bugreport/crash.jsp
          #
          Test org.apache.lucene.analysis.en.TestPorterStemFilter FAILED (crashed)
          

          This may explain Bernd Fehling's problems, as Solr users often use strange JVM options because they want to get all the speed out of their system (because Solr is slower than native Lucene code...).

          Uwe Schindler added a comment -

          I got new information from Vladimir about the Porter bug in Java 1.6: "The code in memnode.cpp was there for long time (before 6u26). But before my changes it was guarded by OptimizeStringConcat flag. So if you use -XX:+OptimizeStringConcat or -XX:+AggressiveOpts flags you will hit the same problem (I reproduced it even with 1.6.0_23)"

          This might be the reason behind http://www.lucidimagination.com/search/document/3beaa082c4d2fdd4/porterstemfilter_kills_jvm, but we never got a response. If he used aggressive opts he has the same problem.
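Since the crash depends on how the JVM was started, an application can inspect its own startup arguments for the two flags named above. This is a minimal sketch, not something from the attached patches: the class name `RiskyFlagCheck` and the method `findRiskyFlags` are hypothetical, while `RuntimeMXBean.getInputArguments()` is the standard JMX call for reading JVM arguments.

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RiskyFlagCheck {
    // Flags named in this issue as triggering the crash on 1.6.0_20+.
    private static final List<String> RISKY =
        Arrays.asList("-XX:+OptimizeStringConcat", "-XX:+AggressiveOpts");

    /** Returns the subset of the given JVM arguments that match the risky flags. */
    public static List<String> findRiskyFlags(List<String> jvmArgs) {
        List<String> found = new ArrayList<String>();
        for (String arg : jvmArgs) {
            if (RISKY.contains(arg)) {
                found.add(arg);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // Arguments passed to the JVM itself, not to main().
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("Risky flags in use: " + findRiskyFlags(jvmArgs));
    }
}
```

Including the output of such a check in a crash report would make reports like the one linked above easier to reproduce.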

          Dawid Weiss added a comment -

          @Hoss Yeah, it's scary, isn't it? But then: there is no piece of software that is 100% bug free and anybody running a production server will be running migration tests first before running on a new infrastructure. Hey, that's also part of the reason we still have folks running 1.5

          I think I'm for releasing 1.7 and getting the road paved for bugfix releases rather than delaying it indefinitely... I mean: it'll be motivational for Oracle if people start screaming!

          Robert Muir added a comment -

          > Even if we found a work around for all the affected issues in Lucene that didn't hurt performance in older JVMs, and spun up a 3.3.1 RC in the next 5 minutes, we still don't have enough time to vote for that release and get it out to the mirrors by the time Java 7 comes out – let alone have any confidence that all our users will upgrade Lucene/Solr before they upgrade their JVM.

          I agree, I'm not implying we should rush anything. But I guess I'm saying it's worth understanding the scope of what's affected, because if it's just:

          • PorterStemmer JRE crash <- workarounds already posted here
          • Pulsing negative readVInt <- no workaround yet

          well, that's manageable; only one of these affects any released code.

          Robert Muir added a comment -

          I just wrote a test (Test10KPulsings) designed to seek out the corrupt index bug.

          It didn't work, but separately it sometimes creates a corrupt index with Java 6.

          Adding lucene/src/test/org/apache/lucene/index/codecs/pulsing
          Adding lucene/src/test/org/apache/lucene/index/codecs/pulsing/Test10KPulsings.java
          Transmitting file data .
          Committed revision 1151335.

          Hoss Man added a comment -

          Frankly i'm amazed that the jdk7 guys are saying "yes this is a bug that can cause a sigsegv in code that worked fine using Java 1.6, but we're going to go ahead and release 1.7 with this bug in place anyway, it should make it in by 1.7_u2"

          makes me scared shitless of what other known bugs will be in Java 1.7.0.

          Even if we found a work around for all the affected issues in Lucene that didn't hurt performance in older JVMs, and spun up a 3.3.1 RC in the next 5 minutes, we still don't have enough time to vote for that release and get it out to the mirrors by the time Java 7 comes out – let alone have any confidence that all our users will upgrade Lucene/Solr before they upgrade their JVM.

          I think the most important thing we can do is publicize the shit out of this hotspot bug, and warn everybody on the fucking planet not to use Java1.7.0 because of it.

          if we also find clean workarounds we can commit and release in our own code, so be it – but that seems like priority #2

          Robert Muir added a comment -

          > Should we place a warning on the "Download" and "News" page on Solr and Lucene website? The risk is high that you corrupt your index, if you index using these JDK versions.

          Not totally sure. The issue is not so different from LUCENE-2975: if we can make an easy workaround (there are 2 possible ones on this issue for the Porter bug), I think we give it our best try and get it out in a release. This way, if someone has to support JDK 7, we can at least say "upgrade to this version of Lucene" rather than "won't fix". No matter how much we scream, users will be confused, because it seems these bugs only affect loops of a very specific form.

          On the other hand, if it makes our code messy or confusing or slows things down, we should not do this.

          I will look into this new negative vint bug, which might only affect pulsing, and see if I can make a test case + workaround for it.

          Uwe Schindler added a comment -

          Here is the final patch for OpenJDK, including Porter.java as a test case:

          • http://cr.openjdk.java.net/~kvn/7070134/webrev/7070134.patch (see also http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2011-July/005972.html, http://cr.openjdk.java.net/~kvn/7070134/webrev/)

          For the full bugfix, the following fixes are also needed:

          • http://cr.openjdk.java.net/~kvn/7044738/webrev/7044738.patch
          • http://cr.openjdk.java.net/~kvn/7068051/webrev/7068051.patch

          All three were applied to Jenkins' OpenJDK7 (excluding the test cases).

          Uwe Schindler added a comment -

          Link to the message: http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2011-July/005971.html

          Uwe Schindler added a comment -

          Response from the Hotspot mailing list about their release plans:

          > Thank you, Uwe
          >
          > I will send the patch for reviews shortly.
          >
          > About java 7 release. We are late to do any bugs fixes in GA which should happen soon. All loop optimization fixes will go definitely into jdk7 update 2. We will try to push them into update 1 (which is targeted only for security fixes) but we can't promise.
          >
          > There is going discussion about using current Hotspot VM in future jdk6 updates but there is no decision yet. Note: current Hotspot VM sources are targeted for JDK8 and jdk7 updates only.
          >
          > Regards,
          > Vladimir

          This means Java 7 will come out with heavily broken loops (for almost any for or while loop, you cannot be sure that it still works correctly once it has been executed ten thousand times).

          What do others think? Should we place a warning on the "Download" and "News" page on Solr and Lucene website? The risk is high that you corrupt your index, if you index using these JDK versions. Also, the default configuration of Solr will SIGSEGV.
          We should also inform the user mailing lists.

          I can prepare something for us to discuss. Oracle JDK 1.7.0 GA will be released on July 28th, according to Oracle's press releases. By that day at the latest, we should have something available to present to the users.

          Uwe Schindler added a comment -

          Patch again, without Apache License

          Uwe Schindler added a comment -

          Hi,
          we had some success with direct communication to the hotspot developers.

          The whole story:

          • Java 7 contains a fix for the readVInt issue that has existed since approximately 1.6.0_21 (LUCENE-2975); fortunately, this fix was not included in 1.6.0_26.
          • This fix causes the SIGSEGV in the Porter code, but also breaks other loops (e.g. a strange CheckIndex failure in org.apache.lucene.facet.search.SamplingWrapperTest).
          • We were in contact with the hotspot-compiler-dev list, and Vladimir sent me the patches that should fix the bug. The attached patch is a combination of all patches received, in a format suitable for the FreeBSD ports build framework. Place the file in your port's "files/" folder and rebuild the package. In Debian/Ubuntu you should be able to do the same thing by placing the file in the debian/patches folder somehow.
          • I have now disabled all Jenkins builds and queued the Java 7 builds for 3.x and trunk quarter-hourly. The machine is now stress testing.
          • We will report the results back to Oracle, but it seems that the attached patch fixes the issues.

          If they had added their original broken fix to the 1.6.0_26 release, it would have been catastrophic...

          Hoss Man added a comment -

          Thread: http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2011-July/005962.html

          Suggested workaround: -XX:-UseLoopPredicate

          Dawid Weiss added a comment -

          fyi: I asked the gods of jit on hotspot-dev mailing list what's the cause of this.

          Robert Muir added a comment -

          The bug is now visible at http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7070134

          If anyone has a few minutes, it would be cool if they voted on it (the Oracle site is horrendously slow, and I know that's discouraging).

          I think there will be a lot of confusion if Java 7 is released with this bug; for instance, simple things like the Solr example will not really work at all.
          You don't need some crazy random test to trigger this: once this method passes the compile threshold (e.g. 10k invocations), then boom.
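The "passes the compile threshold" point can be illustrated with a small warm-up harness: call a loop-heavy method well past 10k invocations, so the JIT is likely to compile it, and compare every result against the first one. This is only a sketch. The method `countVowels` is a hypothetical stand-in and not the Lucene PorterStemmer, so on its own it is not expected to reproduce this particular HotSpot bug, and a hard crash like the SIGSEGV above would kill the process rather than show up as a mismatch.

```java
public class JitWarmupCheck {
    // Hypothetical hot method with a character loop; NOT the Lucene PorterStemmer.
    static int countVowels(char[] word, int len) {
        int n = 0;
        for (int i = 0; i < len; i++) {
            char ch = word[i];
            if (ch == 'a' || ch == 'e' || ch == 'i' || ch == 'o' || ch == 'u') {
                n++;
            }
        }
        return n;
    }

    /** Calls the method repeatedly and counts calls whose result differs from the first call. */
    public static int mismatches(String input, int iterations) {
        char[] word = input.toCharArray();
        int expected = countVowels(word, word.length); // most likely still interpreted
        int bad = 0;
        for (int i = 0; i < iterations; i++) {
            if (countVowels(word, word.length) != expected) {
                bad++;
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        // 50000 iterations is an arbitrary choice, comfortably above the 10k figure mentioned above.
        System.out.println("mismatches: " + mismatches("relational", 50000));
    }
}
```

Running the same harness with and without a suspect flag gives a cheap way to check whether compiled code diverges from interpreted code for a given method.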

          Uwe Schindler added a comment -

          I opened SOLR-2672 and SOLR-2673 for the Solr test failures.

          Uwe Schindler added a comment -

          I can confirm that Robert's fixes fix all bugs in Lucene & Modules (I used the "slow" one, which is not slow). Solr tests no longer segfault when they use PorterStemFilter, but the above test failures are real and not hotspot related.

          Uwe Schindler added a comment -

          > wait, how do you know? Do all Solr tests pass with -Xint?

          Solr tests also do not pass with -Xint. It seems to be a concurrency bug in Solr's caching. With caching disabled (in SolrIndexSearcher), tests pass except those which directly check cache contents. This affects TestFiltering, RequiredFieldsTest and more tests (which fail randomly depending on load).

          Another test randomly fails without reason: TestEchoParams (this test looks like Chinese to me; I don't understand a single line of it or what is tested at all).

          Robert Muir added a comment -

          maybe my bug is a duplicate of this one: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7068051

          Robert Muir added a comment -

          wait, how do you know? Do all Solr tests pass with -Xint?

          Maybe there is some other issue affecting Solr, perhaps something XML related.
          Please open up a separate JIRA issue for that: I don't want to confuse that stuff with this one.

          Uwe Schindler added a comment -

          When applying your patch, Lucene core and modules build correctly, but Solr fails on random tests with unreproducible error messages.

          It seems that hotspot is totally broken in Java 7.

          Uwe Schindler added a comment -

          I like the first patch more, as the code is much easier to understand. I assume it's not slower.

          Robert Muir added a comment -

          here's a better workaround, adds a redundant 'ch == 0' check.

          Robert Muir added a comment -

          TestPorterStemmer -> TestPorterStemFilter

          Mark Miller added a comment -

          Yuck. Welcome to 1980

          Robert Muir added a comment -
          You can monitor this bug on the Java Bug Database at
          http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7070134.
          
          It may take a day or two before your bug shows up in this external database. 
          
          Mark Miller added a comment -

          http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7070134

          This link does not work for me.

          Robert Muir added a comment -

          http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7070134

          Robert Muir added a comment -

          I opened a bug at sun, here are my 'steps to reproduce':

          curl http://tartarus.org/~martin/PorterStemmer/java.txt > Stemmer.java
          javac Stemmer.java
          java Stemmer /usr/share/dict/words
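
The crash in this issue only shows up once HotSpot has compiled the hot stemming code, which is why repeated runs matter. The following is a minimal sketch of a repetition harness, not the code from the bug report: `JitStressHarness`, `countMismatches` and `toyStem` are hypothetical names, and `toyStem` is only a stand-in for a real stemmer. A JVM crash would simply abort the run; the mismatch count additionally catches a miscompilation that returns wrong output instead of crashing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

/** Hypothetical harness: applies a function to a word list many times and counts drifting results. */
final class JitStressHarness {

    /**
     * Records the output of the first pass as a baseline, then repeats the work
     * for the remaining passes and counts every output that differs from it.
     */
    static int countMismatches(List<String> words, Function<String, String> fn, int passes) {
        List<String> baseline = new ArrayList<>(words.size());
        for (String w : words) {
            baseline.add(fn.apply(w));
        }
        int mismatches = 0;
        for (int p = 1; p < passes; p++) {
            for (int i = 0; i < words.size(); i++) {
                if (!baseline.get(i).equals(fn.apply(words.get(i)))) {
                    mismatches++;
                }
            }
        }
        return mismatches;
    }

    /** Stand-in for a real stemmer (assumption): strips a single trailing "s". */
    static String toyStem(String word) {
        boolean strip = word.length() > 1 && word.endsWith("s");
        return strip ? word.substring(0, word.length() - 1) : word;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("cats", "dog", "houses");
        // Many passes give the JIT a chance to compile the hot loop.
        System.out.println(countMismatches(words, JitStressHarness::toyStem, 10_000));
    }
}
```

To exercise a real stemmer, the `toyStem` reference would be replaced by a call into that stemmer and the word list by the contents of a dictionary file.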
          
          Robert Muir added a comment -

          yeah i think it's a bad bug, obviously even casually using this stemmer will cause it to crash (this is no crazy random test but just stemming a file of a few thousand English words)

          how do we vote -1 to release Java7?

          Uwe Schindler added a comment -

          Solr tests never pass with Java7 because of that

          Robert Muir added a comment -

          here's a simple workaround, probably slows the filter down some.

          Robert Muir added a comment -

          i traced this down to the step4 method... maybe we can code it differently and dodge the bug.
          e.g. this passes:
          ant test -Dtestcase=TestPorterStemFilter -Dtests.iter=100 -Dargs="-XX:CompileCommand=exclude,org/apache/lucene/analysis/en/PorterStemmer,step4"
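
The exclusion flag in the command above uses slashes in the class name and a comma before the method name. As a rough sketch (the class `CompileExclude` and its methods are hypothetical helpers, not part of Lucene), the flag can be built from a dotted class name, and a forked test JVM can check whether it was actually started with it by reading its own arguments through the standard `RuntimeMXBean`:

```java
import java.lang.management.ManagementFactory;
import java.util.List;

/** Hypothetical helper for the -XX:CompileCommand=exclude workaround. */
final class CompileExclude {

    /** Builds the flag in the form used in this issue: exclude,<class with slashes>,<method>. */
    static String flag(String className, String methodName) {
        return "-XX:CompileCommand=exclude," + className.replace('.', '/') + "," + methodName;
    }

    /** Returns true if the given JVM argument list contains exactly that flag. */
    static boolean isActive(List<String> jvmArgs, String className, String methodName) {
        return jvmArgs.contains(flag(className, methodName));
    }

    public static void main(String[] args) {
        String cls = "org.apache.lucene.analysis.en.PorterStemmer";
        // Arguments the current JVM was launched with (excludes main-class arguments).
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println(flag(cls, "step4"));
        System.out.println("active: " + isActive(jvmArgs, cls, "step4"));
    }
}
```

This only confirms that the flag reached the JVM; it does not prove that the method was left uncompiled.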

          Uwe Schindler added a comment -
          [junit] Testsuite: org.apache.lucene.analysis.en.TestPorterStemFilter
          [junit] #
          [junit] # A fatal error has been detected by the Java Runtime Environment:
          [junit] #
          [junit] #  EXCEPTION_ACCESS_VIOLATION (0xc0000005) at pc=0x0000000002348faf, pid=11080, tid=10288
          [junit] #
          [junit] # JRE version: 7.0-b147
          [junit] # Java VM: Java HotSpot(TM) 64-Bit Server VM (21.0-b17 mixed mode windows-amd64 compressed oops)
          [junit] # Problematic frame:
          [junit] # J  org.apache.lucene.analysis.en.PorterStemFilter.incrementToken()Z
          [junit] #
          [junit] # Failed to write core dump. Minidumps are not enabled by default on client versions of Windows
          [junit] #
          [junit] # An error report file with more information is saved as:
          [junit] # C:\Users\Uwe Schindler\Projects\lucene\trunk-lusolr3\modules\analysis\build\common\test\1\hs_err_pid11080.log
          [junit] #
          [junit] # If you would like to submit a bug report, please visit:
          [junit] #   http://bugreport.sun.com/bugreport/crash.jsp
          [junit] #
          [junit] Test org.apache.lucene.analysis.en.TestPorterStemFilter FAILED (crashed)
          

          On each run, the problematic frame is different, sometimes incrementToken, sometimes BaseTokenStreamTestcase.assertTokenStreamContents, sometimes PorterStemmer.stem().

          Somehow hotspot corrupts itself.
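
Since the problematic frame changes from run to run, it can help to pull it out of each crash report and compare. The sketch below assumes the layout shown in the log above, where the frame is on the line after "# Problematic frame:", optionally prefixed by "[junit]". `CrashFrames` and `problematicFrame` are hypothetical names.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper that extracts the "Problematic frame" entry from HotSpot crash output. */
final class CrashFrames {

    // The frame is on the line after the "Problematic frame:" header.
    // An optional "[junit]" prefix (as added by the ant junit task) is skipped.
    private static final Pattern FRAME = Pattern.compile(
            "#\\s*Problematic frame:\\s*\\R(?:\\s*\\[junit\\])?\\s*#\\s*(.+)");

    /** Returns the frame text, or an empty Optional if the log contains no crash header. */
    static Optional<String> problematicFrame(String log) {
        Matcher m = FRAME.matcher(log);
        return m.find() ? Optional.of(m.group(1).trim()) : Optional.empty();
    }

    public static void main(String[] args) {
        String log = "[junit] # Problematic frame:\n"
                + "[junit] # J  org.apache.lucene.analysis.en.PorterStemFilter.incrementToken()Z\n"
                + "[junit] #\n";
        System.out.println(problematicFrame(log).orElse("(no crash frame found)"));
    }
}
```

Collecting this value over many runs gives a quick tally of which compiled methods the crash lands in.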

          Uwe Schindler added a comment -

          Passes with -Xint (but takes very long)
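
When comparing runs with and without -Xint, it is easy to lose track of which mode a forked test JVM really used. A small sketch, assuming that HotSpot reports "interpreted mode" in the `java.vm.info` system property under -Xint and "mixed mode" otherwise (as in the VM line of the crash log in this issue); other JVMs may report something different, and `VmMode` is a hypothetical name:

```java
/** Hypothetical helper that reports whether the JVM appears to run without the JIT. */
final class VmMode {

    /** Interprets a java.vm.info value; assumes HotSpot's "interpreted mode" wording. */
    static boolean isInterpreted(String vmInfo) {
        return vmInfo != null && vmInfo.contains("interpreted mode");
    }

    public static void main(String[] args) {
        String vmInfo = System.getProperty("java.vm.info");
        System.out.println("java.vm.info = " + vmInfo);
        System.out.println("interpreted only: " + isInterpreted(vmInfo));
    }
}
```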


            People

            • Assignee: Robert Muir
            • Reporter: Robert Muir
            • Votes: 2
            • Watchers: 5