Lucene - Core
LUCENE-3235

TestDoubleBarrelLRUCache hangs under Java 1.5, 3.x and trunk, likely JVM bug

    Details

    • Type: Bug
    • Status: Reopened
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.0, 3.1, 3.2, 3.3, 3.4
    • Fix Version/s: 4.8
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      Not sure what's going on yet... but under Java 1.6 it seems not to hang, while under Java 1.5 it hangs fairly easily, on Linux. Java is 1.5.0_22.

      I suspect this is relevant: http://stackoverflow.com/questions/3292577/is-it-possible-for-concurrenthashmap-to-deadlock which refers to this JVM bug http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6865591 which then refers to this one http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6822370

      It looks like that last bug was fixed in Java 1.6 but not 1.5.

      Attachments

      1. LUCENE-3235.patch
        3 kB
        Uwe Schindler
      2. LUCENE-3235.patch
        2 kB
        Uwe Schindler
      3. LUCENE-3235.patch
        3 kB
        Robert Muir

        Issue Links

          Activity

          Steve Rowe added a comment -

          Bulk move 4.4 issues to 4.5 and 5.0

          Hoss Man added a comment -

          Bulk changing fixVersion 3.6 to 4.0 for any open issues that are unassigned and have not been updated since March 19.

          Email spam suppressed for this bulk edit; search for hoss20120323nofix36 to identify all issues edited

          Robert Muir added a comment -

          we can also offer an option... fasterButMoreHangs...

          Michael McCandless added a comment -

          If the whole point is to prevent double-seeks for queries doing IDF then getting termdocs,
          we can just put a tiny LRU cache in the already-existing threadlocal...

          In fact this is what we used to do (LinkedHashMap), and then in LUCENE-2075 we moved to a cache shareable across threads.

          But I agree – risk of hangs is not worth it. Let's just move back to thread-private cache, but let's make it tiny (8? 16? 13?) in size?

          Uwe Schindler added a comment -

          Robert, I agree, a simple LRUCache based on LinkedHashMap with

          @Override
          protected boolean removeEldestEntry(Map.Entry<K,V> eldest) {
            return size() > MAX_ENTRIES;
          }
          

           would be fine. Because of these bugs and the heavy use of other concurrent classes in Solr, we should maybe add a warning at startup based on my earlier findings.
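
           A minimal sketch of what such a tiny LinkedHashMap-based LRU cache could look like; the class name and size constant are illustrative, not the actual Lucene code, and an instance could live in the already-existing per-thread state so no java.util.concurrent classes are involved:

           import java.util.LinkedHashMap;
           import java.util.Map;

           // Illustrative only: a small, access-ordered LRU map; one instance per thread
           // means no java.util.concurrent classes are used at all.
           final class TinyLRUCache<K,V> extends LinkedHashMap<K,V> {
             private static final int MAX_ENTRIES = 16; // "tiny", as suggested above

             TinyLRUCache() {
               super(MAX_ENTRIES, 0.75f, true); // accessOrder=true gives LRU eviction order
             }

             @Override
             protected boolean removeEldestEntry(Map.Entry<K,V> eldest) {
               return size() > MAX_ENTRIES;
             }
           }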

          Robert Muir added a comment -

          actually i think even a tiny queue would work.

          this would prevent the double-seeks. I dont think there is a point to worrying about anything else.

          the LRU-ness seems stupid: for 'common' terms that appear over and over like stopwords, those are gonna be slow anyway.

          Robert Muir added a comment -

          Maybe we should remove this global cache completely?

          If the whole point is to prevent double-seeks for queries doing IDF then getting termdocs,
          we can just put a tiny LRU cache in the already-existing threadlocal...

          Uwe Schindler added a comment - - edited

           This is the way to get the runtime args: http://download.oracle.com/javase/1.5.0/docs/api/java/lang/management/RuntimeMXBean.html#getInputArguments():

           // imports needed: java.lang.management.ManagementFactory, java.lang.management.RuntimeMXBean, java.util.List
           RuntimeMXBean m = ManagementFactory.getRuntimeMXBean();
           List<String> vmargs = m.getInputArguments();
          

          This prints all args for me:

          [junit] vmargs=[-Xmx512M, -Dtests.verbose=false, -Dtests.infostream=false, -Dtests.lockdir=C:\Users\Uwe, Schindler\Projects\lucene\trunk-lusolr1\lucene\build, -Dtests.codec=random, -Dtests.postingsformat=random, -Dtests.locale=random, -Dtests.timezone=random, -Dtests.directory=random, -Dtests.linedocsfile=europarl.lines.txt.gz, -Dtests.iter=1, -Dtests.iter.min=1, -Dtests.seed=random, -Dtests.luceneMatchVersion=4.0, -Dtests.cleanthreads=perMethod, -Djava.util.logging.config.file=/dev/null, -Dtests.nightly=false, -Dtests.asserts.gracious=false, -Dtests.multiplier=1, -DtempDir=C:\Users\Uwe, Schindler\Projects\lucene\trunk-lusolr1\lucene\build\test\1, -Dlucene.version=4.0-SNAPSHOT, -Dtestmethod=, -Djetty.testMode=1, -Djetty.insecurerandom=1, -Dsolr.directoryFactory=org.apache.solr.core.MockDirectoryFactory, -ea:org.apache.lucene..., -ea:org.apache.solr...]
          
          Uwe Schindler added a comment -

           Yes, also 1.6.0_17 is affected. As always, 1.6.0_18 is the last and only good JVM.

           My problem with the patch is that it only affects very few users; most users will have a working environment even with broken JVMs. The fix in the patch is very heavy, as, if we apply it correctly, it will also slow down <1.6.0_18.

           As I said before, we should at least instruct Solr to print a WARN in the log if a JVM < 1.6.0_18 is started and the JVM parameter -XX:+UseMembar is missing. In Lucene we have no way to tell the user this as we have no logging framework; alternatively we could throw an Error if one of the central classes in Lucene is loaded by the classloader and the JVM parameter is not given (static initializer, e.g. in Constants.java). The same way we could tell the user: Don't use Java 7 GA.

          As far as I know, the JVM command can be checked with a System-property and a simple regex should help.
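
           A rough sketch of the kind of startup check being suggested here, using only the standard java.vendor/java.version system properties and RuntimeMXBean.getInputArguments(); the class and method names are hypothetical, not an existing Lucene/Solr API:

           import java.lang.management.ManagementFactory;
           import java.util.List;

           // Hypothetical startup check: warn when a known-suspect Sun 1.5 JVM was
           // started without -XX:+UseMembar. Class and method names are illustrative.
           final class UseMembarCheck {
             static boolean shouldWarn() {
               final String vendor = System.getProperty("java.vendor", "");
               final String version = System.getProperty("java.version", "");
               if (!(vendor.startsWith("Sun") && version.startsWith("1.5."))) {
                 return false;
               }
               final List<String> vmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
               return !vmArgs.contains("-XX:+UseMembar");
             }

             public static void main(String[] args) {
               if (shouldWarn()) {
                 System.err.println("WARN: this JVM may hang in java.util.concurrent; consider starting it with -XX:+UseMembar");
               }
             }
           }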

          Robert Muir added a comment -

          How are we slowing down other VMs...? We scope the workaround?

          I agree, according to the bug report it seems all 1.5's are affected and some 1.6's?
          Doesn't seem like solaris is related either, especially since Mike hit it on linux

          So the current patch is actually way under-scoped.

          Sure, some 1.6's are affected, and if we want it to be even better,
          we should likely improve Constants a little bit to make the minor
          version more easily accessible, from the bug report it seems we
          should at least consider doing something for < 1.6.0u21 ? And we should
           remove the 'Solaris' check, but keep it little-endian because the bug
           report mentions it's way more likely to happen on those CPUs.
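
           A hedged sketch of that kind of version scoping, parsing the standard java.version property; the class name, the regex, and the exact cutoffs (all 1.5, plus 1.6 before update 21) are placeholders for whatever would end up in Constants:

           import java.util.regex.Matcher;
           import java.util.regex.Pattern;

           // Placeholder sketch of scoping the workaround by JVM version: all 1.5 JVMs,
           // plus 1.6 JVMs before update 21, are treated as potentially affected.
           final class JvmVersionScope {
             // matches e.g. "1.5.0_22" or "1.6.0_18" from the java.version property
             private static final Pattern VERSION = Pattern.compile("^1\\.(\\d+)\\.0(?:_(\\d+))?.*");

             static boolean potentiallyAffected() {
               final Matcher m = VERSION.matcher(System.getProperty("java.version", ""));
               if (!m.matches()) {
                 return false; // unknown format: assume a fixed JVM
               }
               final int minor = Integer.parseInt(m.group(1));
               final int update = (m.group(2) == null) ? 0 : Integer.parseInt(m.group(2));
               return minor <= 5 || (minor == 6 && update < 21);
             }
           }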

          Michael McCandless added a comment -

          In my opinion we should not change our code to work around that issue.

          In general, I think we should change our code to work around awful JVM
           bugs, as long as 1) it's not so much effort for us to do so (and as
          always a volunteer steps up to the task), and 2) the change has
          negligible cost to "lucky" users (who use a JVM / the right flags that
          would not have hit the JVM bug).

          I think the last patch fits these criteria, since it's a tiny change
          and it scopes the workaround?

          We've done this many times in the past; if the cost to "lucky" users
          is negligible and the benefit to "unlucky" users (unknowingly using
          the affected JVMs) is immense (not hitting horrific bug), I think the
          tradeoff is worthwhile? Otherwise users will conclude Lucene (or
          whatever software is embedding it) is buggy.

          This testcase fails, but we are using concurrent also in ParallelMultiSearcher (die, die, die) and other places (even the indexer was partly upgraded to use ConcurrentLock).

          Right, we use concurrent* elsewhere, but terms dict is the big
          user... very few apps actually use PMS.

          It brings a false security and slows down VMs that work correctly.

          Well, we already have "false security" that Lucene won't hang on any
          JVM... we don't claim this patch will fully work around the bug, but
          at least it should reduce it.

          How are we slowing down other VMs...? We scope the workaround?

          I'm not saying we should go crazy here, making a big patch to avoid
          concurrent* everywhere, but the current patch is minimal, addresses
          the big usage of concurrent* in 3.x, is scoped down well.

           It will avoid hangs for some number of unlucky users out there... so why
          not commit it?

          Uwe Schindler added a comment -

           This testcase fails, but we are using concurrent also in ParallelMultiSearcher (die, die, die) and other places (even the indexer was partly upgraded to use ConcurrentLock). In my opinion we should not change our code to work around that issue. Just because one test case hangs it's not guaranteed that other uses will work correctly. It brings a false security and slows down VMs that work correctly. And it only affects very modern processors.

           If we would have a logging framework in Lucene (maybe Solr could do this): It could parse the args of Java (from system property) and look for -XX:+UseMembar; if it's Java 1.5 it should print a warning to the Solr/Lucene log file.

          Mark Miller added a comment -

          I still think the workaround (scoped down to 1.5, Sun JVM, little endian arch) makes sense?

          +1 Doesn't hurt other JVMs, improves things on the Sun JVM (something that hangs a lot and then does not hang is a big improvement in my book), and putting a workaround command to use in the wiki just seems a whole lot less user friendly to me. It doesn't mean we have to try and address every use of java.util.concurrent to work around this specific issue, does it?

          Michael McCandless added a comment -

          That's a nice blog post! What a scary platform-specific JVM bug...

           I still hit this hang reasonably often when running 3.x tests. It's always the DBLRU cache, so far anyway.

           Because this is our most intense use of a CHM... I still think the workaround (scoped down to 1.5, Sun JVM, little endian arch) makes sense? I agree it won't fully work around the JVM bug, since in theory other uses of java.util.concurrent.* could hit it, but it can prevent the most common occurrence? The patch seems minimal and worth it... a hang is truly awful.

          Simon Willnauer added a comment -

          I updated the wiki...

          Simon Willnauer added a comment -

          Thanks Simon, I think we should list this bug and its workaround in the wiki page and close this report.

           +1 this is not our problem. if we go and fix all java.util.concurrent uses in solr & lucene we're gonna end up in a big mess. According to the oracle blog this is also in 1.6 jvms and you will be vulnerable if you use any CHM-like classes in your own code...

          Uwe Schindler added a comment -

          That blog is as cool as the generics policeman ones...

          Thanks Simon, I think we should list this bug and its workaround in the wiki page and close this report.

          Mike can you try -XX:+UseMembar ?

          Uwe

          Simon Willnauer added a comment -

          here is a very detailed writeup for this:
          http://blogs.oracle.com/dave/entry/a_race_in_locksupport_park

          some interesting facts:

          • The problem would only manifest when we were using the -UseMembar optimization that lets us remove fences from certain hot thread state transitions paths that need to coordinate safepoints between mutator threads and the JVM. This feature is enabled by default, but we can turn it off with the -XX:+UseMembar switch, which causes the JVM to emit normal fence instructions in the state transitions paths.
          • The bug is a "day-one" bug and present in all versions of HotSpot.
          • Parker::park() and unpark() reside in os_linux.cpp, os_solaris.cpp and os_windows.cpp for Linux, Solaris and Windows, respectively.
           • The built-in synchronized implementation uses a different park mechanism (PlatformEvent::park()) whereas the java.util.concurrent infrastructure uses Parker::park(). Only Parker:: is vulnerable.
          • The bug will not manifest on uniprocessors or environments where threads are otherwise constrained to just a single processor.

          I think the only reasonable fix for this is to recommend people to use -XX:+UseMembar if they are running on a vulnerable JVM

          simon

          Robert Muir added a comment -

          I modified the patch, the thing is that Mike was seeing this on Linux I think too...

          Robert Muir added a comment -

          and intel cpu

          Uwe Schindler added a comment -

          ...and Solaris? The JVM BUG seems to only affect Solaris (according to the sun reports).

          Robert Muir added a comment -

          Hoss you are right: we should also check Constants.SUN ?

          otherwise lets not do the hack...

          But i'm for the change because there is nothing slower than a hang/deadlock...

          Uwe Schindler added a comment -

          We are using ConcurrentHashMap also at other places, should we replace all of them or where is the bug that this happens only here?

          It also appears to happen on Mike's machine, so maybe its hardware-related (Solaris?) as the Sun bugreport seems to tell us.

          I am also +0 to apply the patch. I just showed one possibility how to fix this.

          Hoss Man added a comment -

          +0

           if anyone else suggested that we should add a "slow hack" to work around a Sun JRE bug i would argue that we were being unfair to people using other JREs (ie: does IBM's JRE have this bug? – do IBM Java 1.5 users deserve slower performance because Sun's JRE has a bug?) but since rmuir is the biggest proponent I know of not assuming everyone on the planet uses Sun JREs, and he's signed off on this, I'll defer.

          Michael McCandless added a comment -

          +1 for the safe-but-slow Java 5 only workaround....

          Uwe Schindler added a comment -

          I wait until tomorrow before I commit this "safe-but-slow" fix.

          Uwe Schindler added a comment -

          Updated patch after LUCENE-3574 was committed. I also added a System.out.println to the test (VERBOSE only).

          Simon Willnauer added a comment -

           An easy fix would be to use Collections.synchronizedMap(new HashMap()) in the ctor to initialize cache1 and cache2 (if Java 5 is detected)? If people are using Java 5 they get not-the-best performance.

          I like that too...

          Uwe Schindler added a comment -

          Patch.

          We should forward port the deprecation/removal of useless Constants.

          Uwe Schindler added a comment -

          I am currently preparing a patch.

          Robert Muir added a comment -

          I like Uwe's idea: not-the-best-performance is far preferable to a hang/deadlock!!!!!

          Uwe Schindler added a comment -

           An easy fix would be to use Collections.synchronizedMap(new HashMap()) in the ctor to initialize cache1 and cache2 (if Java 5 is detected)? If people are using Java 5 they get not-the-best performance.
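
           A hedged sketch of how that fallback could look in a constructor; the cache1/cache2 field names mirror the comment above, while the class name and the Java 5 detection are placeholders, not the actual DoubleBarrelLRUCache code:

           import java.util.Collections;
           import java.util.HashMap;
           import java.util.Map;
           import java.util.concurrent.ConcurrentHashMap;

           // Illustrative constructor fragment: pick a fully synchronized map on Java 5,
           // and ConcurrentHashMap everywhere else. JAVA_5_DETECTED is a placeholder for
           // whatever version check ends up in Constants.
           final class CacheMaps<K,V> {
             static final boolean JAVA_5_DETECTED =
                 System.getProperty("java.version", "").startsWith("1.5.");

             final Map<K,V> cache1;
             final Map<K,V> cache2;

             CacheMaps() {
               if (JAVA_5_DETECTED) {
                 cache1 = Collections.synchronizedMap(new HashMap<K,V>());
                 cache2 = Collections.synchronizedMap(new HashMap<K,V>());
               } else {
                 cache1 = new ConcurrentHashMap<K,V>();
                 cache2 = new ConcurrentHashMap<K,V>();
               }
             }
           }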

          Simon Willnauer added a comment -

          well then we should fix it - I will mark it as 3.5

          Uwe Schindler added a comment -

           I agree with Robert. This issue still exists in 3.x as we officially support Java 5.

          Robert Muir added a comment -

          wait, this statement makes no sense.

          if 1.5 is no longer supported, then 1.5 should no longer be supported, and we should be
          free to use 1.6 code everywhere.

          Simon Willnauer added a comment -

           we moved to 1.6 on trunk; seems we can't do much about it on 3.x - folks should run their stuff on 1.6 JVMs or newer

          Mark Miller added a comment -

          +1 to drop 1.5...

          +1.

          Dawid Weiss added a comment -

          I'm same as Robert: +1 to drop 1.5...

          Michael McCandless added a comment -

          Indeed java -client -version shows it's still using server VM – you're right!

          Michael McCandless added a comment -

          Yes the stack looks just like the stack overflow link I posted – several threads stuck in sun.misc.Unsafe.park

          java -Xint definitely does not hang... ran for like 4200 iterations.

          Dawid Weiss added a comment -

          I don't think you can force -client if it's a 64 bit release and you have tons of memory, can you? You can check by running java -client -version – this is what it tells me, for example:

          dweiss@dweiss-linux:~/work/lucene/lucene-trunk$ java -client -version
          java version "1.6.0_16"
          Java(TM) SE Runtime Environment (build 1.6.0_16-b01)
          Java HotSpot(TM) 64-Bit Server VM (build 14.2-b01, mixed mode)
          

           Can you do a remote stack dump of the whole VM (or run it from the console and send it a signal to dump all threads)?

          Michael McCandless added a comment -

          VERY interesting! Is anyone able to repro this hang besides me...?

          Robert Muir added a comment -

          Mike, i installed 1.5.0_22 (amd64) on my linux machine, and i can't reproduce there either (i ran like 500 iterations).

          Maybe my hardware isn't concurrent enough? or maybe you should un-overclock?

          Michael McCandless added a comment -

          Still hangs if I run -client; but it looks like -Xint prevents the hang (235 iterations so far on beast).

          3.2 also hangs.

          Robert Muir added a comment -

          i ran the test with the same version as mike (1.5.0_22) in two ways on windows:

          • -Dtests.iter=100
          • in a loop from a script, 100 times with its own ant run.

          i can't reproduce it on windows.

           in my eyes, there isn't even an argument about whether or not we should support java5: it's not possible if bugs are not getting fixed.

          Uwe Schindler added a comment -

          LOL, no comment.

          Robert Muir added a comment -

          +1 to drop java 5


            People

             • Assignee:
               Unassigned
             • Reporter:
               Michael McCandless
             • Votes:
               0
             • Watchers:
               1
