In my opinion we should not change our code to work around that issue.
In general, I think we should change our code to work around awful JVM
bugs, as long as 1) it's not too much effort for us to do so (and, as
always, a volunteer steps up to the task), and 2) the change has
negligible cost to "lucky" users (who use a JVM, or the right flags,
that would not have hit the JVM bug).
I think the last patch fits these criteria, since it's a tiny change
and it scopes the workaround to the affected JVMs?
We've done this many times in the past; if the cost to "lucky" users
is negligible and the benefit to "unlucky" users (unknowingly using
the affected JVMs) is immense (not hitting a horrific bug), I think the
tradeoff is worthwhile? Otherwise users will conclude Lucene (or
whatever software is embedding it) is buggy.
This testcase fails, but we are also using concurrent in ParallelMultiSearcher (die, die, die) and other places (even the indexer was partly upgraded to use ConcurrentLock).
Right, we use concurrent* elsewhere, but the terms dict is the big
user... very few apps actually use PMS.
It brings a false security and slows down VMs that work correctly.
Well, we already have "false security" that Lucene won't hang on any
JVM... we don't claim this patch will fully work around the bug, but
at least it should reduce it.
How are we slowing down other VMs...? We scope the workaround?
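To make "scoping" concrete, here is a minimal sketch of the idea, not the actual patch: pick the map implementation once, based on the running JVM, so that unaffected JVMs still get the concurrent map. The class name, the method names and the "1.5." version prefix are all hypothetical placeholders; which JVMs are really affected has to come from the bug report.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ScopedWorkaround {

    // ASSUMPTION: the "1.5." prefix is only a placeholder for
    // "JVM versions known to hit the bug"; it is not taken from the patch.
    static boolean isAffectedJvm(String javaVersion) {
        return javaVersion != null && javaVersion.startsWith("1.5.");
    }

    // Affected JVMs get a plain HashMap behind a synchronized wrapper;
    // every other JVM keeps ConcurrentHashMap and pays no extra cost.
    static <K, V> Map<K, V> newCacheMap(String javaVersion) {
        if (isAffectedJvm(javaVersion)) {
            return Collections.synchronizedMap(new HashMap<K, V>());
        }
        return new ConcurrentHashMap<K, V>();
    }

    // Convenience overload that inspects the JVM we are running on.
    static <K, V> Map<K, V> newCacheMap() {
        return ScopedWorkaround.<K, V>newCacheMap(System.getProperty("java.version"));
    }
}
```

The check runs once, when the map is created, so the "lucky" JVMs see the same ConcurrentHashMap as before and only the "unlucky" ones take the synchronized path.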
I'm not saying we should go crazy here, making a big patch to avoid
concurrent* everywhere, but the current patch is minimal, addresses
the big usage of concurrent* in 3.x, and is scoped down well.
It will avoid hangs for some number of unlucky users out there... so why
not commit it?