Lucene - Core
  1. Lucene - Core
  2. LUCENE-5573

Deadlock during class loading/ initialization

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.8, Trunk
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      It's always worth looking into those randomized failures.

      http://builds.flonkings.com/job/Lucene-trunk-Linux-Java7-64-test-only/81259/console

      Log quote:

      [junit4]   2> ==== jstack at approximately timeout time ====
         [junit4]   2> "Lucene Merge Thread #0" ID=25 RUNNABLE
         [junit4]   2> 	at org.apache.lucene.codecs.lucene45.Lucene45DocValuesProducer.getSortedSet(Lucene45DocValuesProducer.java:541)
         [junit4]   2> 	at org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat$FieldsReader.getSortedSet(PerFieldDocValuesFormat.java:285)
         [junit4]   2> 	at org.apache.lucene.index.SegmentReader.getSortedSetDocValues(SegmentReader.java:500)
         [junit4]   2> 	at org.apache.lucene.index.SegmentMerger.mergeDocValues(SegmentMerger.java:204)
         [junit4]   2> 	at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:119)
         [junit4]   2> 	at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4068)
         [junit4]   2> 	at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3664)
         [junit4]   2> 	at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:405)
         [junit4]   2> 	at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:482)
         [junit4]   2> 
         [junit4]   2> "TEST-TestLucene45DocValuesFormat.testSortedSetVariableLengthVsUninvertedField-seed#[7362FE36DE729D42]" ID=23 RUNNABLE
         [junit4]   2> 	at org.apache.lucene.index.SortedSetDocValues.<clinit>(SortedSetDocValues.java:72)
         [junit4]   2> 	at org.apache.lucene.index.DocTermOrds.iterator(DocTermOrds.java:767)
         [junit4]   2> 	at org.apache.lucene.search.FieldCacheImpl.getDocTermOrds(FieldCacheImpl.java:1214)
         [junit4]   2> 	at org.apache.lucene.index.BaseDocValuesFormatTestCase.doTestSortedSetVsUninvertedField(BaseDocValuesFormatTestCase.java:2342)
         [junit4]   2> 	at org.apache.lucene.index.BaseDocValuesFormatTestCase.testSortedSetVariableLengthVsUninvertedField(BaseDocValuesFormatTestCase.java:2375)
         [junit4]   2> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
         [junit4]   2> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
         [junit4]   2> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
         [junit4]   2> 	at java.lang.reflect.Method.invoke(Method.java:606)
         [junit4]   2> 	at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1617)
         [junit4]   2> 	at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:826)
         [junit4]   2> 	at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:862)
         [junit4]   2> 	at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:876)
         [junit4]   2> 	at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
         [junit4]   2> 	at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51)
         [junit4]   2> 	at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
         [junit4]   2> 	at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
         [junit4]   2> 	at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
         [junit4]   2> 	at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
         [junit4]   2> 	at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
         [junit4]   2> 	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
         [junit4]   2> 	at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:359)
         [junit4]   2> 	at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:783)
         [junit4]   2> 	at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:443)
         [junit4]   2> 	at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:835)
         [junit4]   2> 	at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:737)
         [junit4]   2> 	at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:771)
         [junit4]   2> 	at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:782)
         [junit4]   2> 	at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
         [junit4]   2> 	at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
         [junit4]   2> 	at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
         [junit4]   2> 	at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
         [junit4]   2> 	at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
         [junit4]   2> 	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
         [junit4]   2> 	at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43)
         [junit4]   2> 	at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
         [junit4]   2> 	at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
         [junit4]   2> 	at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55)
         [junit4]   2> 	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
         [junit4]   2> 	at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:359)
      

      Robert looked into it and was wtf'd, quote:

      what happened here...
      

      I looked into it and was wtf'd, quote (from my head):

      wtf?!
      

      I looked deeper at the code and it's a beautiful and classic class loading deadlock. I don't think I've seen an example of this in real life ever, except for this one case.

      Problem description.

      1. Thread A attempts to return a new instance of RandomAccessOrds:

      return new RandomAccessOrds() {
      

      RandomAccessOrds extends SortedSetDocValues and has a final static field which in turn loads RandomAccessOrds (circular reference).

      2. Thread B attempts to create:

      private class Iterator extends SortedSetDocValues {
      

      3. If thread B starts loading SortedSetDocValues, it blocks other threads from doing so. If, at the same time, thread A starts loading RandomAccessOrds then thread A will eventually attempt to initialize SortedSetDocValues and both will wait for each other indefinitely.

      I attach a simpler example that does the same as a POC.

      1. X.java
        1 kB
        Dawid Weiss
      2. Main.java
        1 kB
        Dawid Weiss
      3. LUCENE-5573.patch
        34 kB
        Robert Muir
      4. C.java
        0.9 kB
        Dawid Weiss
      5. A.java
        0.9 kB
        Dawid Weiss

        Activity

        Hide
        Dawid Weiss added a comment -

        A repro of the class loading deadlock for reference.

        Show
        Dawid Weiss added a comment - A repro of the class loading deadlock for reference.
        Hide
        Dawid Weiss added a comment -

        Btw. one solution would be to move static field initialization out of SortedSetDocValues.

        public static final SortedSetDocValues EMPTY = new RandomAccessOrds() {
        

        but I'll leave it for Robert to fix since he spotted this beauty first

        Show
        Dawid Weiss added a comment - Btw. one solution would be to move static field initialization out of SortedSetDocValues. public static final SortedSetDocValues EMPTY = new RandomAccessOrds() { but I'll leave it for Robert to fix since he spotted this beauty first
        Hide
        Michael McCandless added a comment -

        What a doozie I'm glad you guys dug into this weird failure...

        Show
        Michael McCandless added a comment - What a doozie I'm glad you guys dug into this weird failure...
        Hide
        Simon Willnauer added a comment -

        should we mark this 4.7.2?

        Show
        Simon Willnauer added a comment - should we mark this 4.7.2?
        Hide
        Robert Muir added a comment -

        Yes, i dont like the idea of hanging people's computers.

        Show
        Robert Muir added a comment - Yes, i dont like the idea of hanging people's computers.
        Hide
        Robert Muir added a comment -

        oh wait: duh this only affects 4.8. its unreleased code. we are ok.

        Show
        Robert Muir added a comment - oh wait: duh this only affects 4.8. its unreleased code. we are ok.
        Hide
        Robert Muir added a comment -

        Here's my solution: i moved all this helper stuff to a separate class. So the patch adds one public class, but removes 3 others, so overall I think its a win and reduces the API...

        Would be nice to figure out some kind of analysis we could do to detect these things (versus having tests hang if we get lucky).

        Thank you Dawid for digging in...

        Show
        Robert Muir added a comment - Here's my solution: i moved all this helper stuff to a separate class. So the patch adds one public class, but removes 3 others, so overall I think its a win and reduces the API... Would be nice to figure out some kind of analysis we could do to detect these things (versus having tests hang if we get lucky). Thank you Dawid for digging in...
        Hide
        Dawid Weiss added a comment -

        > Would be nice to figure out some kind of analysis we could do to detect these things (versus having tests hang if we get lucky).

        I think it should be doable with static bytecode analysis – you'd need to parse <clinit> methods and extract DAG of class loading dependencies. If there exists any two pairs with reverse order initialization then you have a potential deadlock. Again - this shouldn't be a frequent scenario... but who knows

        Show
        Dawid Weiss added a comment - > Would be nice to figure out some kind of analysis we could do to detect these things (versus having tests hang if we get lucky). I think it should be doable with static bytecode analysis – you'd need to parse <clinit> methods and extract DAG of class loading dependencies. If there exists any two pairs with reverse order initialization then you have a potential deadlock. Again - this shouldn't be a frequent scenario... but who knows
        Hide
        Robert Muir added a comment -

        Classycle can list the possibilities, but it just lists cycles in general, not ones in static init... or i dont know how to coerce it into this.

        Cycle: org.apache.lucene.index.SortedSetDocValues et al. with 3 vertices. Layer: 8
            abstract class org.apache.lucene.index.SortedSetDocValues (2621 bytes) sources: java: Used by 23 classes. Uses 4/2 internal/external classes
            class org.apache.lucene.index.SortedSetDocValuesTermsEnum (3896 bytes) sources: java: Used by 1 classes. Uses 8/4 internal/external classes
            abstract class org.apache.lucene.index.RandomAccessOrds (398 bytes) sources: java: Used by 3 classes. Uses 1/0 internal/external classes
        
        Show
        Robert Muir added a comment - Classycle can list the possibilities, but it just lists cycles in general, not ones in static init... or i dont know how to coerce it into this. Cycle: org.apache.lucene.index.SortedSetDocValues et al. with 3 vertices. Layer: 8 abstract class org.apache.lucene.index.SortedSetDocValues (2621 bytes) sources: java: Used by 23 classes. Uses 4/2 internal/external classes class org.apache.lucene.index.SortedSetDocValuesTermsEnum (3896 bytes) sources: java: Used by 1 classes. Uses 8/4 internal/external classes abstract class org.apache.lucene.index.RandomAccessOrds (398 bytes) sources: java: Used by 3 classes. Uses 1/0 internal/external classes
        Hide
        Dawid Weiss added a comment -

        I think you'd need to write a custom visitor with asmlib, parse class references inside <clinit> (if they exist) and class header pointers (super, implements) to create a graph like this. Doable. But then it also couldn't take into account monitors inside those <clinits> so it could report false positives as well.

        It'd be most interesting to see the deadlock report of this at the JVM level (much as it happens with monitors, currently).

        Show
        Dawid Weiss added a comment - I think you'd need to write a custom visitor with asmlib, parse class references inside <clinit> (if they exist) and class header pointers (super, implements) to create a graph like this. Doable. But then it also couldn't take into account monitors inside those <clinits> so it could report false positives as well. It'd be most interesting to see the deadlock report of this at the JVM level (much as it happens with monitors, currently).
        Hide
        Dawid Weiss added a comment -

        Definitely doable with asmlib... and sounds like a fun side-project.

        public class ClinitCycle {
          public static void main(String[] args) throws Exception {
            ClassReader cr = new ClassReader(X.class.getName());
            ClassNode cn = new ClassNode();
            cr.accept(cn, ClassReader.SKIP_DEBUG);
        
            System.out.println("SUPER: " + cn.superName);
            List methods = cn.methods;
            for (int i = 0; i < methods.size(); ++i) {
                MethodNode method = (MethodNode) methods.get(i);
                System.out.println("Method: " + method.name);
                if (method.instructions.size() > 0) {
                  InsnList instructions = method.instructions;
                  instructions.accept(new MethodVisitor(Opcodes.ASM5) {
                    @Override
                    public void visitTypeInsn(int opcode, String type) {
                      System.out.println(opcode + " -> " + type);
                    }
        
                    @Override
                    public void visitMethodInsn(int opcode, String owner, String name, String desc, boolean itf) {
                      System.out.println(opcode + " -> " + name + " [" + desc + ", " + owner + "]");
                    }
                  });
                }
            }
          }
        }
        
        Show
        Dawid Weiss added a comment - Definitely doable with asmlib... and sounds like a fun side-project. public class ClinitCycle { public static void main( String [] args) throws Exception { ClassReader cr = new ClassReader(X.class.getName()); ClassNode cn = new ClassNode(); cr.accept(cn, ClassReader.SKIP_DEBUG); System .out.println( "SUPER: " + cn.superName); List methods = cn.methods; for ( int i = 0; i < methods.size(); ++i) { MethodNode method = (MethodNode) methods.get(i); System .out.println( "Method: " + method.name); if (method.instructions.size() > 0) { InsnList instructions = method.instructions; instructions.accept( new MethodVisitor(Opcodes.ASM5) { @Override public void visitTypeInsn( int opcode, String type) { System .out.println(opcode + " -> " + type); } @Override public void visitMethodInsn( int opcode, String owner, String name, String desc, boolean itf) { System .out.println(opcode + " -> " + name + " [" + desc + ", " + owner + "]" ); } }); } } } }
        Hide
        ASF subversion and git services added a comment -

        Commit 1586600 from rmuir@apache.org in branch 'dev/trunk'
        [ https://svn.apache.org/r1586600 ]

        LUCENE-5573: Move docvalues constants and helper methods to o.a.l.index.DocValues

        Show
        ASF subversion and git services added a comment - Commit 1586600 from rmuir@apache.org in branch 'dev/trunk' [ https://svn.apache.org/r1586600 ] LUCENE-5573 : Move docvalues constants and helper methods to o.a.l.index.DocValues
        Hide
        ASF subversion and git services added a comment -

        Commit 1586604 from rmuir@apache.org in branch 'dev/branches/branch_4x'
        [ https://svn.apache.org/r1586604 ]

        LUCENE-5573: Move docvalues constants and helper methods to o.a.l.index.DocValues

        Show
        ASF subversion and git services added a comment - Commit 1586604 from rmuir@apache.org in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1586604 ] LUCENE-5573 : Move docvalues constants and helper methods to o.a.l.index.DocValues
        Hide
        ASF subversion and git services added a comment -

        Commit 1586607 from rmuir@apache.org in branch 'dev/branches/lucene_solr_4_8'
        [ https://svn.apache.org/r1586607 ]

        LUCENE-5573: Move docvalues constants and helper methods to o.a.l.index.DocValues

        Show
        ASF subversion and git services added a comment - Commit 1586607 from rmuir@apache.org in branch 'dev/branches/lucene_solr_4_8' [ https://svn.apache.org/r1586607 ] LUCENE-5573 : Move docvalues constants and helper methods to o.a.l.index.DocValues
        Hide
        Uwe Schindler added a comment -

        Close issue after release of 4.8.0

        Show
        Uwe Schindler added a comment - Close issue after release of 4.8.0

          People

          • Assignee:
            Robert Muir
            Reporter:
            Dawid Weiss
          • Votes:
            0 Vote for this issue
            Watchers:
            7 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development