Solr
  1. Solr
  2. SOLR-7803

Classloading deadlock in TrieField => refactor date formatting/parsing to static utility class

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 5.2.1
    • Fix Version/s: 5.3, 6.0
    • Component/s: None
    • Labels:
    • Environment:

      OSX, JDK8u45

      Description

      When starting a test Sol instance, it locks up sometimes. We took a thread dump and all threads are trying to load classes via Class.forName() and are stuck in that method. One of these threads got one step further into the <clinit> of TrieField where it creates an internal static instance of TrieDateField (circular dependency). I don't know why this locks up exactly, but this code smells anyway. So I removed that instance and made the used methods static in TrieDateField.

      This does not completely remove the circular dependency, but at least it is no more in <clinit>. For the future someone may extract a util class to remove the circular dependency.

      1. SOLR-7803.patch
        70 kB
        Uwe Schindler
      2. SOLR-7803.patch
        58 kB
        Uwe Schindler
      3. TrieField.patch
        6 kB
        Markus Heiden

        Activity

        Hide
        Uwe Schindler added a comment -

        One of these the threads got one step further into the <clinit> of TrieField where it creates an internal static instance of TrieDateField (circular dependency).

        There is no deadlock in triefield. What is done there is 100% correct according to JVM requirements (classes are allowed to init subclasses from clinit of the parent class.

        Especially as there is no Class#forName() in this code, this cannot have to do with this. If you see Class#forName() in the stack trace, this is more related to other problems, because forName() only appears in stack traces when the initialization is done from reflection.

        Could it be that this issue is related to LUCENE-6482, which was fixed in 5.2.1?

        Show
        Uwe Schindler added a comment - One of these the threads got one step further into the <clinit> of TrieField where it creates an internal static instance of TrieDateField (circular dependency). There is no deadlock in triefield. What is done there is 100% correct according to JVM requirements (classes are allowed to init subclasses from clinit of the parent class. Especially as there is no Class#forName() in this code, this cannot have to do with this. If you see Class#forName() in the stack trace, this is more related to other problems, because forName() only appears in stack traces when the initialization is done from reflection. Could it be that this issue is related to LUCENE-6482 , which was fixed in 5.2.1?
        Hide
        Markus Heiden added a comment -

        This problem looks like LUCENE-5573

        Show
        Markus Heiden added a comment - This problem looks like LUCENE-5573
        Hide
        Markus Heiden added a comment - - edited

        Stacktrace showing the TrieField thread. All other threads hang at at java.lang.Class.forName0(Native Method). As mentioned above this looks more like LUCENE-5573 and not like LUCENE-6482. As we are using Solr 5.2.1, we should have the bugfix for LUCENE-6482.

        "coreLoadExecutor-6-thread-20-processing-

        {node_name=localhost:18080_solr}

        " #480 prio=5 os_prio=31 tid=0x00007f9110653000 nid=0x8f03 in Object.wait() [0x000000012b8f2000]
        java.lang.Thread.State: RUNNABLE
        at org.apache.solr.schema.TrieField.<clinit>(TrieField.java:90)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:348)
        at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:485)
        at org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:560)
        at org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:525)
        at org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:518)
        at org.apache.solr.schema.FieldTypePluginLoader.create(FieldTypePluginLoader.java:90)
        at org.apache.solr.schema.FieldTypePluginLoader.create(FieldTypePluginLoader.java:52)
        at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:152)
        at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:489)
        at org.apache.solr.schema.IndexSchema.<init>(IndexSchema.java:175)
        at org.apache.solr.schema.IndexSchemaFactory.create(IndexSchemaFactory.java:55)
        at org.apache.solr.schema.IndexSchemaFactory.buildIndexSchema(IndexSchemaFactory.java:69)
        at org.apache.solr.core.ConfigSetService.createIndexSchema(ConfigSetService.java:102)
        at org.apache.solr.core.ConfigSetService.getConfig(ConfigSetService.java:74)
        at org.apache.solr.core.CoreContainer.create(CoreContainer.java:635)
        at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:381)
        at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:375)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$1.run(ExecutorUtil.java:148)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

        Show
        Markus Heiden added a comment - - edited Stacktrace showing the TrieField thread. All other threads hang at at java.lang.Class.forName0(Native Method). As mentioned above this looks more like LUCENE-5573 and not like LUCENE-6482 . As we are using Solr 5.2.1, we should have the bugfix for LUCENE-6482 . "coreLoadExecutor-6-thread-20-processing- {node_name=localhost:18080_solr} " #480 prio=5 os_prio=31 tid=0x00007f9110653000 nid=0x8f03 in Object.wait() [0x000000012b8f2000] java.lang.Thread.State: RUNNABLE at org.apache.solr.schema.TrieField.<clinit>(TrieField.java:90) at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:348) at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:485) at org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:560) at org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:525) at org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:518) at org.apache.solr.schema.FieldTypePluginLoader.create(FieldTypePluginLoader.java:90) at org.apache.solr.schema.FieldTypePluginLoader.create(FieldTypePluginLoader.java:52) at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:152) at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:489) at org.apache.solr.schema.IndexSchema.<init>(IndexSchema.java:175) at org.apache.solr.schema.IndexSchemaFactory.create(IndexSchemaFactory.java:55) at org.apache.solr.schema.IndexSchemaFactory.buildIndexSchema(IndexSchemaFactory.java:69) at org.apache.solr.core.ConfigSetService.createIndexSchema(ConfigSetService.java:102) at org.apache.solr.core.ConfigSetService.getConfig(ConfigSetService.java:74) at org.apache.solr.core.CoreContainer.create(CoreContainer.java:635) at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:381) at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:375) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$1.run(ExecutorUtil.java:148) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745)
        Hide
        Uwe Schindler added a comment -

        OK thanks. I think we can look into this, but the easiest implementation to work around such problems is to do it in a similar way like LUCENE-6482: define a static inner (hidden) class that just have one static field: See the "Holder" class in the patch there. Then replace all references to this static field by "Holder.staticField". This delays the initialization. This is much easier than your patch.

        Although, I don't think that this is the reason for the problem. A superclass is always allowed to load and instantiate a subclass, this is specified in the JVM. It only gets worse if you have this type of stuff around three classes. In any case, such Holder instances always prevent such problems.

        In my opinion, the correct fix for the "bad" design we have here is (unrelated to the class loading deadlock): Refactor the date parsing stuff to a separate Utility class with solely static methods and don't use a TrieDateField instance from the superclass. Both TrieField and TrieDateField use the same set of static methods. The current design is broken, I agree!

        Do you have a testcase to reproduce?

        Show
        Uwe Schindler added a comment - OK thanks. I think we can look into this, but the easiest implementation to work around such problems is to do it in a similar way like LUCENE-6482 : define a static inner (hidden) class that just have one static field: See the "Holder" class in the patch there. Then replace all references to this static field by "Holder.staticField". This delays the initialization. This is much easier than your patch. Although, I don't think that this is the reason for the problem. A superclass is always allowed to load and instantiate a subclass, this is specified in the JVM. It only gets worse if you have this type of stuff around three classes. In any case, such Holder instances always prevent such problems. In my opinion, the correct fix for the "bad" design we have here is (unrelated to the class loading deadlock): Refactor the date parsing stuff to a separate Utility class with solely static methods and don't use a TrieDateField instance from the superclass. Both TrieField and TrieDateField use the same set of static methods. The current design is broken, I agree! Do you have a testcase to reproduce?
        Hide
        Markus Heiden added a comment - - edited

        Sorry, no. We experience this problem when we running our integrations tests, but it happens just in 50% of all runs. We did not figured out what provokes it.

        I think my patch makes the situation better and is small (just making the used methods static and change the callers in TrieField). Maybe I should not have extracted the field source.

        Show
        Markus Heiden added a comment - - edited Sorry, no. We experience this problem when we running our integrations tests, but it happens just in 50% of all runs. We did not figured out what provokes it. I think my patch makes the situation better and is small (just making the used methods static and change the callers in TrieField). Maybe I should not have extracted the field source.
        Hide
        Uwe Schindler added a comment -

        Yeah leave the fieldsource where it is, that was also my main complaint. This change has nothing to do with the whole problem. I can work on a patch just factoring out the static methods, if you like!

        Show
        Uwe Schindler added a comment - Yeah leave the fieldsource where it is, that was also my main complaint. This change has nothing to do with the whole problem. I can work on a patch just factoring out the static methods, if you like!
        Hide
        Markus Heiden added a comment -

        That would be nice. I think the methods somehow belong into DateMathParser but I don't understand the context enough to decide whether to move all code to there or not. Thanks for your effort!

        Show
        Markus Heiden added a comment - That would be nice. I think the methods somehow belong into DateMathParser but I don't understand the context enough to decide whether to move all code to there or not. Thanks for your effort!
        Hide
        Markus Heiden added a comment -

        I re-did the patch. I just made parseMath() static and used formatExternal() for toExternal(). This is not the optimal solution we talked about, but just my prior patch stripped down to the needed changes.

        Show
        Markus Heiden added a comment - I re-did the patch. I just made parseMath() static and used formatExternal() for toExternal(). This is not the optimal solution we talked about, but just my prior patch stripped down to the needed changes.
        Hide
        Uwe Schindler added a comment -

        Attached is a patch that refactors the static methods out to a separate class. It is quite huge, because a lot of places in Solr uses TrieDateField statics, without having to do anything with Trie fields at all. So this cleanup was really required,

        Show
        Uwe Schindler added a comment - Attached is a patch that refactors the static methods out to a separate class. It is quite huge, because a lot of places in Solr uses TrieDateField statics, without having to do anything with Trie fields at all. So this cleanup was really required,
        Hide
        Uwe Schindler added a comment -

        When backporting to 5.x, I will leave the old public/protected methods/fields as delegators in TrieDateField for backwards compatibility (deprecated with reference to new static class).

        Show
        Uwe Schindler added a comment - When backporting to 5.x, I will leave the old public/protected methods/fields as delegators in TrieDateField for backwards compatibility (deprecated with reference to new static class).
        Hide
        Shalin Shekhar Mangar added a comment -

        +1

        Thanks Uwe!

        Show
        Shalin Shekhar Mangar added a comment - +1 Thanks Uwe!
        Hide
        Uwe Schindler added a comment -

        I did a bit more refactoring:

        • rename DateUtils -> DateFormatUtil (the other name was somehow a confusing duplicate, so Eclipse autocomplete showed too much unspecific stuff).
        • I removed more formatting methods out of TrieDateField. TrieDateField is now as any other Trie(Long|Int|Double|Float)Field - short and compact.

        I will commit this later and add backwards layer in 5.x. All tests pass.

        Show
        Uwe Schindler added a comment - I did a bit more refactoring: rename DateUtils -> DateFormatUtil (the other name was somehow a confusing duplicate, so Eclipse autocomplete showed too much unspecific stuff). I removed more formatting methods out of TrieDateField. TrieDateField is now as any other Trie(Long|Int|Double|Float)Field - short and compact. I will commit this later and add backwards layer in 5.x. All tests pass.
        Hide
        ASF subversion and git services added a comment -

        Commit 1691893 from Uwe Schindler in branch 'dev/trunk'
        [ https://svn.apache.org/r1691893 ]

        SOLR-7803: Prevent class loading deadlock in TrieDateField; refactor date formatting and parsing out of TrieDateField and move to static utility class DateFormatUtil

        Show
        ASF subversion and git services added a comment - Commit 1691893 from Uwe Schindler in branch 'dev/trunk' [ https://svn.apache.org/r1691893 ] SOLR-7803 : Prevent class loading deadlock in TrieDateField; refactor date formatting and parsing out of TrieDateField and move to static utility class DateFormatUtil
        Hide
        ASF subversion and git services added a comment -

        Commit 1691898 from Uwe Schindler in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1691898 ]

        Merged revision(s) 1691893 from lucene/dev/trunk:
        SOLR-7803: Prevent class loading deadlock in TrieDateField; refactor date formatting and parsing out of TrieDateField and move to static utility class DateFormatUtil (includes bw layer)

        Show
        ASF subversion and git services added a comment - Commit 1691898 from Uwe Schindler in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1691898 ] Merged revision(s) 1691893 from lucene/dev/trunk: SOLR-7803 : Prevent class loading deadlock in TrieDateField; refactor date formatting and parsing out of TrieDateField and move to static utility class DateFormatUtil (includes bw layer)
        Hide
        ASF subversion and git services added a comment -

        Commit 1691900 from Uwe Schindler in branch 'dev/trunk'
        [ https://svn.apache.org/r1691900 ]

        SOLR-7803: Use Java 8 ThreadLocal

        Show
        ASF subversion and git services added a comment - Commit 1691900 from Uwe Schindler in branch 'dev/trunk' [ https://svn.apache.org/r1691900 ] SOLR-7803 : Use Java 8 ThreadLocal
        Hide
        Uwe Schindler added a comment -

        I committed and backported + added backwards layer. In trunk I also removed the custom ThreadLocal, ThreadLocal#withInitial(FORMAT_PROTOTYPE::clone) is much more elegant.

        If you see other class loading deadlocks in Solr startup, those can be caused by concurrent core initialization, which may be broken under certain cases. Please open other issues about that.

        Show
        Uwe Schindler added a comment - I committed and backported + added backwards layer. In trunk I also removed the custom ThreadLocal, ThreadLocal#withInitial(FORMAT_PROTOTYPE::clone) is much more elegant. If you see other class loading deadlocks in Solr startup, those can be caused by concurrent core initialization, which may be broken under certain cases. Please open other issues about that.
        Hide
        Shalin Shekhar Mangar added a comment -

        Bulk close for 5.3.0 release

        Show
        Shalin Shekhar Mangar added a comment - Bulk close for 5.3.0 release

          People

          • Assignee:
            Uwe Schindler
            Reporter:
            Markus Heiden
          • Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development