Solr
  1. Solr
  2. SOLR-4652

Resource loader has broken behavior for solr.xml plugins (basically ShardHandlerFactory)

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 4.2
    • Fix Version/s: 4.3, 6.0
    • Component/s: None
    • Labels:
      None

      Description

      I have the following scenario:
      MyShardHandlerFactory is plugged in via solr.xml. The jar containing MyShardHandlerFactory is in the shared lib dir. There are a couple issues:

      1. From within a per core handler (that is loaded within the core's lib dir), you grab the ShardHandlerFactory from CoreContainer, casting to MyShardHandlerFactory will results in a ClassCastException with a message like "cannot cast instance of MyShardHandlerFactory to MyShardHandlerFactory".

      2. Adding a custom dir for shared lib (for example "mylib") does not work. The ShardHandlerFactory is initialized before sharedLib is loaded.

      I've been pouring through the code on this and I don't see an easy fix. I'll keep looking at it, but I wanted to get this up so hopefully others have some thoughts on how best to fix. IMO, it seems like there needs to be a clear chain of resource loaders (one for loading solr.xml, a child for loading the lib dir, used for solr.xml plugins, a grandchild for per core config, and a great grandchild for per core lib dir based plugins). Right now there are some siblings, because any place a SolrResourceLoader is created with a null parent classloader, it gets the jetty thread's classloader as the parent.

      1. SOLR-4652.patch
        6 kB
        Uwe Schindler
      2. SOLR-4652.patch
        10 kB
        Ryan Ernst
      3. SOLR-4652.patch
        8 kB
        Ryan Ernst

        Activity

        Hide
        Uwe Schindler added a comment - - edited

        Hi Ryan,
        thanks for bringing this up, you made my day! I was also disappointed when I was digging into that several times, last time when working on Lucene's SPI issues.
        In my opinion, the context classloader should be out of scope (in general, code that uses the context class loader as a "default" because no other classloader is there is buggy to me and may drive crazy under certain circumstances like different per-thread behaviour, e.g. in callbacks called from system threads).
        The ResourceLoaders/Classloaders in Solr should be built on startup of the webapplication and should use the webapp classloader as root (and dont use context class loader as replacement for webapp classloader, just get it e.g. from inside the SolrDispatchFilter's init method like this.getClass().getClassLoader()). Any classloader created from solr should be a child/grandchild of this root class loader. The core container's resource loader should be the only child of the webapp classloader. Every solr core then should be a child of the core container's classloader. Also those resource loaders must use parent-first semantics as Java suggests (in that case its unlikely that you get class cannot be cast to same name...).
        I simply had no time to fix this, but I may be able to help. A correct classloader design is hard, but doable - and your issue confirms my problems with the current design.

        Show
        Uwe Schindler added a comment - - edited Hi Ryan, thanks for bringing this up, you made my day! I was also disappointed when I was digging into that several times, last time when working on Lucene's SPI issues. In my opinion, the context classloader should be out of scope (in general, code that uses the context class loader as a "default" because no other classloader is there is buggy to me and may drive crazy under certain circumstances like different per-thread behaviour, e.g. in callbacks called from system threads). The ResourceLoaders/Classloaders in Solr should be built on startup of the webapplication and should use the webapp classloader as root (and dont use context class loader as replacement for webapp classloader, just get it e.g. from inside the SolrDispatchFilter's init method like this.getClass().getClassLoader()). Any classloader created from solr should be a child/grandchild of this root class loader. The core container's resource loader should be the only child of the webapp classloader. Every solr core then should be a child of the core container's classloader. Also those resource loaders must use parent-first semantics as Java suggests (in that case its unlikely that you get class cannot be cast to same name...). I simply had no time to fix this, but I may be able to help. A correct classloader design is hard, but doable - and your issue confirms my problems with the current design.
        Hide
        Mark Miller added a comment -

        The crux of the issue is that solr.xml was never able to specify stuff like this before - so the entire design is still basically what we had when Solr was single core.

        As far as I remember, shard handler started in solrconfig.xml and was per core. At some point, someone moved it to core container so that things like the thread pool could be shared across cores. At this point, we lost the ability to configure it - support was simply removed.

        Much later, I needed to set some timeouts for freebsd blackhole reasons and I found I had no way to do this - so I shoved the shardhandler into solr.xml simply so I could set these timeouts in tests. This is why you couldn't even change the impl - it was all just falling debris.

        At least, if I remember the sequence of events correctly.

        Now that shard handler is in solr.xml and properly supported, it only seems likely that further things will move to solr.xml that belong at the core container level, and so +1 to coming up with a design that makes all this work as it should.

        Show
        Mark Miller added a comment - The crux of the issue is that solr.xml was never able to specify stuff like this before - so the entire design is still basically what we had when Solr was single core. As far as I remember, shard handler started in solrconfig.xml and was per core. At some point, someone moved it to core container so that things like the thread pool could be shared across cores. At this point, we lost the ability to configure it - support was simply removed. Much later, I needed to set some timeouts for freebsd blackhole reasons and I found I had no way to do this - so I shoved the shardhandler into solr.xml simply so I could set these timeouts in tests. This is why you couldn't even change the impl - it was all just falling debris. At least, if I remember the sequence of events correctly. Now that shard handler is in solr.xml and properly supported, it only seems likely that further things will move to solr.xml that belong at the core container level, and so +1 to coming up with a design that makes all this work as it should.
        Hide
        Ryan Ernst added a comment -

        I've spent the better part of 2 days going down the road of refactoring how SolrResourceLoaders are created (to make the chain explicit as Uwe suggested), but this was getting very hairy...

        To solve this issue I am trying something more simple for now. I'm happy to create another jira for refactoring how the classloader/resource loader chains are created.

        Show
        Ryan Ernst added a comment - I've spent the better part of 2 days going down the road of refactoring how SolrResourceLoaders are created (to make the chain explicit as Uwe suggested), but this was getting very hairy... To solve this issue I am trying something more simple for now. I'm happy to create another jira for refactoring how the classloader/resource loader chains are created.
        Hide
        Ryan Ernst added a comment -

        This patch makes core resource loaders use the CoreContainer.loader.getClassLoader() as it's parent (solves issue #1) and adds sharedLib to CoreContainer.loader's class loader before initializing the shard handler factory (solves issue #2).

        I would like to add a unit test to ensure this isn't broken by any future refactoring here. Any thoughts on where, or how (since most tests seem to bypass this initialization for core container) I should add a test?

        Show
        Ryan Ernst added a comment - This patch makes core resource loaders use the CoreContainer.loader.getClassLoader() as it's parent (solves issue #1) and adds sharedLib to CoreContainer.loader's class loader before initializing the shard handler factory (solves issue #2). I would like to add a unit test to ensure this isn't broken by any future refactoring here. Any thoughts on where, or how (since most tests seem to bypass this initialization for core container) I should add a test?
        Hide
        Robert Muir added a comment -

        Can a test be written that just does the simplest checks: getting the loaders and verifying the "hierarchy" with Classloader.getParent()?

        Separately about the refactoring, i think Uwe mentioned (then deleted) a key step for a future issue: in order to be able to guarantee its correct in the future, a good rote refactoring step after this issue would be to remove the 'treat null as context classloader' (instead throw exception if its null!) from SolrResourceLoader and remove the ctor that takes no parent classloader: this way its explicit what is happening everywhere and would reduce the confusion.

        Show
        Robert Muir added a comment - Can a test be written that just does the simplest checks: getting the loaders and verifying the "hierarchy" with Classloader.getParent()? Separately about the refactoring, i think Uwe mentioned (then deleted) a key step for a future issue: in order to be able to guarantee its correct in the future, a good rote refactoring step after this issue would be to remove the 'treat null as context classloader' (instead throw exception if its null!) from SolrResourceLoader and remove the ctor that takes no parent classloader: this way its explicit what is happening everywhere and would reduce the confusion.
        Hide
        Ryan Ernst added a comment -

        Separately about the refactoring, i think Uwe mentioned (then deleted) a key step for a future issue: in order to be able to guarantee its correct in the future, a good rote refactoring step after this issue would be to remove the 'treat null as context classloader' (instead throw exception if its null!) from SolrResourceLoader and remove the ctor that takes no parent classloader: this way its explicit what is happening everywhere and would reduce the confusion.

        Yeah, this is the route I had started going down when I realized it was a sizable undertaking. I've created a separate issue:
        https://issues.apache.org/jira/browse/SOLR-4659

        Show
        Ryan Ernst added a comment - Separately about the refactoring, i think Uwe mentioned (then deleted) a key step for a future issue: in order to be able to guarantee its correct in the future, a good rote refactoring step after this issue would be to remove the 'treat null as context classloader' (instead throw exception if its null!) from SolrResourceLoader and remove the ctor that takes no parent classloader: this way its explicit what is happening everywhere and would reduce the confusion. Yeah, this is the route I had started going down when I realized it was a sizable undertaking. I've created a separate issue: https://issues.apache.org/jira/browse/SOLR-4659
        Hide
        Ryan Ernst added a comment -

        Can a test be written that just does the simplest checks: getting the loaders and verifying the "hierarchy" with Classloader.getParent()?

        Good idea. Done with this new patch.

        Show
        Ryan Ernst added a comment - Can a test be written that just does the simplest checks: getting the loaders and verifying the "hierarchy" with Classloader.getParent()? Good idea. Done with this new patch.
        Hide
        Robert Muir added a comment -

        Patch looks good!

        Show
        Robert Muir added a comment - Patch looks good!
        Hide
        Uwe Schindler added a comment -

        Patch looks fine for now, need to have a closer look. For future patches, please dont reorder import statements, but thats not problematic here.

        Show
        Uwe Schindler added a comment - Patch looks fine for now, need to have a closer look. For future patches, please dont reorder import statements, but thats not problematic here.
        Hide
        Uwe Schindler added a comment -

        Hi Ryan,
        I will take care of this issue. Thanks for spending your time in fixing the resource/classloader hell.

        I will review your patch more closely and will commit it for 4.x and trunk.

        Show
        Uwe Schindler added a comment - Hi Ryan, I will take care of this issue. Thanks for spending your time in fixing the resource/classloader hell. I will review your patch more closely and will commit it for 4.x and trunk.
        Hide
        Ryan Ernst added a comment -

        For future patches, please dont reorder import statements, but thats not problematic here.

        Sorry about that. I have intellij set to reorder imports automatically. I'll figure out how to turn this off for my lucene-solr project.

        Show
        Ryan Ernst added a comment - For future patches, please dont reorder import statements, but thats not problematic here. Sorry about that. I have intellij set to reorder imports automatically. I'll figure out how to turn this off for my lucene-solr project.
        Hide
        Uwe Schindler added a comment -

        Hi Ryan,
        I reviewed your patch, which unfortunately did no longer apply to trunk, because there were changes in CoreContainer. As your patch was no SVN patch, my client was not able to download the correct SVN revision number for your patch, so I had to try out. I was then able to upgrade to latest Solr trunk.

        It would be better to supply Subversion patches in the future, because they contain the metadata required to correctly apply the patches. At least mention the revision number in SVN it was created against.

        I reverted the changes in import statements and in your test I used assertSame() instead of assertEquals(), because the classloader must be the same (==) not equal ones.

        I will commit this soon, once all tests passed.

        Show
        Uwe Schindler added a comment - Hi Ryan, I reviewed your patch, which unfortunately did no longer apply to trunk, because there were changes in CoreContainer. As your patch was no SVN patch, my client was not able to download the correct SVN revision number for your patch, so I had to try out. I was then able to upgrade to latest Solr trunk. It would be better to supply Subversion patches in the future, because they contain the metadata required to correctly apply the patches. At least mention the revision number in SVN it was created against. I reverted the changes in import statements and in your test I used assertSame() instead of assertEquals(), because the classloader must be the same (==) not equal ones. I will commit this soon, once all tests passed.
        Hide
        Uwe Schindler added a comment -

        Committed to trunk and 4.x Will be released with Lucene/Solr 4.3.

        Thanks Ryan, I hope we can fix the remaining issues with chaining of classloaders (disallowing null/context class loader as default classloader, using webapp classloader instead of context one)!

        Show
        Uwe Schindler added a comment - Committed to trunk and 4.x Will be released with Lucene/Solr 4.3. Thanks Ryan, I hope we can fix the remaining issues with chaining of classloaders (disallowing null/context class loader as default classloader, using webapp classloader instead of context one)!
        Hide
        Ryan Ernst added a comment -

        Thanks Uwe!

        I've disabled the "auto reorder imports" feature from IntelliJ, so that should be better in future patches. I will also make sure to figure out the svn revision or use an svn checkout. Sorry about the hassle!

        I'm hoping to tackle refactoring of the shard handler factory stuff next, but then coming back to classloader chaining is a possibility. As I said before, I did start down that road...it is just a doozy. It is pretty straightforward until you get to ZkSolrResourceLoader, then it seemed to get complicated (I stopped there, so I am not sure if there is an easy path there).

        Show
        Ryan Ernst added a comment - Thanks Uwe! I've disabled the "auto reorder imports" feature from IntelliJ, so that should be better in future patches. I will also make sure to figure out the svn revision or use an svn checkout. Sorry about the hassle! I'm hoping to tackle refactoring of the shard handler factory stuff next, but then coming back to classloader chaining is a possibility. As I said before, I did start down that road...it is just a doozy. It is pretty straightforward until you get to ZkSolrResourceLoader, then it seemed to get complicated (I stopped there, so I am not sure if there is an easy path there).
        Hide
        Mark Miller added a comment - - edited

        ZkSolrResourceLoader

        If you fill me in, I can probably help. ZkSolrResourceLoader should be pretty simple - it's intent is simply to load from zk if possible before falling back to how SolrResourceLoader would load - essentially.

        Show
        Mark Miller added a comment - - edited ZkSolrResourceLoader If you fill me in, I can probably help. ZkSolrResourceLoader should be pretty simple - it's intent is simply to load from zk if possible before falling back to how SolrResourceLoader would load - essentially.
        Hide
        Uwe Schindler added a comment -

        Closed after release.

        Show
        Uwe Schindler added a comment - Closed after release.

          People

          • Assignee:
            Uwe Schindler
            Reporter:
            Ryan Ernst
          • Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development