Details

    • Type: Bug Bug
    • Status: Resolved
    • Priority: Blocker Blocker
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.3.1, 6.0
    • Component/s: modules/analysis
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      If solr wants to set the contextClassLoader because its classloading is fucked up, then it needs to do this hack itself: it should not be in lucene code.

      The current mess prevents use of this analyzer in other environments

      1. LUCENE-6774.patch
        4 kB
        Uwe Schindler
      2. LUCENE-6774.patch
        4 kB
        Uwe Schindler
      3. LUCENE-6774.patch
        4 kB
        Uwe Schindler
      4. LUCENE-6774.patch
        2 kB
        Robert Muir

        Issue Links

          Activity

          Hide
          Robert Muir added a comment -

          patch.

          Show
          Robert Muir added a comment - patch.
          Hide
          Uwe Schindler added a comment -

          I think the main problem is Morphologik that reads the resources from the context classloader - which is just wrong.

          I'd suggest to not use the static Dictionary#getForLanguage and instead load the FSA with the methods taking InputStream or URL from resources. We just need to duplicate the ISO code -> Resource file name code and then pass the resource URL from the "right" classloader to the load() methods.

          I can have a look.

          Show
          Uwe Schindler added a comment - I think the main problem is Morphologik that reads the resources from the context classloader - which is just wrong. I'd suggest to not use the static Dictionary#getForLanguage and instead load the FSA with the methods taking InputStream or URL from resources. We just need to duplicate the ISO code -> Resource file name code and then pass the resource URL from the "right" classloader to the load() methods. I can have a look.
          Hide
          Robert Muir added a comment -

          Sorry, that is incorrect Uwe. Morfologik "tries" that loader, but it also does things correctly too.

          Look right here:

          https://github.com/morfologik/morfologik-stemming/blob/master/morfologik-fsa/src/main/java/morfologik/util/ResourceUtils.java#L40-L50

          Show
          Robert Muir added a comment - Sorry, that is incorrect Uwe. Morfologik "tries" that loader, but it also does things correctly too. Look right here: https://github.com/morfologik/morfologik-stemming/blob/master/morfologik-fsa/src/main/java/morfologik/util/ResourceUtils.java#L40-L50
          Hide
          Robert Muir added a comment -

          I am committing this because such hacks should not be in lucene code. Please open a separate issue to deal with the solr hack better

          Show
          Robert Muir added a comment - I am committing this because such hacks should not be in lucene code. Please open a separate issue to deal with the solr hack better
          Hide
          ASF subversion and git services added a comment -

          Commit 1700837 from Robert Muir in branch 'dev/trunk'
          [ https://svn.apache.org/r1700837 ]

          LUCENE-6774: Remove solr hack in MorfologikFilter

          Show
          ASF subversion and git services added a comment - Commit 1700837 from Robert Muir in branch 'dev/trunk' [ https://svn.apache.org/r1700837 ] LUCENE-6774 : Remove solr hack in MorfologikFilter
          Hide
          ASF subversion and git services added a comment -

          Commit 1700839 from Robert Muir in branch 'dev/branches/branch_5x'
          [ https://svn.apache.org/r1700839 ]

          LUCENE-6774: Remove solr hack in MorfologikFilter

          Show
          ASF subversion and git services added a comment - Commit 1700839 from Robert Muir in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1700839 ] LUCENE-6774 : Remove solr hack in MorfologikFilter
          Hide
          Uwe Schindler added a comment -

          I linked the issue that causes the havoc here. I don't think this is an issue anymore, because morphologik tries:

          1. context classloader
          2. own classloader
          3. system classloader

          (in this order). I am not sure how this was implemented in the version used in SOLR-4007. Maybe Dawid Weiss knows more. I have he feeling that this is no issue anymore. We should add a test for Solr.

          Show
          Uwe Schindler added a comment - I linked the issue that causes the havoc here. I don't think this is an issue anymore, because morphologik tries: context classloader own classloader system classloader (in this order). I am not sure how this was implemented in the version used in SOLR-4007 . Maybe Dawid Weiss knows more. I have he feeling that this is no issue anymore. We should add a test for Solr.
          Hide
          Yonik Seeley added a comment -

          Hmmm, you essentially reverted SOLR-4007

          Is that code no longer necessary, or will the error that SOLR-4007 resolved now happen again?
          I know SOLR-4007 didn't add a test (and maybe it's hard to do), but that's no reason to knowingly break Solr.

          Show
          Yonik Seeley added a comment - Hmmm, you essentially reverted SOLR-4007 Is that code no longer necessary, or will the error that SOLR-4007 resolved now happen again? I know SOLR-4007 didn't add a test (and maybe it's hard to do), but that's no reason to knowingly break Solr.
          Hide
          Robert Muir added a comment -

          this horrible hack prevents the analyzer from being used in other environments. sorry, solr can set contextclassloader in its own code. Doing it here is no reason to break other non-solr apps

          Believe it or not, there are people that use lucene without solr.

          Show
          Robert Muir added a comment - this horrible hack prevents the analyzer from being used in other environments. sorry, solr can set contextclassloader in its own code. Doing it here is no reason to break other non-solr apps Believe it or not, there are people that use lucene without solr.
          Hide
          Yonik Seeley added a comment -

          If it did break Solr, then we should have come up with a solution that works for everyone.
          Removing that code without even knowing if it broke solr is essentially giving us all the middle finger.

          Show
          Yonik Seeley added a comment - If it did break Solr, then we should have come up with a solution that works for everyone . Removing that code without even knowing if it broke solr is essentially giving us all the middle finger.
          Hide
          Uwe Schindler added a comment -

          Yonik Seeley: I'd suggest to write a test, which is not so easy for analysis extras. From looking at the code I see no reason why Morphologik's code may not find the files. I think this might have been fixed by Dawid to no longer solely depend on context classloader.

          Show
          Uwe Schindler added a comment - Yonik Seeley : I'd suggest to write a test, which is not so easy for analysis extras. From looking at the code I see no reason why Morphologik's code may not find the files. I think this might have been fixed by Dawid to no longer solely depend on context classloader.
          Hide
          Robert Muir added a comment -

          I don't care if i broke solr. solr can fix itself here instead of breaking everyone else. its just that simple. lucene is a library.

          if you think i'm giving solr the middle finger, then well, fine, it deserves it:

          ....................../´¯/) 
          ....................,/¯../ 
          .................../..../ 
          ............./´¯/'...'/´¯¯`·¸ 
          ........../'/.../..../......./¨¯\ 
          ........('(...´...´.... ¯~/'...') 
          .........\.................'...../ 
          ..........''...\.......... _.·´ 
          ............\..............( 
          ..............\.............\...
          
          Show
          Robert Muir added a comment - I don't care if i broke solr. solr can fix itself here instead of breaking everyone else. its just that simple. lucene is a library. if you think i'm giving solr the middle finger, then well, fine, it deserves it: ....................../´¯/) ....................,/¯../ .................../..../ ............./´¯/'...'/´¯¯`·¸ ........../'/.../..../......./¨¯\ ........('(...´...´.... ¯~/'...') .........\.................'...../ ..........''...\.......... _.·´ ............\..............( ..............\.............\...
          Hide
          Uwe Schindler added a comment -

          Setting a context class loader inside "library" code is a big problem and should never ever done. It is good that Robert found this problem. The code has to be removed, sorry.

          Please let us fix the code by:

          1. first adding a test
          2. ask Dawid Weiss if there were changes he did after this issue in Morphlogik's resource loading

          If both of this does not work would suggest to:

          1. let Solr set the context ClassLoader: SOLR-3716
          2. just load the resource files ourselves and pass them to Dictionary#load(). This is mainly because I am not happy that Morfologik uses context ClassLoader at all. So I would like to fix this by loading the file ourselves from Class's classpath (or in the factory using ResourceLoader as provided by Lucene/Solr).
          Show
          Uwe Schindler added a comment - Setting a context class loader inside "library" code is a big problem and should never ever done. It is good that Robert found this problem. The code has to be removed, sorry. Please let us fix the code by: first adding a test ask Dawid Weiss if there were changes he did after this issue in Morphlogik's resource loading If both of this does not work would suggest to: let Solr set the context ClassLoader: SOLR-3716 just load the resource files ourselves and pass them to Dictionary#load(). This is mainly because I am not happy that Morfologik uses context ClassLoader at all. So I would like to fix this by loading the file ourselves from Class's classpath (or in the factory using ResourceLoader as provided by Lucene/Solr).
          Hide
          Yonik Seeley added a comment -

          Setting as a blocker until someone can determine if Solr is now broken or not. Hopefully Uwe is right and the code is no longer needed.

          Show
          Yonik Seeley added a comment - Setting as a blocker until someone can determine if Solr is now broken or not. Hopefully Uwe is right and the code is no longer needed.
          Hide
          Uwe Schindler added a comment - - edited

          Hi,
          I checked the code after a beer and now I know what the issue is. In fact the ResourceUtils code did not change since Solr 4.0 where SOLR-4007 was reported on. This was using version 1.5.5 of Morphologik.

          The problem is very simple:

          1. Morphologic tries context class loader first, obviously this fails for Solr (SOLR-3716)
          2. As a second try it does "the right thing" but obviously wrong: It uses ResourceUtil.class.getResourceAsStream(), and this fails because when loading the resource it uses an absolute path inclusive package, but without a slash. This of course fails, because the file is not found. This causes Solr fail.
          3. Finally it tries the system classloader. Of course the resource isn't there, because the system classloader has no morphologic jars

          The issue here is the stupid java difference: If you use a Classloader, you don't need leading slash. If you use Class#getResource() it resolves against current package, so you need a leading slash, if you want it with an absolute path.

          In single classloader applications this is no issue, because the context classloader always works, but not in Solr.

          Show
          Uwe Schindler added a comment - - edited Hi, I checked the code after a beer and now I know what the issue is. In fact the ResourceUtils code did not change since Solr 4.0 where SOLR-4007 was reported on. This was using version 1.5.5 of Morphologik. The problem is very simple: Morphologic tries context class loader first, obviously this fails for Solr ( SOLR-3716 ) As a second try it does "the right thing" but obviously wrong: It uses ResourceUtil.class.getResourceAsStream(), and this fails because when loading the resource it uses an absolute path inclusive package, but without a slash. This of course fails, because the file is not found. This causes Solr fail. Finally it tries the system classloader. Of course the resource isn't there, because the system classloader has no morphologic jars The issue here is the stupid java difference: If you use a Classloader, you don't need leading slash. If you use Class#getResource() it resolves against current package, so you need a leading slash, if you want it with an absolute path. In single classloader applications this is no issue, because the context classloader always works, but not in Solr.
          Hide
          Uwe Schindler added a comment -

          My proposal:

          • we should load the resource using local path (it is very simple, its just the language name)
          • Dawid Weiss should maybe fix this in morphologik, the code is still broken. Maybe he should remove the whole class (he has a nocommit there)

          I'll supply a patch for a solution that always works not dealing with context classloader.

          Show
          Uwe Schindler added a comment - My proposal: we should load the resource using local path (it is very simple, its just the language name) Dawid Weiss should maybe fix this in morphologik, the code is still broken. Maybe he should remove the whole class (he has a nocommit there) I'll supply a patch for a solution that always works not dealing with context classloader.
          Hide
          Uwe Schindler added a comment -

          Patch. This also removes the horrible cache and uses the only existing dictionary ("pl") statically.

          We should maybe add some additional code to the factory to also allow loading from plain files. But that's another issue.

          Show
          Uwe Schindler added a comment - Patch. This also removes the horrible cache and uses the only existing dictionary ("pl") statically. We should maybe add some additional code to the factory to also allow loading from plain files. But that's another issue.
          Hide
          Dawid Weiss added a comment -

          I will fix this in Morfologik so that dictionaries are not looked up via class resource search mechanism. The reason context class loader was used was not just Solr – it was also a workaround for certain web servlet containers (as far as I can remember).

          I am working on a proper fix in Morfologik, which I'll then apply to Lucene/ Solr.

          Show
          Dawid Weiss added a comment - I will fix this in Morfologik so that dictionaries are not looked up via class resource search mechanism. The reason context class loader was used was not just Solr – it was also a workaround for certain web servlet containers (as far as I can remember). I am working on a proper fix in Morfologik, which I'll then apply to Lucene/ Solr.
          Hide
          Uwe Schindler added a comment -

          I changed the static constant with the default dict to be consistent.

          Show
          Uwe Schindler added a comment - I changed the static constant with the default dict to be consistent.
          Hide
          Dawid Weiss added a comment -

          Thanks Uwe, yep, it looks good to me as a temporary workaround.

          Show
          Dawid Weiss added a comment - Thanks Uwe, yep, it looks good to me as a temporary workaround.
          Hide
          Uwe Schindler added a comment -

          Dawid: I think the quickest fix would be to replace the second try to add a leading slash in ResourceUtils. For now, we prefer to have the dictionary loaded statically.

          Show
          Uwe Schindler added a comment - Dawid: I think the quickest fix would be to replace the second try to add a leading slash in ResourceUtils. For now, we prefer to have the dictionary loaded statically.
          Hide
          Dawid Weiss added a comment -

          I actually plan to remove all that logic from ResourceUtils... it should be up to the caller to figure out how to locate dictionary resources – that's where the knowledge of how to do it should be.

          Show
          Dawid Weiss added a comment - I actually plan to remove all that logic from ResourceUtils... it should be up to the caller to figure out how to locate dictionary resources – that's where the knowledge of how to do it should be.
          Hide
          Uwe Schindler added a comment -

          Maybe add a method with a classloader.

          Show
          Uwe Schindler added a comment - Maybe add a method with a classloader.
          Hide
          Dawid Weiss added a comment -

          I think an URI would be most flexible.

          Show
          Dawid Weiss added a comment - I think an URI would be most flexible.
          Hide
          Uwe Schindler added a comment -

          OK or URI. I think the current Lucene code is not really a "workaround" we can keep it as is. It does not hack anything it just uses the official APIs. The only thing thats a hack is the hardcoded package paths. Maybe add those as constants to Morfologik.

          Show
          Uwe Schindler added a comment - OK or URI. I think the current Lucene code is not really a "workaround" we can keep it as is. It does not hack anything it just uses the official APIs. The only thing thats a hack is the hardcoded package paths. Maybe add those as constants to Morfologik.
          Hide
          Uwe Schindler added a comment -

          New patch. There was a copy'n'paste problem with error message in last one.

          Show
          Uwe Schindler added a comment - New patch. There was a copy'n'paste problem with error message in last one.
          Hide
          Dawid Weiss added a comment - - edited

          You see Morfologik does ship with the Polish dictionary, but it's merely a simple wrapper around a static inflection dictionary. You could build these yourself for an arbitrary language and reuse all of the code to just load and scan it – this is what this project does, for example:

          https://languagetool.org/

          That's why I think it'd make more sense to remove all of these resource-loading facilities and simply require an URI to the data itself.

          Your solution is great, of course!

          Show
          Dawid Weiss added a comment - - edited You see Morfologik does ship with the Polish dictionary, but it's merely a simple wrapper around a static inflection dictionary. You could build these yourself for an arbitrary language and reuse all of the code to just load and scan it – this is what this project does, for example: https://languagetool.org/ That's why I think it'd make more sense to remove all of these resource-loading facilities and simply require an URI to the data itself. Your solution is great, of course!
          Hide
          Uwe Schindler added a comment -

          URI -> URL (like returned by Class#getResource()). URI does not supply InputStreams.

          Show
          Uwe Schindler added a comment - URI -> URL (like returned by Class#getResource()). URI does not supply InputStreams.
          Hide
          Uwe Schindler added a comment -

          Personally, I would also like to change the factory class to take the language code (for backwards compatibility) and otherwise use dict and meta file as separate config params. The default Lucene ResourceLoader interface would then convert them to a URL/InputStream, so it works in Solr from conf/ directory, too.

          Show
          Uwe Schindler added a comment - Personally, I would also like to change the factory class to take the language code (for backwards compatibility) and otherwise use dict and meta file as separate config params. The default Lucene ResourceLoader interface would then convert them to a URL/InputStream, so it works in Solr from conf/ directory, too.
          Hide
          Dawid Weiss added a comment -
          Show
          Dawid Weiss added a comment - What you should be using is this explicit constructor: https://github.com/morfologik/morfologik-stemming/blob/master/morfologik-stemming/src/main/java/morfologik/stemming/Dictionary.java#L64 FSA can be read from an InputStream: https://github.com/morfologik/morfologik-stemming/blob/master/morfologik-fsa/src/main/java/morfologik/fsa/FSA.java#L256 And DictionaryMetadata can be constructed programmatically or otherwise. Here is the method that does the loading from two streams (FSA and properties): https://github.com/morfologik/morfologik-stemming/blob/master/morfologik-stemming/src/main/java/morfologik/stemming/Dictionary.java#L106-L156
          Hide
          Uwe Schindler added a comment -

          OK thanks. I will open separate issue about the factory.

          Show
          Uwe Schindler added a comment - OK thanks. I will open separate issue about the factory.
          Hide
          ASF subversion and git services added a comment -

          Commit 1700903 from Uwe Schindler in branch 'dev/trunk'
          [ https://svn.apache.org/r1700903 ]

          LUCENE-6774: Remove classloader hack in MorfologikFilter #2

          Show
          ASF subversion and git services added a comment - Commit 1700903 from Uwe Schindler in branch 'dev/trunk' [ https://svn.apache.org/r1700903 ] LUCENE-6774 : Remove classloader hack in MorfologikFilter #2
          Hide
          ASF subversion and git services added a comment -

          Commit 1700904 from Uwe Schindler in branch 'dev/branches/branch_5x'
          [ https://svn.apache.org/r1700904 ]

          Merged revision(s) 1700903 from lucene/dev/trunk:
          LUCENE-6774: Remove classloader hack in MorfologikFilter #2

          Show
          ASF subversion and git services added a comment - Commit 1700904 from Uwe Schindler in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1700904 ] Merged revision(s) 1700903 from lucene/dev/trunk: LUCENE-6774 : Remove classloader hack in MorfologikFilter #2
          Hide
          Uwe Schindler added a comment -

          OK. I will open a new issue to allow loading arbitrary dicts with morfologik with the factory. The current factory only allows for a language name, which must be packaged in a JAR file in correct package.

          Show
          Uwe Schindler added a comment - OK. I will open a new issue to allow loading arbitrary dicts with morfologik with the factory. The current factory only allows for a language name, which must be packaged in a JAR file in correct package.
          Hide
          Uwe Schindler added a comment -

          Reopen for backport

          Show
          Uwe Schindler added a comment - Reopen for backport
          Hide
          ASF subversion and git services added a comment -

          Commit 1701811 from Uwe Schindler in branch 'dev/branches/lucene_solr_5_3'
          [ https://svn.apache.org/r1701811 ]

          Backport:
          LUCENE-6774: Remove classloader hack in MorfologikFilter

          Show
          ASF subversion and git services added a comment - Commit 1701811 from Uwe Schindler in branch 'dev/branches/lucene_solr_5_3' [ https://svn.apache.org/r1701811 ] Backport: LUCENE-6774 : Remove classloader hack in MorfologikFilter
          Hide
          Uwe Schindler added a comment -

          OK, I backported.

          Show
          Uwe Schindler added a comment - OK, I backported.

            People

            • Assignee:
              Uwe Schindler
              Reporter:
              Robert Muir
            • Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development