Lucene - Core
LUCENE-2484

Remove deprecated TermAttribute from tokenattributes and legacy support in indexer

    Details

    • Type: Task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.0-ALPHA
    • Component/s: None
    • Labels: None
    • Lucene Fields: New, Patch Available

      Description

      The title says it:

      • Remove the interface TermAttribute (see the migration sketch below)
      • Remove the empty fake implementation TermAttributeImpl extends CharTermAttributeImpl
      • Remove the corresponding methods from CharTermAttributeImpl (and indirectly from Token)
      • Remove the sophisticated® backwards™ layer in TermsHash*
      • Remove the IllegalArgumentException from NumericTokenStream when a TermAttribute is available in the AttributeSource
      • Fix the rest of the core tests (TestToken)
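      For downstream code still written against the removed interface, here is a minimal migration sketch (plain Java with illustrative names; it is not part of the attached patch): the removed TermAttribute/term() pair is replaced by CharTermAttribute, which exposes the term text as a CharSequence.

          import java.io.IOException;

          import org.apache.lucene.analysis.TokenStream;
          import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

          public class TermAttributeMigration {
            // Old style (removed by this issue):
            //   TermAttribute termAtt = stream.addAttribute(TermAttribute.class);
            //   String term = termAtt.term();
            //
            // New style: ask for CharTermAttribute instead.
            public static void printTerms(TokenStream stream) throws IOException {
              CharTermAttribute termAtt = stream.addAttribute(CharTermAttribute.class);
              stream.reset();
              while (stream.incrementToken()) {
                // buffer()/length() avoid a String copy; toString() is the simple form.
                System.out.println(termAtt.toString());
              }
              stream.end();
              stream.close();
            }
          }

      Analyzers that never asked for TermAttribute need no change; only code that requested it explicitly (as the Carrot2 tokenizer discussed below does) has to be ported to CharTermAttribute and recompiled.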
      Attachments

      1. LUCENE-2484.patch (32 kB) - Uwe Schindler
      2. LUCENE-2484.patch (30 kB) - Uwe Schindler


          Activity

          Uwe Schindler added a comment -

          So after hours of waiting:

          • Lucene core and contrib are fine
          • Modules are fine
          • Solr core works
          • Solr's carrot2 contrib is broken, and this is not fixable!

          It's broken by itself:
          The contrib module for Carrot2 clustering depends on Carrot2, but Carrot2 itself depends on an older version of Lucene. We only have the binary, so we cannot fix it:

          java.lang.NoClassDefFoundError: org/apache/lucene/analysis/tokenattributes/TermAttribute
          	at org.carrot2.text.analysis.ExtendedWhitespaceTokenizer.<init>(ExtendedWhitespaceTokenizer.java:53)
          	at org.carrot2.text.analysis.ExtendedWhitespaceAnalyzer.tokenStream(ExtendedWhitespaceAnalyzer.java:28)
          	at org.carrot2.text.analysis.ActiveLanguageAnalyzer.tokenStream(ActiveLanguageAnalyzer.java:53)
          	at org.carrot2.text.preprocessing.Tokenizer.tokenize(Tokenizer.java:171)
          	at org.carrot2.text.preprocessing.PreprocessingPipeline.preprocess(PreprocessingPipeline.java:96)
          	at org.carrot2.text.preprocessing.PreprocessingPipeline.preprocess(PreprocessingPipeline.java:87)
          	at org.carrot2.clustering.lingo.LingoClusteringAlgorithm.process(LingoClusteringAlgorithm.java:155)
          	at org.carrot2.core.ControllerUtils.performProcessing(ControllerUtils.java:95)
          	at org.carrot2.core.ControllerUtils.performProcessing(ControllerUtils.java:138)
          	at org.carrot2.core.CachingController.processInternal(CachingController.java:279)
          	at org.carrot2.core.CachingController.process(CachingController.java:224)
          	at org.apache.solr.handler.clustering.carrot2.CarrotClusteringEngine.cluster(CarrotClusteringEngine.java:78)
          	at org.apache.solr.handler.clustering.ClusteringComponent.process(ClusteringComponent.java:75)
          	at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
          	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
          	at org.apache.solr.handler.clustering.ClusteringComponentTest.testComponent(ClusteringComponentTest.java:57)
          Caused by: java.lang.ClassNotFoundException: org.apache.lucene.analysis.tokenattributes.TermAttribute
          	at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
          	at java.security.AccessController.doPrivileged(Native Method)
          	at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
          	at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
          	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
          	at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
          

          We cannot move forward with breaking backwards compatibility in trunk, so I suggest disabling this contrib in trunk! We can only support it if it moves to Lucene/Solr completely and we have the source code in Solr's SVN.

          Thoughts?

          Robert Muir added a comment -

          We cannot move forward with breaking backwards compatibility in trunk, so I suggest disabling this contrib in trunk!

          +1

          Stanislaw Osinski added a comment -

          Hi!

          Against which version of Lucene should we refactor/build Carrot2 to fix the issue? Does it have to be trunk?

          Thanks!

          S.

          Uwe Schindler added a comment -

          Against which version of Lucene should we refactor/build Carrot2 to fix the issue? Does it have to be trunk?

          This is against Solr trunk, so what does the question mean? This is marked deprecated in branch_3x, so you have to fix it.

          The whole thing is broken per se:
          Solr trunk cannot depend on external APIs that themselves depend on older Lucene APIs.

          Robert Muir added a comment -

          Since this clustering contrib depends on binary files that are tied to specific versions of the Lucene API,
          I suggest the following:

          • only enable clustering in release branches (such as 3x)
          • when we cut a new release branch from trunk (say we make a 4x), then add the new version there that works with it.
          • but never have this enabled in trunk, as it is a cyclic dependency

          The problem is that further changes might happen in trunk, as it is backwards-incompatible.

          As a realistic theoretical example, we might all decide to move to a write-once attributes API (LUCENE-2450).
          In this case perhaps Uwe writes a sophisticated® backwards™ layer for 3x to enable easy migration,
          but then the old TokenStream API itself would be deprecated in 3x and removed in trunk so we can keep going.

          Uwe Schindler added a comment -

          Additionally:
          I will commit this soon. I don't care about external APIs. This is trunk, and as discussed on the mailing list: Lucene trunk is no longer backwards compatible!

          So: this contrib has to be removed from Solr trunk! This contrib is part of Carrot2, so why is it in Solr's contrib at all? Its license itself is incompatible. We will remove all deprecated APIs in trunk soon! The whole of Lucene trunk moves to the FLEX APIs now, so this is broken anyway.

          Sorry, this is outside the scope of this issue, please move this discussion to the mailing list!

          Out of scope:
          As a Lucene-affiliated committer (personally, I don't care about Solr at all and I was against merging Solr+Lucene!), I cannot take care of external binary-only JARs that themselves depend on old Lucene APIs!

          Stanislaw Osinski added a comment -

          Since this clustering contrib depends on binary files that are tied to specific versions of the Lucene API,
          I suggest the following:

          • only enable clustering in release branches (such as 3x)
          • when we cut a new release branch from trunk (say we make a 4x), then add the new version there that works with it.
          • but never have this enabled in trunk, as it is a cyclic dependency

          Sounds very good to me, thanks for the explanation!

          Uwe Schindler added a comment -

          Updated patch that also fixes a bug in NumericTokenStream's isAssignableFrom checks, preventing the addition of a CharTermAttribute subclass. This bug existed before, but it's easier to fix here (and it is not serious).
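          For context, a hypothetical sketch of the kind of check being referred to (the class and method names here are illustrative, not the committed patch): the direction of the isAssignableFrom call decides whether subclasses of CharTermAttribute are rejected as well.

              import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
              import org.apache.lucene.util.Attribute;

              final class CharTermAttributeGuard {
                // Rejects CharTermAttribute and anything extending it. Note the direction:
                // CharTermAttribute.class.isAssignableFrom(attClass) is true whenever
                // attClass IS-A CharTermAttribute; the reversed call,
                // attClass.isAssignableFrom(CharTermAttribute.class), would only match
                // supertypes and therefore let subclasses slip through.
                static void check(Class<? extends Attribute> attClass) {
                  if (CharTermAttribute.class.isAssignableFrom(attClass)) {
                    throw new IllegalArgumentException(
                        "NumericTokenStream does not support CharTermAttribute or its subclasses");
                  }
                }
              }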

          Uwe Schindler added a comment -

          Committed revision: 952616


            People

             • Assignee: Uwe Schindler
             • Reporter: Uwe Schindler
             • Votes: 0
             • Watchers: 0
