Uploaded image for project: 'Lucene - Core'
  1. Lucene - Core
  2. LUCENE-1843

Convert some tests to new TokenStream API, better support of cross-impl AttributeImpl.copyTo()

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.9
    • Fix Version/s: 2.9
    • Component/s: modules/analysis
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      This patch converts some remaining tests to the new TokenStream API and non-deprecated classes.
      This patch also enhances AttributeImpl.copyTo() of Token and TokenWrapper to also support copying e.g. TermAttributeImpl into Token. The target impl must only support all interfaces but must not be of the same type. Token and TokenWrapper use optimized coping without casting to 6 interfaces where possible.
      Maybe the special tokenizers in contrib (shingle matrix and so on using tokens to cache may be enhanced by that). Also Yonik's request for optimized copying of states between incompatible AttributeSources may be enhanced by that (possibly a new issue).

      1. LUCENE-1843.patch
        10 kB
        Uwe Schindler
      2. LUCENE-1843.patch
        119 kB
        Uwe Schindler
      3. LUCENE-1843.patch
        121 kB
        Uwe Schindler
      4. LUCENE-1843.patch
        128 kB
        Uwe Schindler

        Activity

        Hide
        thetaphi Uwe Schindler added a comment -

        Committed revision: 807190

        Show
        thetaphi Uwe Schindler added a comment - Committed revision: 807190
        Hide
        thetaphi Uwe Schindler added a comment -

        Some more tests converted. Also optimized Token.copyTo() to also respect TokenWrapper as target.

        Show
        thetaphi Uwe Schindler added a comment - Some more tests converted. Also optimized Token.copyTo() to also respect TokenWrapper as target.
        Hide
        thetaphi Uwe Schindler added a comment -
        • Small updates
        • forget conversion of two filters in contrib/memory

        Hope this is the last patch.

        Show
        thetaphi Uwe Schindler added a comment - Small updates forget conversion of two filters in contrib/memory Hope this is the last patch.
        Hide
        thetaphi Uwe Schindler added a comment -

        Now the right file.

        Will commit tomorrow.

        Show
        thetaphi Uwe Schindler added a comment - Now the right file. Will commit tomorrow.
        Hide
        thetaphi Uwe Schindler added a comment - - edited

        Patch that makes all contrib/analyzer tests that work with TokenStreams subclasses of BaseTokenStreamTestCase. This superclass now has a lot of utility methods to check TokenStreams using arrays of strings/ints.

        The patch also contains a better version of SingleTokenTokenStream, using the Token.copyTo() function and a Token/TokenWrapper instance as attribute implementation.

        This patch may still include some unused imports, had no time to check this manually (I am the person, that codes with Notepad...)

        Show
        thetaphi Uwe Schindler added a comment - - edited Patch that makes all contrib/analyzer tests that work with TokenStreams subclasses of BaseTokenStreamTestCase. This superclass now has a lot of utility methods to check TokenStreams using arrays of strings/ints. The patch also contains a better version of SingleTokenTokenStream, using the Token.copyTo() function and a Token/TokenWrapper instance as attribute implementation. This patch may still include some unused imports, had no time to check this manually (I am the person, that codes with Notepad...)
        Hide
        thetaphi Uwe Schindler added a comment -

        From a private mail with Robert Muir:

        yes, all of what you mentioned are problems, and testing for
        attributes that should be there is good in my opinion too.

        I noticed the shingle problem as well, it was strange to test
        termAtt.toString() and expect position increments or types to appear
        :/

        one reason I asked about this, is at some point it would be nice to
        try to factor test cases in lucene contrib. currently, they all have
        same helper methods such as assertAnalyzesTo and this is silly in my
        opinion.

        On Sun, Aug 23, 2009 at 12:57 PM, Uwe Schindler<uwe@thetaphi.de> wrote:
        > There are more problems. The test with getAttribute is good, if you are
        > really sure, if the attribute is really available and want assert this. In
        > all other cases addAttribute should be used to consume a TokenStream. The
        > changed ones were problematic, because they used foreign TokenStreams, do
        > not for sure have all these attributes.
        >
        > I thought, all tests in contrib use LuceneTestCase as superclass, but they
        > use the standard junit class. Because of that I did not notice when I put
        > setOnlyUseNewAPI(true) into LuceneTestCase.setUp(), that they are run with
        > the default false setting.
        >
        > Other problems in these tests are, that some (shingle tests) use
        > TermAttribute.toString() which looks different if the attribute is
        > implemented by TermAttributeImpl (newAPI=true) or Token (newAPI=false).

        Show
        thetaphi Uwe Schindler added a comment - From a private mail with Robert Muir: yes, all of what you mentioned are problems, and testing for attributes that should be there is good in my opinion too. I noticed the shingle problem as well, it was strange to test termAtt.toString() and expect position increments or types to appear :/ one reason I asked about this, is at some point it would be nice to try to factor test cases in lucene contrib. currently, they all have same helper methods such as assertAnalyzesTo and this is silly in my opinion. On Sun, Aug 23, 2009 at 12:57 PM, Uwe Schindler<uwe@thetaphi.de> wrote: > There are more problems. The test with getAttribute is good, if you are > really sure, if the attribute is really available and want assert this. In > all other cases addAttribute should be used to consume a TokenStream. The > changed ones were problematic, because they used foreign TokenStreams, do > not for sure have all these attributes. > > I thought, all tests in contrib use LuceneTestCase as superclass, but they > use the standard junit class. Because of that I did not notice when I put > setOnlyUseNewAPI(true) into LuceneTestCase.setUp(), that they are run with > the default false setting. > > Other problems in these tests are, that some (shingle tests) use > TermAttribute.toString() which looks different if the attribute is > implemented by TermAttributeImpl (newAPI=true) or Token (newAPI=false).
        Hide
        thetaphi Uwe Schindler added a comment -

        There are some more tests, that fail with onlyUseNewAPI in contrib/analyzers.

        Show
        thetaphi Uwe Schindler added a comment - There are some more tests, that fail with onlyUseNewAPI in contrib/analyzers.
        Hide
        thetaphi Uwe Schindler added a comment -

        Committed revision: 806847

        Show
        thetaphi Uwe Schindler added a comment - Committed revision: 806847
        Hide
        thetaphi Uwe Schindler added a comment -

        Patch - Will commit soon.

        Show
        thetaphi Uwe Schindler added a comment - Patch - Will commit soon.

          People

          • Assignee:
            thetaphi Uwe Schindler
            Reporter:
            thetaphi Uwe Schindler
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development