Solr
  1. Solr
  2. SOLR-8460

/analysis/field doesn't always handle custom attributes correctly

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.3.2, 5.4.1, 5.5
    • Component/s: Schema and Analysis
    • Labels:
      None
    • Flags:
      Patch

      Description

      I've got some custom analysis Attribute implementations in my analysis chain with some other custom analysis components. I found that Solr's Analysis utility screen, powered by /field/analysis (FieldAnalysisRequestHandler subclassing AnalysisRequestHandlerBase) gave me exceptions for two reasons, both having to do with AnalysisRequestHandlerBase.ListBasedTokenStream:

      • Custom implementations of standard Attributes (e.g. FlagsAttribute) would trigger an exception.
      • Calling getAttribute (instead of addAttribute) in a TokenFilter constructor wouldn't find an attribute added by the input TokenStream.
      1. SOLR_8460.patch
        10 kB
        David Smiley
      2. SOLR_8460.patch
        10 kB
        David Smiley

        Activity

        Hide
        Uwe Schindler added a comment - - edited

        Can you provide an example TokenStream, For me it always works. The whole thing is buggy, I agree, but it should work with correctly implemented TokenStreams.

        The most important thing: You have to implement the reflectWith() functionality correctly.

        Calling getAttribute (instead of addAttribute) in a TokenFilter constructor wouldn't find an attribute added by the input TokenStream.

        Thats a bug in your TokenStream! getAttribute is only available for TokenStream consumers that don't want to add attributes they don't need. Producer code must always use addAttribute().

        Custom implementations of standard Attributes (e.g. FlagsAttribute) would trigger an exception.

        If you use an AttributeFactory it should work correctly. In any case we should check that the ListBasedTokenStream uses the same attribute factory as the original tokenstream. This could be the bug here. Because it needs to clone the atttibutes and this fails if the original and the ListBasedTokenSteam uses different factories.

        Show
        Uwe Schindler added a comment - - edited Can you provide an example TokenStream, For me it always works. The whole thing is buggy, I agree, but it should work with correctly implemented TokenStreams. The most important thing: You have to implement the reflectWith() functionality correctly. Calling getAttribute (instead of addAttribute) in a TokenFilter constructor wouldn't find an attribute added by the input TokenStream. Thats a bug in your TokenStream! getAttribute is only available for TokenStream consumers that don't want to add attributes they don't need. Producer code must always use addAttribute(). Custom implementations of standard Attributes (e.g. FlagsAttribute) would trigger an exception. If you use an AttributeFactory it should work correctly. In any case we should check that the ListBasedTokenStream uses the same attribute factory as the original tokenstream. This could be the bug here. Because it needs to clone the atttibutes and this fails if the original and the ListBasedTokenSteam uses different factories.
        Hide
        David Smiley added a comment -

        See attached patch.

        This fix is for ARHB.ListBasedTokenStream to take an AttributeSource in the constructor to serve as the source of the attribute impls which are added in the constructor in a loop. This adds the attributes as side effect as well. There is then no reason to copy the defined attributes in incrementToken because it's done in the constructor.

        I'll commit in a couple days pending feedback.

        Show
        David Smiley added a comment - See attached patch. This fix is for ARHB.ListBasedTokenStream to take an AttributeSource in the constructor to serve as the source of the attribute impls which are added in the constructor in a loop. This adds the attributes as side effect as well. There is then no reason to copy the defined attributes in incrementToken because it's done in the constructor. I'll commit in a couple days pending feedback.
        Hide
        Uwe Schindler added a comment -

        That was exactly the issue I had in mind Thanks for fixing! In any case, the whole thing is a huge bug and should be rewritten!

        Show
        Uwe Schindler added a comment - That was exactly the issue I had in mind Thanks for fixing! In any case, the whole thing is a huge bug and should be rewritten!
        Hide
        David Smiley added a comment -

        Thanks for your review Uwe! It's good to get more eyes on each others work. Looks like our first comments were written simultaneously without the benefit of each of us seeing what the other were about to say.

        Thats a bug in your TokenStream! getAttribute is only available for TokenStream consumers that don't want to add attributes they don't need. Producer code must always use addAttribute().

        I wrote an Attribute implementation in such a way that it didn't require some other attribute, but if it was present then it affected the functionality of the Attribute. So to know if it's present or not, I called getAttribute.

        Do you think the patch is fine as is to be committed or do you have concerns/feedback?

        Show
        David Smiley added a comment - Thanks for your review Uwe! It's good to get more eyes on each others work. Looks like our first comments were written simultaneously without the benefit of each of us seeing what the other were about to say. Thats a bug in your TokenStream! getAttribute is only available for TokenStream consumers that don't want to add attributes they don't need. Producer code must always use addAttribute(). I wrote an Attribute implementation in such a way that it didn't require some other attribute, but if it was present then it affected the functionality of the Attribute. So to know if it's present or not, I called getAttribute. Do you think the patch is fine as is to be committed or do you have concerns/feedback?
        Hide
        Uwe Schindler added a comment -

        I have no real concerns. The only thing I don't really like is the use of addAttributeImpl() in the test. This method is @Internal and should not really be used. It was made public for some special cases like TeeSinkTokenFilter. Maybe we can hide it now. The correct way to make the attribute API use a custom attribute impl is to provide an AttributeFactory that returns the custom implementation for the interface.

        Show
        Uwe Schindler added a comment - I have no real concerns. The only thing I don't really like is the use of addAttributeImpl() in the test. This method is @Internal and should not really be used. It was made public for some special cases like TeeSinkTokenFilter. Maybe we can hide it now. The correct way to make the attribute API use a custom attribute impl is to provide an AttributeFactory that returns the custom implementation for the interface.
        Hide
        Uwe Schindler added a comment -

        I wrote an Attribute implementation in such a way that it didn't require some other attribute, but if it was present then it affected the functionality of the Attribute. So to know if it's present or not, I called getAttribute.

        That's fine. It was just general comment: If your TokenFilter needs a specific attribute, it should call addAttribute. If it only optionally uses it when available, a check with hasAttribute() is fine, too. Although I cannot guarantee that this works with all consumers (like this one!). Some producers make attributes available in a delayed way (e.g. on reset()), so calling hasAttribute or getAttribute on ctor may not reflect the real state. I think this is what happened here (because attribute init was delayed to incrementToken). I don't know why this was implemented like that - maybe because of the delayed attrbutes.

        I'd suggest to still crosscheck in incrementToken() if all Attributes are ready. It is not a performance issue for this handler, as it is intended for debugging only. So I would leave incrementToken() as it is. Maybe do some checks with the web interface and crazy analyzers like Kuromoji or SmartTschinese

        Show
        Uwe Schindler added a comment - I wrote an Attribute implementation in such a way that it didn't require some other attribute, but if it was present then it affected the functionality of the Attribute. So to know if it's present or not, I called getAttribute. That's fine. It was just general comment: If your TokenFilter needs a specific attribute, it should call addAttribute. If it only optionally uses it when available, a check with hasAttribute() is fine, too. Although I cannot guarantee that this works with all consumers (like this one!). Some producers make attributes available in a delayed way (e.g. on reset()), so calling hasAttribute or getAttribute on ctor may not reflect the real state. I think this is what happened here (because attribute init was delayed to incrementToken). I don't know why this was implemented like that - maybe because of the delayed attrbutes. I'd suggest to still crosscheck in incrementToken() if all Attributes are ready. It is not a performance issue for this handler, as it is intended for debugging only. So I would leave incrementToken() as it is. Maybe do some checks with the web interface and crazy analyzers like Kuromoji or SmartTschinese
        Hide
        David Smiley added a comment -

        In this updated patch, I ensured that attributes are added in incrementToken too.

        However I kept it using addAttributeImpl, with the following comment:

              // note: ideally we wouldn't call addAttributeImpl which is marked internal. But nonetheless it's possible
              //  this method is used by some custom attributes, especially since Solr doesn't provide a way to customize the
              //  AttributeFactory which is the recommended way to choose which classes implement which attributes.
        
        Show
        David Smiley added a comment - In this updated patch, I ensured that attributes are added in incrementToken too. However I kept it using addAttributeImpl, with the following comment: // note: ideally we wouldn't call addAttributeImpl which is marked internal. But nonetheless it's possible // this method is used by some custom attributes, especially since Solr doesn't provide a way to customize the // AttributeFactory which is the recommended way to choose which classes implement which attributes.
        Hide
        Uwe Schindler added a comment -

        +1

        especially since Solr doesn't provide a way to customize the AttributeFactory which is the recommended way to choose which classes implement which attributes.

        The correct way to do this is to write a separate/duplicate factory for your Tokenizer that uses another attributefactory. You have to do this anyways, because adding custom attributes later will likely fail, if the tokenizer already added them.

        Show
        Uwe Schindler added a comment - +1 especially since Solr doesn't provide a way to customize the AttributeFactory which is the recommended way to choose which classes implement which attributes. The correct way to do this is to write a separate/duplicate factory for your Tokenizer that uses another attributefactory. You have to do this anyways, because adding custom attributes later will likely fail, if the tokenizer already added them.
        Hide
        ASF subversion and git services added a comment -

        Commit 1721638 from David Smiley in branch 'dev/trunk'
        [ https://svn.apache.org/r1721638 ]

        SOLR-8460: /analysis/field could throw exceptions for custom attributes.

        Show
        ASF subversion and git services added a comment - Commit 1721638 from David Smiley in branch 'dev/trunk' [ https://svn.apache.org/r1721638 ] SOLR-8460 : /analysis/field could throw exceptions for custom attributes.
        Hide
        ASF subversion and git services added a comment -

        Commit 1721641 from David Smiley in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1721641 ]

        SOLR-8460: /analysis/field could throw exceptions for custom attributes.

        Show
        ASF subversion and git services added a comment - Commit 1721641 from David Smiley in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1721641 ] SOLR-8460 : /analysis/field could throw exceptions for custom attributes.
        Hide
        ASF subversion and git services added a comment -

        Commit 1721646 from David Smiley in branch 'dev/branches/lucene_solr_5_3'
        [ https://svn.apache.org/r1721646 ]

        SOLR-8460: /analysis/field could throw exceptions for custom attributes.

        Show
        ASF subversion and git services added a comment - Commit 1721646 from David Smiley in branch 'dev/branches/lucene_solr_5_3' [ https://svn.apache.org/r1721646 ] SOLR-8460 : /analysis/field could throw exceptions for custom attributes.
        Hide
        ASF subversion and git services added a comment -

        Commit 1724177 from Adrien Grand in branch 'dev/branches/lucene_solr_5_4'
        [ https://svn.apache.org/r1724177 ]

        SOLR-8460: /analysis/field could throw exceptions for custom attributes.

        Show
        ASF subversion and git services added a comment - Commit 1724177 from Adrien Grand in branch 'dev/branches/lucene_solr_5_4' [ https://svn.apache.org/r1724177 ] SOLR-8460 : /analysis/field could throw exceptions for custom attributes.
        Hide
        ASF subversion and git services added a comment -

        Commit 1724198 from Adrien Grand in branch 'dev/branches/lucene_solr_5_4'
        [ https://svn.apache.org/r1724198 ]

        SOLR-8460, SOLR-8373, SOLR-8422, SOLR-7462, SOLR-8470: Add CHANGES entries for 5.4.1.

        Show
        ASF subversion and git services added a comment - Commit 1724198 from Adrien Grand in branch 'dev/branches/lucene_solr_5_4' [ https://svn.apache.org/r1724198 ] SOLR-8460 , SOLR-8373 , SOLR-8422 , SOLR-7462 , SOLR-8470 : Add CHANGES entries for 5.4.1.
        Hide
        ASF subversion and git services added a comment -

        Commit 9ef144ddefe21f30c1c9ebd5246e7e03387488e1 in lucene-solr's branch refs/heads/branch_5_4 from Adrien Grand
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=9ef144d ]

        SOLR-8460, SOLR-8373, SOLR-8422, SOLR-7462, SOLR-8470: Add CHANGES entries for 5.4.1.

        git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_5_4@1724198 13f79535-47bb-0310-9956-ffa450edef68

        Show
        ASF subversion and git services added a comment - Commit 9ef144ddefe21f30c1c9ebd5246e7e03387488e1 in lucene-solr's branch refs/heads/branch_5_4 from Adrien Grand [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=9ef144d ] SOLR-8460 , SOLR-8373 , SOLR-8422 , SOLR-7462 , SOLR-8470 : Add CHANGES entries for 5.4.1. git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_5_4@1724198 13f79535-47bb-0310-9956-ffa450edef68

          People

          • Assignee:
            David Smiley
            Reporter:
            David Smiley
          • Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development