SOLR-2474

Analysis.jsp and AnalysisRequestHandlerBase do not correctly clear attributes on caching tokens

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.1
    • Fix Version/s: 3.2, 4.0-ALPHA
    • Component/s: None
    • Labels:
      None

      Description

      When caching tokens, the helper TokenStreams in analysis.jsp and AnalysisRequestHandlerBase do not clear all attributes.
      The issue is subtle: tokens cached in early stages of the analysis chain do not contain all attributes, so copyTo() does not necessarily overwrite every attribute in "this". Calling clearAttributes() before copyTo() ensures that no stale values remain.

      Was: LUCENE-2901 broke protected words by only setting and never clearing (that change should have been accompanied by offsetting code to clear the attribute somewhere).

      The problem here was that this attribute was added later in the analysis chain, so cached tokens don't include it. Sorry, that was my fault when rewriting analysis.jsp together with Robert.
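
As a rough illustration of the failure mode described above, here is a small self-contained Java sketch. It does not use Lucene: ToyStream, restore(), markProtected() and the attribute names "term" and "keyword" are hypothetical stand-ins, and restore() models copyTo() under the assumption that it only writes the attributes the cached token actually carries. The word "dontstems" is treated as the protected word, as in the examples from the comments.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ClearBeforeCopyDemo {

    /** Toy stand-in for the reused attribute state of a helper TokenStream. */
    static final class ToyStream {
        final Map<String, Object> attrs = new HashMap<>();

        ToyStream() {
            clearAttributes();
        }

        /** Resets every attribute the stream knows about to its default. */
        void clearAttributes() {
            attrs.put("term", "");
            attrs.put("keyword", Boolean.FALSE);
        }

        /** Models copyTo(): only attributes present in the cached token are written. */
        void restore(Map<String, Object> cachedToken) {
            attrs.putAll(cachedToken);
        }
    }

    /** Set-only filter: flags the protected word and never resets the flag. */
    static void markProtected(ToyStream stream) {
        if ("dontstems".equals(stream.attrs.get("term"))) {
            stream.attrs.put("keyword", Boolean.TRUE);
        }
    }

    /** Replays two tokens cached at an early stage, before "keyword" existed. */
    static List<String> replay(boolean clearFirst) {
        List<Map<String, Object>> cache = Arrays.asList(
            Collections.<String, Object>singletonMap("term", "dontstems"),
            Collections.<String, Object>singletonMap("term", "foo"));
        ToyStream stream = new ToyStream();
        List<String> seen = new ArrayList<>();
        for (Map<String, Object> cached : cache) {
            if (clearFirst) {
                stream.clearAttributes(); // the fix: reset before restoring
            }
            stream.restore(cached);
            markProtected(stream);
            seen.add(stream.attrs.get("term") + ":" + stream.attrs.get("keyword"));
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(replay(false)); // [dontstems:true, foo:true]  (stale flag on "foo")
        System.out.println(replay(true));  // [dontstems:true, foo:false]
    }
}
```

Without the reset, the flag set for the first token is still there when the second cached token is restored, because the cached token has nothing to overwrite it with.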

      1. SOLR-2474.patch
        1 kB
        Uwe Schindler

        Activity

        Robert Muir added a comment -

        Bulk close for 3.2

        Uwe Schindler added a comment -

        Committed 3.x revision: 1095519
        Merged trunk revision: 1095521

        Thanks Robert for investigating and Yonik for reporting this hairy one.

        Uwe Schindler added a comment (edited) -

        Patch (it's against the Lucene 3.1 branch, but should apply to 3.x and trunk, too)

        Uwe Schindler added a comment -

        I moved this issue over to Solr, as it has nothing to do with Lucene.

        Uwe Schindler added a comment -

        I found the bug:
        The problem is in analysis.jsp (3.x version), line 227: there should be a clearAttributes() first.
        The reason: in early stages, cached tokens don't have the KeywordAttribute, so the following copyTo() does not overwrite all attributes.
        The same applies to AnalysisRequestHandlerBase. The deprecated AnalysisRequestHandler does not have this problem, as it does not debug all filters and caches no tokens.
        Patch is coming...

        Robert Muir added a comment -

        I created an issue for the analysis.jsp: SOLR-2473

        The problem is likely to cause a lot of confusion, and as Uwe said, we should check the similar AnalysisRequestHandler too.

        Uwe Schindler added a comment -

        I think the stupid Token caching in analysis.jsp may be broken. Does it also affect *AnalysisRequestHandler?

        Yonik Seeley added a comment -

        "clearing should be done by the Tokenizer or other token-producers"

        Whew... ok, I didn't realize that pre-dated LUCENE-2901

        So perhaps this is just an analysis.jsp bug, since a query of
        http://localhost:8983/solr/select?q="dontstems hellos"&debugQuery=true
        seems to work fine and produce "dontstems hello"

        Robert Muir added a comment -

        This is just a bug in analysis.jsp: compare the query debug output for "dontstems bees" to the analysis.jsp output for dontstems bees, and you will see what I mean.

        There is nothing wrong with the Lucene filter here!

        Robert Muir added a comment -

        I see the problem with the example config: simply enter "dontstems foo".

        But we need to figure out:

        1. Is it only a bug in analysis.jsp?
        2. If not, who isn't clearing attributes?

        Uwe Schindler added a comment -

        Yonik, clearing should be done by the Tokenizer or other token-producers (if a filter inserts Tokens, it also has to clear Attributes). If the Tokenizer does not clear all Attributes using clearAttributes(), it is broken. But not this one.
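
The contract described above can be sketched with a toy model. This is not Lucene code: SharedState, ToyTokenizer and markProtected() are hypothetical names, and the set-only behaviour of the filter is an assumption based on the issue description. The sketch only shows why a filter that never resets a flag depends on the token producer clearing the shared state before each token.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class TokenProducerContractDemo {

    /** Reused per-token state (hypothetical stand-in for an attribute source). */
    static final class SharedState {
        String term = "";
        boolean keyword = false;

        void clearAttributes() {
            term = "";
            keyword = false;
        }
    }

    /** Toy whitespace tokenizer; clearsFirst toggles whether it honours the contract. */
    static final class ToyTokenizer {
        private final SharedState state;
        private final Iterator<String> words;
        private final boolean clearsFirst;

        ToyTokenizer(SharedState state, String text, boolean clearsFirst) {
            this.state = state;
            this.words = Arrays.asList(text.split(" ")).iterator();
            this.clearsFirst = clearsFirst;
        }

        boolean incrementToken() {
            if (!words.hasNext()) {
                return false;
            }
            if (clearsFirst) {
                state.clearAttributes(); // producer resets state before each token
            }
            state.term = words.next();
            return true;
        }
    }

    /** Set-only filter: flags the protected word and never resets the flag itself. */
    static void markProtected(SharedState state) {
        if ("dontstems".equals(state.term)) {
            state.keyword = true;
        }
    }

    static List<String> analyze(String text, boolean tokenizerClears) {
        SharedState state = new SharedState();
        ToyTokenizer tokenizer = new ToyTokenizer(state, text, tokenizerClears);
        List<String> out = new ArrayList<>();
        while (tokenizer.incrementToken()) {
            markProtected(state);
            out.add(state.term + ":" + state.keyword);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(analyze("dontstems foo", true));  // [dontstems:true, foo:false]
        System.out.println(analyze("dontstems foo", false)); // [dontstems:true, foo:true]
    }
}
```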

        Can you post the config of your Tokenizers and Filters or which Analyzer is affected?

        Robert Muir added a comment -

        All tokenizers should be calling clearAttributes()? Where is the problem?


          People

          • Assignee:
            Uwe Schindler
            Reporter:
            Yonik Seeley