  Solr / SOLR-2474

Analysis.jsp and AnalysisRequestHandlerBase do not correctly clear attributes when caching tokens

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.1
    • Fix Version/s: 3.2, 4.0-ALPHA
    • Component/s: None
    • Labels:
      None

      Description

      When caching tokens, the helper TokenStreams in analysis.jsp and AnalysisRequestHandlerBase do not clear all attributes.
      The issue is tricky: in early stages of the analysis chain, the cached tokens do not contain all attributes, so copyTo() does not necessarily overwrite every attribute in "this" (the helper stream). Calling clearAttributes() before copyTo() ensures that attributes missing from the cached token are reset.
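      To make the failure mode concrete, here is a toy model in plain Java. It is not the Lucene API: `Attrs`, `with()`, `copyTo()`, `clearAttributes()` and `markKeywords()` are hypothetical stand-ins that only mimic the behaviour described above. The cached tokens carry just a term, the replaying stream also carries a keyword flag (because a later filter added it), and the marker stage only ever sets that flag.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model, NOT the Lucene API: every name here is a hypothetical stand-in.
public class CachedTokenReplay {

  // Stand-in for an AttributeSource: attribute name -> current value.
  static final class Attrs {
    final Map<String, Object> values = new LinkedHashMap<>();

    Attrs with(String name, Object value) {
      values.put(name, value);
      return this;
    }

    // Mimics copyTo() as described in the issue: only attributes present in
    // this (cached) source are written; others in the target keep old values.
    void copyTo(Attrs target) {
      target.values.putAll(values);
    }

    // Resets every attribute held by this instance to its default value.
    void clearAttributes() {
      values.replaceAll((name, old) -> defaultFor(name));
    }

    static Object defaultFor(String name) {
      return name.equals("keyword") ? Boolean.FALSE : "";
    }
  }

  // Replays cached tokens into a stream that also has a "keyword" attribute,
  // then applies a set-only keyword marker. Returns "term:keyword" per token.
  static List<String> markKeywords(List<String> terms, Set<String> protectedWords,
                                   boolean clearFirst) {
    List<Attrs> cached = new ArrayList<>();
    for (String t : terms) {
      cached.add(new Attrs().with("term", t)); // early-stage cache: no keyword attribute
    }
    Attrs stream = new Attrs().with("term", "").with("keyword", false);
    List<String> out = new ArrayList<>();
    for (Attrs token : cached) {
      if (clearFirst) {
        stream.clearAttributes(); // the fix: drop state left over from the previous token
      }
      token.copyTo(stream);
      String term = (String) stream.values.get("term");
      if (protectedWords.contains(term)) {
        stream.values.put("keyword", true); // marker only ever sets, never clears
      }
      out.add(term + ":" + stream.values.get("keyword"));
    }
    return out;
  }

  public static void main(String[] args) {
    List<String> terms = List.of("dontstems", "foo");
    Set<String> prot = Set.of("dontstems");
    System.out.println(markKeywords(terms, prot, false)); // [dontstems:true, foo:true]
    System.out.println(markKeywords(terms, prot, true));  // [dontstems:true, foo:false]
  }
}
```

      Without the clear, the keyword flag set for "dontstems" leaks into "foo", which is consistent with the "dontstems foo" symptom discussed in the comments; calling clearAttributes() before copyTo() removes the leak in this model.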

      Originally reported as: LUCENE-2901 broke protected words by only setting the keyword attribute and never clearing it (that change should have been accompanied by offsetting code to clear the attribute somewhere).

      The problem here was that this attribute was added later in the analysis chain, so cached tokens don't include it. Sorry, that was my fault when rewriting analysis.jsp together with Robert.

      1. SOLR-2474.patch (1 kB, Uwe Schindler)

        Activity

        Robert Muir (rcmuir) added a comment:

        Bulk close for 3.2

        Uwe Schindler (thetaphi) added a comment:

        Committed 3.x revision: 1095519
        Merged trunk revision: 1095521

        Thanks Robert for investigating and Yonik for reporting this hairy one.

        Uwe Schindler (thetaphi) added a comment (edited):

        Patch (it's on the Lucene 3.1 branch, but should apply to 3.x and trunk, too)

        Uwe Schindler (thetaphi) added a comment:

        I moved this issue over to Solr, as it has nothing to do with Lucene.

        Uwe Schindler (thetaphi) added a comment:

        I found the bug:
        The problem is in analysis.jsp (3.x version), line 227: there should be a clearAttributes() call first.
        The reason: in early stages, cached tokens don't have the KeywordAttribute, so the following copyTo() does not overwrite all attributes.
        The same applies to AnalysisRequestHandlerBase. The deprecated AnalysisRequestHandler does not have this problem, as it does not debug all filters and caches no tokens.
        Patch is coming...

        Robert Muir (rcmuir) added a comment:

        I created an issue for analysis.jsp: SOLR-2473

        The problem is likely to cause a lot of confusion, and as Uwe said, we should check the similar AnalysisRequestHandler too.

        Uwe Schindler (thetaphi) added a comment:

        I think the stupid Token caching in analysis.jsp may be broken. Does it also affect *AnalysisRequestHandler?

        Yonik Seeley (yseeley@gmail.com) added a comment:

        > clearing should be done by the Tokenizer or other token-producers

        Whew... ok, I didn't realize that pre-dated LUCENE-2901

        So perhaps this is just an analysis.jsp bug, since a query of
        http://localhost:8983/solr/select?q="dontstems hellos"&debugQuery=true
        seems to work fine and produce "dontstems hello"

        Robert Muir (rcmuir) added a comment:

        This is just a bug in analysis.jsp: compare the query debug output for "dontstems bees" to the analysis.jsp output for dontstems bees, and you will see what I mean.

        There is nothing wrong with the lucene filter here!

        Robert Muir (rcmuir) added a comment:

        I see the problem with the example config: simply enter "dontstems foo"

        But, we need to figure out:

        1. Is it only a bug in analysis.jsp?
        2. If not, which component isn't clearing attributes?
        Uwe Schindler (thetaphi) added a comment:

        Yonik, clearing should be done by the Tokenizer or other token producers (if a filter inserts tokens, it also has to clear attributes). If a Tokenizer does not clear all attributes using clearAttributes(), it is broken. But this one isn't.

        Can you post the config of your Tokenizers and Filters or which Analyzer is affected?
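        The contract described above (the token producer resets per-token state, so a filter is free to only set a flag) can be sketched with another toy model in plain Java. This is not Lucene code: `TokenState`, `analyze()` and the naive trailing-"s" stemmer are hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Toy model, NOT the Lucene API: the producer clears, the marker only sets.
public class ProducerClearsContract {

  // Hypothetical per-token state shared by all stages, like attributes in one AttributeSource.
  static final class TokenState {
    String term = "";
    boolean keyword = false;

    void clearAttributes() {
      term = "";
      keyword = false;
    }
  }

  // Tokenizer stage: splits on whitespace and clears state before each token.
  // Keyword marker stage: sets keyword=true for protected words, never clears it.
  // Stemmer stage: deliberately naive, strips one trailing "s" unless keyword is set.
  static List<String> analyze(String text, Set<String> protectedWords) {
    TokenState state = new TokenState();
    List<String> out = new ArrayList<>();
    for (String word : text.trim().split("\\s+")) {
      state.clearAttributes();          // the producer's responsibility
      state.term = word;
      if (protectedWords.contains(state.term)) {
        state.keyword = true;           // set-only marker
      }
      if (!state.keyword && state.term.endsWith("s")) {
        state.term = state.term.substring(0, state.term.length() - 1);
      }
      out.add(state.term);
    }
    return out;
  }

  public static void main(String[] args) {
    System.out.println(analyze("dontstems hellos", Set.of("dontstems"))); // [dontstems, hello]
  }
}
```

        Because the tokenizer stage clears state before every token, the set-only marker cannot leak into "hellos" here, giving the same "dontstems hello" result Yonik saw from the query debug output. The leak only appears when a stage (like the cached-token replay in analysis.jsp) feeds tokens without clearing first.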

        Robert Muir (rcmuir) added a comment:

        All tokenizers should be calling clearAttributes(); where is the problem?


          People

          • Assignee:
            thetaphi Uwe Schindler
            Reporter:
            yseeley@gmail.com Yonik Seeley
          • Votes:
            0
            Watchers:
            0
