Lucene - Core: LUCENE-3375

processing a synonym in a token stream will remove the following token from the stream

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.4, 4.0-ALPHA
    • Fix Version/s: 3.4, 4.0-ALPHA
    • Component/s: modules/analysis
    • Labels: None

      Description

      If you run a phrase search on a field whose fieldtype includes the synonym filter, and the phrase contains a term that has a synonym, the term following that synonym vanishes after synonym expansion.

      For example, http://host:port/solr/corename/select/?q=desc:%22xyzzy%20%20bbb%20pot%20of%20gold%22&version=2.2&start=0&rows=10&indent=on&debugQuery=true (bbb is in the default synonyms file, and desc is a "text" fieldtype)

      outputs the following (note that "pot" is missing from the parsed query):
      ....
      <str name="rawquerystring">desc:"xyzzy bbb pot of gold"</str>
      <str name="querystring">desc:"xyzzy bbb pot of gold"</str>
      <str name="parsedquery">PhraseQuery(desc:"xyzzy bbbb 1 bbbb 2 of gold")</str>
      <str name="parsedquery_toString">desc:"xyzzy bbbb 1 bbbb 2 of gold"</str>
      ....

      You can also see this behavior on the admin console's analysis.jsp page.

      Solr 3.3 behaves correctly.

      Attachments

      1. LUCENE-3375.patch (16 kB, Robert Muir)
      2. LUCENE-3375.patch (8 kB, Michael McCandless)
      3. LUCENE-3375_test.patch (2 kB, Robert Muir)
      4. SOLR-2709_test.patch (2 kB, Robert Muir)

        Activity

        Robert Muir added a comment -

        Here's a test case for the bug. Thanks for reporting this, Simon!

        Robert Muir added a comment -

        Updated the test to extend the case so that 'bbb' maps to 3 words. In this case it nukes 'of' as well, so now we can see the general pattern of the bug: each extra output word swallows one more of the following tokens.

        Michael McCandless added a comment -

        Nice catch – thanks Simon!

        The attached patch should fix the issue. The problem was that we were allowing preserveOrig to apply across all output tokens, not just the matched input tokens.
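        A minimal, self-contained sketch of the failure mode, assuming a simplified model rather than Lucene's real SynonymFilter: a rule matches exactly one input token, its output words are placed at consecutive positions starting at the match, and tokens at the same position are stacked. The class `SynonymExpansionSketch`, the `Rule` record, its `preserveOrig` field and the `buggy` flag are all hypothetical names for illustration; this is not the SynonymMap API. It also ignores the word-delimiter step that turns `bbbb1` into `bbbb 1` in the Solr output above. Requires Java 16+ for `record`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class SynonymExpansionSketch {

    // Hypothetical one-token rule "input => output...". Not Lucene's SynonymMap API.
    // preserveOrig = true would keep the matched input token alongside its synonyms.
    record Rule(List<String> output, boolean preserveOrig) {}

    // Returns one list per position; tokens in the same list are stacked
    // (position increment 0). buggy = true applies preserveOrig to every
    // output position instead of only the matched input position.
    static List<List<String>> expand(List<String> input, Map<String, Rule> rules, boolean buggy) {
        List<List<String>> slots = new ArrayList<>();
        boolean[] dropOriginal = new boolean[input.size()];
        for (int i = 0; i < input.size(); i++) {
            while (slots.size() <= i) slots.add(new ArrayList<>());
            String tok = input.get(i);
            Rule rule = rules.get(tok);
            // A matched token is kept only if its rule preserves originals;
            // any other token is kept unless the buggy path marked its position.
            boolean keep = !dropOriginal[i] && (rule == null || rule.preserveOrig());
            if (keep) slots.get(i).add(tok);
            if (rule == null) continue;
            for (int k = 0; k < rule.output().size(); k++) {
                int pos = i + k;
                while (slots.size() <= pos) slots.add(new ArrayList<>());
                slots.get(pos).add(rule.output().get(k));
                // Positions after i hold unmatched input tokens. The fixed path keeps
                // them; the buggy path lets preserveOrig = false delete them too.
                if (buggy && k > 0 && pos < input.size() && !rule.preserveOrig()) {
                    dropOriginal[pos] = true;
                }
            }
        }
        return slots;
    }

    public static void main(String[] args) {
        List<String> query = List.of("xyzzy", "bbb", "pot", "of", "gold");
        Map<String, Rule> rules = Map.of("bbb", new Rule(List.of("bbbb1", "bbbb2"), false));
        System.out.println("buggy: " + expand(query, rules, true));
        System.out.println("fixed: " + expand(query, rules, false));
    }
}
```

        In this model the buggy variant yields [[xyzzy], [bbbb1], [bbbb2], [of], [gold]], losing "pot", while the fixed variant stacks "pot" with "bbbb2" at the third position. With a hypothetical three-word rule for bbb, the buggy variant drops both "pot" and "of", which is the pattern Robert's updated test shows.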

        Robert Muir added a comment -

        Mike's patch, but I ported the tests from the old synfilter to boost our test coverage a little.

        We could still clean up and improve these tests, but it makes me feel better.

        Robert Muir added a comment -

        Thanks again Simon!


          People

          • Assignee: Unassigned
          • Reporter: Simon Rosenthal
          • Votes: 0
          • Watchers: 2
