Solr / SOLR-11022

SynonymGraphFilterFactory proximity search error


    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 6.6
    • Fix Version/s: None
    • Component/s: query parsers
    • Labels:

      Description

      There seems to be an issue when doing proximity searches that include terms that have multi-word synonyms.

      Example: suppose synonyms.txt contains

          grand mother, grandmother
          grandfather, granddad

      and a document has an indexed field containing "My mother and my grandmother went...".

      The proximity search "mother grandmother"~8 does not return that document, while "father grandfather"~8 does return the analogous document.
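A minimal Python sketch of the synonym expansion, for illustration only. It assumes the two synonyms.txt lines above form equivalence groups (expand="true"), ignores stemming and stop words, and uses hypothetical helper names (SYNONYM_GROUPS, alternatives, query_paths); it is not how Lucene actually builds the token graph. It lists every token sequence each quoted query can stand for. The multi-word entry "grand mother" yields paths of two different lengths, while single-word synonyms keep every path the same length, which is plausibly why the two queries end up as different query types in the debug output of this report.

```python
# Hypothetical sketch: expand each query term into its synonym alternatives,
# similar in spirit to the parallel paths of a synonym graph. Not Lucene code.
from itertools import product

# Assumed equivalence groups, taken from the synonyms.txt example above.
SYNONYM_GROUPS = [
    ("grand mother", "grandmother"),
    ("grandfather", "granddad"),
]


def alternatives(term: str) -> list[tuple[str, ...]]:
    """All token sequences a single query term may stand for (itself included)."""
    for group in SYNONYM_GROUPS:
        if term in group:
            return [tuple(entry.split()) for entry in group]
    return [(term,)]


def query_paths(query: str) -> list[tuple[str, ...]]:
    """Every token sequence the expanded phrase query could match."""
    per_term = [alternatives(t) for t in query.split()]
    # Concatenate one alternative per query term into a flat token path.
    return [sum(choice, ()) for choice in product(*per_term)]


if __name__ == "__main__":
    for q in ("mother grandmother", "father grandfather"):
        paths = query_paths(q)
        print(q, "->", paths, "| path lengths:", sorted({len(p) for p in paths}))
```

For "mother grandmother" this prints paths of length 2 and 3; for "father grandfather" every path has length 2.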

      I am not a Solr developer, so pardon me if I am wrong, but I ran the queries with debug=query and saw that when a proximity search involves a multi-word synonym, it is parsed into a SpanNearQuery:
      "parsedquery":"SpanNearQuery(spanNear([laudo:mother, spanOr([laudo:grand mother, laudo:grandmother])],0, true))"

      whereas a proximity search involving only single-word synonyms is parsed into a MultiPhraseQuery:
      "MultiPhraseQuery(laudo:\"father (grandfather granddad)\"~10)"

      Note that the SpanNearQuery is built with a slop of 0, regardless of the value given after the tilde. As a result, the query only matches when the terms appear adjacent to each other, as an exact phrase.
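To illustrate why the slop matters, here is a hedged Python sketch that approximates in-order span-near matching: clauses must occur in order, without overlapping, and the total gap between consecutive clause matches must not exceed the slop. It is a simplification, not Lucene's SpanNearQuery implementation; the function names are hypothetical, and analysis (stemming, stop words, the laudo field) is ignored. The clauses mirror the parsed query from the debug output above.

```python
# Hypothetical sketch approximating in-order "span near" semantics
# (like spanNear([...], slop, true)); a simplification, not Lucene code.
from itertools import product


def occurrences(tokens, sequence):
    """(start, end) spans where `sequence` occurs in `tokens`; `end` is exclusive."""
    n = len(sequence)
    return [(i, i + n) for i in range(len(tokens) - n + 1)
            if tuple(tokens[i:i + n]) == tuple(sequence)]


def span_or(tokens, alternatives):
    """Like spanOr: spans of any of the alternative token sequences."""
    return [span for alt in alternatives for span in occurrences(tokens, alt)]


def span_near_in_order(tokens, clauses, slop):
    """True if the clauses occur in order, non-overlapping, and the sum of the
    gaps between consecutive clause matches is at most `slop`."""
    candidates = [span_or(tokens, clause) for clause in clauses]
    for combo in product(*candidates):
        gaps = [nxt[0] - prev[1] for prev, nxt in zip(combo, combo[1:])]
        if all(g >= 0 for g in gaps) and sum(gaps) <= slop:
            return True
    return False


tokens = "my mother and my grandmother went".split()
# Mirrors spanNear([mother, spanOr([grand mother, grandmother])], slop, true)
clauses = [[("mother",)], [("grand", "mother"), ("grandmother",)]]

print(span_near_in_order(tokens, clauses, slop=0))  # slop in the parsed query: False
print(span_near_in_order(tokens, clauses, slop=8))  # slop requested with ~8: True
```

In the example sentence "mother" and "grandmother" are two positions apart, so a slop of 0 rejects the document while the requested slop of 8 would accept it.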

      Here is my field type, for reference:
      <fieldType name="text_pt_synonyms_ascii_minimal_lightStem" class="solr.TextField" positionIncrementGap="100">
        <analyzer type="index">
          <tokenizer class="solr.StandardTokenizerFactory"/>
          <filter class="solr.LowerCaseFilterFactory"/>
          <filter class="solr.StopFilterFactory" format="snowball" words="lang/stopwords_pt.txt" ignoreCase="true"/>
          <filter class="solr.PortugueseLightStemFilterFactory"/>
        </analyzer>
        <analyzer type="query">
          <tokenizer class="solr.StandardTokenizerFactory"/>
          <filter class="solr.LowerCaseFilterFactory"/>
          <filter class="solr.StopFilterFactory" format="snowball" words="lang/stopwords_pt.txt" ignoreCase="true"/>
          <filter class="solr.ASCIIFoldingFilterFactory" preserveOriginal="true"/>
          <filter class="solr.SynonymGraphFilterFactory" expand="true" ignoreCase="true" synonyms="synonyms_radex.txt"/>
          <filter class="solr.PortugueseLightStemFilterFactory"/>
        </analyzer>
      </fieldType>

            People

            • Assignee: Unassigned
            • Reporter: Diogo Guilherme Leão Edelmuth (diogoedl)
            • Votes: 0
            • Watchers: 5
