Solr / SOLR-11968

Multi-words query time synonyms


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 6.6.2, 8.0
    • Fix Version/s: None
    • Labels: None
    • Environment: CentOS 7.x

      Description

      I am trying multi-word query-time synonyms with Solr 6.6.2 and the SynonymGraphFilterFactory filter, as explained in this article:
      https://lucidworks.com/2017/04/18/multi-word-synonyms-solr-adds-query-time-support/
       
      My field type is:

      <fieldType name="textSyn" class="solr.TextField" positionIncrementGap="100">
           <analyzer type="index">
             <tokenizer class="solr.StandardTokenizerFactory"/>
             <filter class="solr.ElisionFilterFactory" ignoreCase="true" 
                   articles="lang/contractions_fr.txt"/>
             <filter class="solr.LowerCaseFilterFactory"/>
             <filter class="solr.ASCIIFoldingFilterFactory"/>
             <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
             <filter class="solr.FrenchMinimalStemFilterFactory"/>
           </analyzer>
           <analyzer type="query">
             <tokenizer class="solr.StandardTokenizerFactory"/>
             <filter class="solr.ElisionFilterFactory" ignoreCase="true" 
                   articles="lang/contractions_fr.txt"/>
             <filter class="solr.LowerCaseFilterFactory"/>
             <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt"
                   ignoreCase="true" expand="true"/>
             <filter class="solr.ASCIIFoldingFilterFactory"/>
             <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
             <filter class="solr.FrenchMinimalStemFilterFactory"/>
           </analyzer>
         </fieldType>

       
      synonyms.txt contains the line:

      om, olympique de marseille

       
      stopwords.txt contains the word:

      de

       
      The order of the words in my query affects the query that edismax generates:

      q={!edismax qf='name_text_gp' v=$qq}
       &sow=false
       &qq=...

      With "qq=om maillot" or "qq=olympique de marseille maillot", I can see the synonym expansion. It works as expected:

      "parsedquery_toString":"+(((+name_text_gp:olympiqu +name_text_gp:marseil +name_text_gp:maillot) name_text_gp:om))",
       "parsedquery_toString":"+((name_text_gp:om (+name_text_gp:olympiqu +name_text_gp:marseil +name_text_gp:maillot)))",

      With "qq=maillot om" or "qq=maillot olympique de marseille", both inputs produce the same generated query:

      "parsedquery_toString":"+((name_text_gp:maillot) (name_text_gp:om))",
       "parsedquery_toString":"+((name_text_gp:maillot) (name_text_gp:om))",

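      I don't understand these generated queries. For the first one ("maillot om"), it looks like the synonym expansion is ignored. For the second one ("maillot olympique de marseille"), the expansion is not ignored, but only the synonym term "om" is kept and the original terms are dropped.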
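To compare the four parsed queries side by side, the request URLs can be generated with a small script. This is only a sketch: the host, port, and collection name (`mycollection`) are placeholders that do not come from this report, and `debugQuery`, `rows`, and `wt` are added here just to obtain the debug section that contains `parsedquery_toString`. The `q`, `sow`, and `qq` parameters are the ones quoted above.

```python
from urllib.parse import urlencode

# Hypothetical base URL and collection name; adjust to the actual installation.
SOLR_SELECT = "http://localhost:8983/solr/mycollection/select"


def build_debug_url(qq, base=SOLR_SELECT):
    """Build a select URL for the edismax request from this report, with debug output on."""
    params = {
        "q": "{!edismax qf='name_text_gp' v=$qq}",  # same local-params query as in the report
        "sow": "false",                              # do not split on whitespace
        "qq": qq,                                    # the user query under test
        "debugQuery": "true",                        # ask Solr for the parsed query in the response
        "rows": "0",                                 # documents are not needed for this comparison
        "wt": "json",
    }
    return base + "?" + urlencode(params)


# The four inputs discussed in the description.
QUERIES = [
    "om maillot",
    "olympique de marseille maillot",
    "maillot om",
    "maillot olympique de marseille",
]

for qq in QUERIES:
    print(build_debug_url(qq))
```

Opening each printed URL (or fetching it with any HTTP client) and reading `parsedquery_toString` in the debug section should show whether the word order changes the expansion on a given setup.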
       
      When I test the analysis for the field type, the synonyms are correctly expanded for all of these expressions:

      om maillot
      maillot om
      olympique de marseille maillot
      maillot olympique de marseille

      The resulting outputs always include the following terms (obviously not always in the same order):

      olympiqu om marseil maillot 
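The same analysis check can be scripted instead of run through the admin UI. The sketch below assumes the field analysis request handler is reachable at `/analysis/field` (it may be disabled or mapped differently on a given install). The host and collection name are placeholders. `textSyn` is the field type defined above.

```python
from urllib.parse import urlencode

# Hypothetical core/collection URL; adjust to the actual installation.
SOLR_CORE = "http://localhost:8983/solr/mycollection"


def build_analysis_url(text, field_type="textSyn", base=SOLR_CORE):
    """Build a URL that runs `text` through the query analyzer of `field_type`."""
    params = {
        "analysis.fieldtype": field_type,  # analyze by field type rather than by field name
        "analysis.query": text,            # text to pass through the query-time analyzer chain
        "wt": "json",
    }
    return base + "/analysis/field?" + urlencode(params)


for text in ("om maillot", "maillot om",
             "olympique de marseille maillot", "maillot olympique de marseille"):
    print(build_analysis_url(text))
```

Comparing the tokens returned after the last filter for each input is one way to confirm that the analyzer chain itself expands the synonym regardless of word order.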

       
      So I suspect an issue with the edismax query parser.


              People

              • Assignee: Steve Rowe (sarowe)
              • Reporter: Dominique Béjean (dbejean)
              • Votes: 0
              • Watchers: 3
