Uploaded image for project: 'Lucene - Core'
  1. Lucene - Core
  2. LUCENE-7698

CommonGramsQueryFilter in the query analyzer chain breaks phrase queries

    Details

    • Lucene Fields:
      New

      Description

      (Please pardon me if the project or component are wrong!)

      CommonGramsQueryFilter breaks phrase queries. The behavior also seems to change with addition or removal of adjacent terms.

      Steps to reproduce:

      1.) Download and extract Solr (in my test case version 6.4.1) somewhere.
      2.) Modify server/solr/configsets/sample_techproducts_configs/conf/managed-schema and modify text_general fieldType by adding CommonGrams(Query)Filter before stopWordFilter:

      <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.CommonGramsFilterFactory" ignoreCase="true" words="stopwords.txt" />
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
      <!-- in this example, we will only use synonyms at query time
      <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
      -->
      <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
      <analyzer type="query">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.CommonGramsQueryFilterFactory" ignoreCase="true" words="stopwords.txt"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
      <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
      </fieldType>

      3.) Add "with" to server/solr/configsets/sample_techproducts_configs/conf/stopwords.txt and make sure the file has correct line endings (extracted from Solr zip it seems to contain DOS/Windows lien endings which may break things).

      4.) Run the techproducts example with "bin/solr -e techproducts"

      5.) Browse to <http://localhost:8983/solr/techproducts/select?q=%22iPod%20with%20Video%22&debugQuery=true>

      6.) Observe that parsedquery in the debug output is empty

      7.) Browse to <http://localhost:8983/solr/techproducts/select?q=%22Apple%2060%20GB%20iPod%20with%20Video%20Playback%20Black%22&debugQuery=true>

      8.) Observe that parsedquery contains ipod_with as expected but not with_video.

        Attachments

        1. LUCENE-7698.patch
          6 kB
          Michael McCandless

          Issue Links

            Activity

              People

              • Assignee:
                Unassigned
                Reporter:
                emaijala Ere Maijala
              • Votes:
                0 Vote for this issue
                Watchers:
                4 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: