  1. Solr
  2. SOLR-3127

Dismax to honor the KeywordTokenizerFactory when querying with multi word strings

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Not A Problem
    • Affects Version/s: 3.5
    • Fix Version/s: None
    • Component/s: query parsers

    Description

      When using the KeywordTokenizerFactory with a multi-word search string, the dismax query that is created is not very useful. Although the query analyzer doesn't tokenize the search input, each word of the input is still included in the search as a separate clause.

      e.g. if searching for 'chicken stock' the dismax query created would be:
      +(DisjunctionMaxQuery((ingredient_synonyms:chicken^0.6)~0.01) DisjunctionMaxQuery((ingredient_synonyms:stock^0.6)~0.01)) DisjunctionMaxQuery((ingredient_synonyms:chicken stock^0.6)~0.01)

      Note that although the query analyzer does not tokenize the term 'chicken stock' into 'chicken' and 'stock', the individual words are still included in the query as required clauses.
      I think the query created should be just:
      DisjunctionMaxQuery((ingredient_synonyms:chicken stock)~0.01)
      (or at least have the individual terms as should-match rather than must-match, so that the behavior could be configured with mm).
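
One likely explanation, consistent with the query shown above, is that the query parser splits the input on whitespace before the field's analyzer ever sees it, so the KeywordTokenizerFactory receives 'chicken' and 'stock' as two separate inputs. The Python sketch below is only a toy model of that order of operations, not Solr's code: the function names are hypothetical, and `shlex.split` merely stands in for a parser that keeps double-quoted text together.

```python
import shlex


def keyword_tokenize(text):
    """Mimic a keyword tokenizer: the whole input becomes a single token."""
    return [text]


def split_then_analyze(user_query, analyze):
    """Toy model (an assumption, not Solr's implementation): split the raw
    query on whitespace first, keeping double-quoted text together, then
    run the field analyzer on each chunk separately."""
    terms = []
    for chunk in shlex.split(user_query):
        terms.extend(analyze(chunk))
    return terms


unquoted = split_then_analyze('chicken stock', keyword_tokenize)
quoted = split_then_analyze('"chicken stock"', keyword_tokenize)

print(unquoted)  # ['chicken', 'stock']
print(quoted)    # ['chicken stock']
```

Under this model the analyzer never sees the unquoted string as a whole, which matches the two single-word clauses in the reported query. It also suggests that quoting the input may keep it together as one term, although that has not been verified against Solr 3.5 here.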

      Example field type:
      <fieldType name="keyword_test" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="false">
        <analyzer type="index">
          <tokenizer class="solr.KeywordTokenizerFactory" />
        </analyzer>
        <analyzer type="query">
          <tokenizer class="solr.KeywordTokenizerFactory" />
        </analyzer>
      </fieldType>

          People

            Assignee: Unassigned
            Reporter: Zac Smith (ztsmith)