Lucene - Core / LUCENE-7533

Classic query parser: autoGeneratePhraseQueries=true doesn't work when splitOnWhitespace=false

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 6.2, 6.2.1, 6.3
    • Fix Version/s: 6.4, 7.0
    • Component/s: None
    • Labels: None
    • Lucene Fields: New

    Description

      LUCENE-2605 introduced the classic query parser option to not split on whitespace prior to performing analysis.

      From the javadocs for QueryParser.setAutoGeneratePhraseQueries():

      phrase queries will be automatically generated when the analyzer returns more than one term from whitespace delimited text.

      When splitOnWhitespace=false, the output from analysis can now come from multiple whitespace-separated tokens, which breaks the code's assumptions when autoGeneratePhraseQueries=true. For this combination of options, it's not appropriate to auto-quote multiple non-overlapping tokens produced by analysis. For example, simple whitespace tokenization of the query "some words" produces the token sequence ("some", "words"); even when autoGeneratePhraseQueries=true, no phrase query should be created here.
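
      The mismatch described above can be illustrated without Lucene on the classpath. The following is a simplified model, not the parser's actual code: the class PhraseSketch and the methods analyze, chunks, and wouldAutoQuote are hypothetical names, and a plain whitespace split stands in for an analyzer. It applies the javadoc's "more than one term" rule naively to whatever text is handed to analysis in each mode.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class PhraseSketch {
    // Hypothetical stand-in for an analyzer: simple whitespace tokenization.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Text chunks handed to analysis: one per whitespace-delimited word when
    // splitting on whitespace, otherwise the whole query as a single chunk.
    static List<String> chunks(String query, boolean splitOnWhitespace) {
        return splitOnWhitespace ? analyze(query) : Collections.singletonList(query);
    }

    // Naive reading of the javadoc rule: more than one term from a chunk
    // means a phrase query is generated. This is an assumption made for
    // illustration, not the library's implementation.
    static boolean wouldAutoQuote(String chunk, boolean autoGeneratePhraseQueries) {
        return autoGeneratePhraseQueries && analyze(chunk).size() > 1;
    }

    public static void main(String[] args) {
        String query = "some words";
        for (boolean split : new boolean[] {true, false}) {
            for (String chunk : chunks(query, split)) {
                System.out.println("splitOnWhitespace=" + split
                        + " chunk=[" + chunk + "] autoQuote="
                        + wouldAutoQuote(chunk, true));
            }
        }
    }
}
```

      In this model, splitting on whitespace yields two single-term chunks, so nothing is auto-quoted. Without splitting, the whole query is one chunk that yields two terms, and the naive rule quotes it, which is the behavior this issue reports as incorrect.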

      Attachments

        1. LUCENE-7533.patch
          21 kB
          Steven Rowe
        2. LUCENE-7533-disallow-option-combo.patch
          10 kB
          Steven Rowe

            People

              Assignee: sarowe Steven Rowe
              Reporter: sarowe Steven Rowe
