Lucene - Core / LUCENE-7533

Classic query parser: autoGeneratePhraseQueries=true doesn't work when splitOnWhitespace=false

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 6.2, 6.2.1, 6.3
    • Fix Version/s: 6.4, 7.0
    • Component/s: None
    • Labels: None
    • Lucene Fields: New

      Description

      LUCENE-2605 introduced the classic query parser option to not split on whitespace prior to performing analysis.

      From the javadocs for QueryParser.setAutoGeneratePhraseQueries():

      phrase queries will be automatically generated when the analyzer returns more than one term from whitespace delimited text.

      When splitOnWhitespace=false, a single analysis call can receive text containing several whitespace-separated tokens, so more than one term in the analysis output no longer implies that the terms came from one whitespace-delimited chunk. This breaks the assumption behind autoGeneratePhraseQueries=true: for this combination of options, it is not appropriate to auto-quote multiple non-overlapping tokens produced by analysis. For example, simple whitespace tokenization over the query "some words" produces the token sequence ("some", "words"); even when autoGeneratePhraseQueries=true, the parser should not create a phrase query here.
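
      The mismatch can be illustrated with a toy model. The sketch below is not Lucene code: the class AutoPhraseSketch and its methods analyze and parse are hypothetical names, and analyze merely stands in for a whitespace-tokenizing analyzer. It applies the javadoc rule literally ("more than one term from one analysis call means a phrase") and shows how the result changes depending on whether the query text is split on whitespace before analysis.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Toy model only: none of these names are Lucene APIs.
class AutoPhraseSketch {

    // Stand-in for a whitespace-tokenizing analyzer:
    // returns one term per whitespace-separated token.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                terms.add(token);
            }
        }
        return terms;
    }

    // Applies the literal rule "more than one term from a single
    // analysis call => phrase" to each chunk handed to the analyzer.
    // A phrase clause is rendered in double quotes.
    static List<String> parse(String queryText, boolean splitOnWhitespace) {
        List<String> chunks = splitOnWhitespace
                ? Arrays.asList(queryText.trim().split("\\s+"))
                : Collections.singletonList(queryText);
        List<String> clauses = new ArrayList<>();
        for (String chunk : chunks) {
            List<String> terms = analyze(chunk);
            if (terms.size() > 1) {
                clauses.add("\"" + String.join(" ", terms) + "\"");
            } else {
                clauses.addAll(terms);
            }
        }
        return clauses;
    }

    public static void main(String[] args) {
        // Each chunk yields one term, so no phrase is generated.
        System.out.println(parse("some words", true));   // prints [some, words]
        // The whole text is analyzed at once and yields two terms,
        // so the literal rule wrongly quotes them as a phrase.
        System.out.println(parse("some words", false));  // prints ["some words"]
    }
}
```

      In this model the second call is the problematic one: the two terms come from separate whitespace-delimited tokens, yet the rule treats them as if they were produced from a single one.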

        Attachments

        1. LUCENE-7533.patch
          21 kB
          Steve Rowe
        2. LUCENE-7533-disallow-option-combo.patch
          10 kB
          Steve Rowe

              People

              • Assignee: Steve Rowe (steve_rowe)
              • Reporter: Steve Rowe (steve_rowe)
              • Votes: 0
              • Watchers: 3