  1. Lucene - Core
  2. LUCENE-7315

Flexible "standard" query parser parses on whitespace

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: modules/queryparser
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      Copied from LUCENE-2605:

      The query parser splits its input on whitespace and sends each whitespace-separated term to its own independent token stream.
      This breaks the following at query time, because they cannot see across whitespace boundaries:

      n-gram analysis
      shingles
      synonyms (especially multi-word for whitespace-separated languages)
      languages where a 'word' can contain whitespace (e.g. Vietnamese)
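
      The effect on shingles can be illustrated with a toy model. The sketch below is not Lucene code: `shingles`, `analyze_whole`, and `analyze_per_term` are hypothetical helpers that stand in for a shingle filter applied after a whitespace tokenizer. It only shows why analyzing each term in isolation can never produce a multi-word token.

```python
def shingles(tokens, size=2):
    """Toy shingle filter: return the tokens plus every run of `size` adjacent tokens."""
    out = list(tokens)
    for i in range(len(tokens) - size + 1):
        out.append(" ".join(tokens[i:i + size]))
    return out


def analyze_whole(text):
    """Index-time behaviour: the analysis chain sees the full text at once."""
    return shingles(text.split())


def analyze_per_term(text):
    """Query-time behaviour described in this issue: each whitespace-separated
    term is analyzed in its own, independent token stream."""
    out = []
    for term in text.split():
        # A single-token stream has no neighbours, so no shingle can form.
        out.extend(shingles([term]))
    return out


if __name__ == "__main__":
    print(analyze_whole("new york city"))
    print(analyze_per_term("new york city"))
```

      In this model the whole-text path emits "new york" and "york city" in addition to the single words, while the per-term path emits only the single words, so the query can never match the shingles that were indexed.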

      It's also rather unexpected: users assume their charfilters/tokenizers/tokenfilters will behave the same way at index time and query time, but in many cases they can't. Preferably, the query parser would instead split only around real 'operators'.
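
      A minimal sketch of what splitting only around operators could look like, again as a toy model rather than the parser's actual grammar. The operator set (`AND`, `OR`, `NOT`) and the function name `split_on_operators` are assumptions made for illustration; real query syntax also has quotes, field prefixes, grouping, and more, none of which is handled here.

```python
import re

# Hypothetical operator set, for illustration only. An operator is recognised
# only when it has whitespace on both sides.
OPERATORS = re.compile(r"\s+(AND|OR|NOT)\s+")


def split_on_operators(query):
    """Split a query at boolean operators only, so that the words between two
    operators stay together as one chunk of text for the analyzer."""
    parts = OPERATORS.split(query.strip())
    # re.split with one capturing group alternates: text, operator, text, ...
    return [("OP" if i % 2 else "TEXT", part) for i, part in enumerate(parts)]


if __name__ == "__main__":
    print(split_on_operators("new york city AND wi fi"))
```

      Each `TEXT` chunk would then go through the analysis chain as a whole, which lets shingle, n-gram, and multi-word synonym filters see across the whitespace inside it.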

              People

              • Assignee:
                Steve Rowe (steve_rowe)
              • Reporter:
                Steve Rowe (steve_rowe)
              • Votes:
                0
              • Watchers:
                2
