  1. Lucene - Core
  2. LUCENE-7315

Flexible "standard" query parser parses on whitespace

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: modules/queryparser
    • Labels: None
    • Lucene Fields: New

    Description

      Copied from LUCENE-2605:

      The query parser splits its input on whitespace and sends each whitespace-separated term to its own independent token stream.
      This breaks the following at query time, because they cannot see across whitespace boundaries:

      n-gram analysis
      shingles
      synonyms (especially multi-word for whitespace-separated languages)
      languages where a 'word' can contain whitespace (e.g. Vietnamese)

      It's also rather unexpected: users assume their charfilters, tokenizers, and tokenfilters will do the same thing at index time and query time, but in many cases they can't. Preferably, the query parser would instead split only around real 'operators'.
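
      The effect on shingles can be sketched in plain Java with no Lucene dependency. Everything here is a hypothetical stand-in rather than a Lucene API: `WhitespaceSplitDemo`, `analyze` (a toy analysis chain that emits unigrams plus two-word shingles), and `analyzePerChunk` (which imitates splitting on whitespace before analysis, as the issue describes).

      ```java
      import java.util.ArrayList;
      import java.util.List;

      public class WhitespaceSplitDemo {

          // Toy stand-in for an analysis chain with a shingle filter (assumption,
          // not Lucene code): emits each word plus each adjacent two-word shingle.
          public static List<String> analyze(String text) {
              List<String> out = new ArrayList<>();
              String trimmed = text.trim();
              if (trimmed.isEmpty()) {
                  return out;
              }
              String[] words = trimmed.split("\\s+");
              for (int i = 0; i < words.length; i++) {
                  out.add(words[i]);
                  if (i + 1 < words.length) {
                      out.add(words[i] + " " + words[i + 1]);
                  }
              }
              return out;
          }

          // Imitates the behavior described in the issue: split on whitespace
          // first, then run the analysis chain on each piece in isolation.
          public static List<String> analyzePerChunk(String query) {
              List<String> out = new ArrayList<>();
              for (String chunk : query.trim().split("\\s+")) {
                  out.addAll(analyze(chunk));
              }
              return out;
          }

          public static void main(String[] args) {
              String query = "wi fi router";
              // Whole-string analysis, as at index time: shingles are produced.
              System.out.println(analyze(query));
              // Per-chunk analysis, as at query time: no shingle can ever form.
              System.out.println(analyzePerChunk(query));
          }
      }
      ```

      In this sketch the whole-string call yields the shingles "wi fi" and "fi router", while the per-chunk call yields only the three single words, so a query could never match an indexed shingle.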

      Attachments

        1. LUCENE-7315.patch
          104 kB
          Steven Rowe

            People

              Assignee: Steven Rowe
              Reporter: Steven Rowe
              Votes: 2
              Watchers: 4