[LUCENE-7315] Flexible "standard" query parser parses on whitespace - ASF JIRA

Details

Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: modules/queryparser
Labels: None

Lucene Fields: New

Description

Copied from LUCENE-2605:

The query parser splits its input on whitespace and sends each whitespace-separated term to its own independent token stream.
This breaks the following at query time, because the analysis components can't see across whitespace boundaries:

- n-gram analysis
- shingles
- synonyms (especially multi-word synonyms in whitespace-separated languages)
- languages where a 'word' can contain whitespace (e.g. Vietnamese)

It's also rather unexpected: users assume their charfilters/tokenizers/tokenfilters will do the same thing at index time and query time, but in many cases they can't. Preferably, the query parser would instead split only around real 'operators'.
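
To illustrate the mismatch described above, here is a minimal sketch in plain Java. It does not use the Lucene API: `shingles` is a hypothetical toy stand-in for a shingle filter, `analyzePerTerm` is an assumed approximation of the "split on whitespace first, then analyze each term separately" behavior, and `analyzeWhole` approximates analyzing the whole string at once, as happens at index time.

```java
import java.util.ArrayList;
import java.util.List;

public class WhitespaceSplitDemo {

    // Toy stand-in for a shingle filter (not Lucene's ShingleFilter):
    // emits every token plus every pair of adjacent tokens.
    static List<String> shingles(String text) {
        String[] tokens = text.trim().split("\\s+");
        List<String> out = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            out.add(tokens[i]);
            if (i + 1 < tokens.length) {
                out.add(tokens[i] + " " + tokens[i + 1]);
            }
        }
        return out;
    }

    // Approximates the reported behavior: the parser splits on whitespace,
    // then hands each term to its own independent analysis chain.
    static List<String> analyzePerTerm(String query) {
        List<String> out = new ArrayList<>();
        for (String term : query.trim().split("\\s+")) {
            out.addAll(shingles(term)); // each chain sees only one term
        }
        return out;
    }

    // Approximates index-time analysis: the chain sees the whole string.
    static List<String> analyzeWhole(String text) {
        return shingles(text);
    }

    public static void main(String[] args) {
        String query = "new york city";
        System.out.println("per-term: " + analyzePerTerm(query));
        System.out.println("whole:    " + analyzeWhole(query));
    }
}
```

In this sketch the per-term path never produces the two-word shingles, because no single analysis chain ever sees two adjacent terms. The same reasoning applies to multi-word synonyms and to words that contain whitespace.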

Attachments

LUCENE-7315.patch, 20/Jul/16 17:05, 104 kB, Steven Rowe

Issue Links

relates to: LUCENE-2605 queryparser parses on whitespace (Closed)

People

Assignee: Steven Rowe
Reporter: Steven Rowe
Votes: 2
Watchers: 4

Dates

Created: 04/Jun/16 00:44
Updated: 28/Aug/22 14:59