[LUCENE-2605] queryparser parses on whitespace - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 6.2
Component/s: core/queryparser
Labels: None

Description

The queryparser splits its input on whitespace and sends each whitespace-separated term to its own independent token stream.

This breaks the following at query-time, because they can't see across whitespace boundaries:

- n-gram analysis
- shingles
- synonyms (especially multi-word synonyms in whitespace-separated languages)
- languages where a 'word' can contain whitespace (e.g. Vietnamese)

It's also rather unexpected: users think their charfilters/tokenizers/tokenfilters will do the same thing at index time and query time, but in many cases they can't. Preferably, the queryparser would instead split only around real 'operators'.
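
The effect on shingles can be illustrated with a small sketch. It is a toy model in plain Python, not Lucene code: `analyze`, `shingles`, `parse_split_on_whitespace` and `parse_whole` are hypothetical names, and the "analysis chain" is assumed to be a lowercasing whitespace tokenizer followed by a bigram shingle step.

```python
def shingles(tokens, size=2):
    """Return word n-grams (shingles) of the given size from a token list."""
    return [" ".join(tokens[i:i + size]) for i in range(len(tokens) - size + 1)]


def analyze(text):
    """Toy analysis chain (assumed): lowercase, split on whitespace,
    then emit the unigrams followed by the bigram shingles."""
    tokens = text.lower().split()
    return tokens + shingles(tokens)


def parse_split_on_whitespace(query):
    """Behaviour described in the issue: the parser splits first, so each
    whitespace-separated term gets its own independent analysis pass."""
    out = []
    for term in query.split():
        out.extend(analyze(term))
    return out


def parse_whole(query):
    """Desired behaviour: the whole run of text reaches the analysis chain."""
    return analyze(query)


if __name__ == "__main__":
    q = "New York City"
    print(parse_split_on_whitespace(q))  # no shingles can ever be formed
    print(parse_whole(q))                # shingles span the whitespace
```

In this sketch the pre-split path only ever hands single words to `analyze`, so the shingle step never sees two adjacent tokens. The same reasoning applies to multi-word synonyms, which need to match across the whitespace boundary.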

Attachments

LUCENE-2605.patch, 11/May/16 03:08, 12 kB, Steven Rowe
LUCENE-2605.patch, 13/May/16 04:19, 21 kB, Steven Rowe
LUCENE-2605.patch, 04/Jun/16 00:37, 48 kB, Steven Rowe
LUCENE-2605.patch, 24/Jun/16 16:25, 44 kB, Steven Rowe
LUCENE-2605.patch, 28/Jun/16 00:04, 57 kB, Steven Rowe
LUCENE-2605.patch, 01/Jul/16 01:39, 86 kB, Steven Rowe
LUCENE-2605-dont-split-by-default.patch, 05/Jul/16 22:58, 5 kB, Steven Rowe

Issue Links

breaks

LUCENE-7472 MultiFieldQueryParser.getFieldQuery() drops queries that are neither BooleanQuery nor TermQuery (Closed)

is related to

LUCENE-7315 Flexible "standard" query parser parses on whitespace (Open)
SOLR-9185 Solr's edismax and "Lucene"/standard query parsers should optionally not split on whitespace before sending terms to analysis (Closed)
SOLR-4381 Query-time multi-word synonym expansion (Closed)

relates to

SOLR-5379 Query-time multi-word synonym expansion (Closed)

People

Assignee: Steven Rowe
Reporter: Robert Muir
Votes: 28
Watchers: 45

Dates

Created: 17/Aug/10 03:30
Updated: 28/Aug/22 12:31
Resolved: 05/Jul/16 22:59