[LUCENE-192] [PATCH] Allowing '-'/'+' in terms - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: None
Component/s: core/queryparser
Labels: None
Environment:

Operating System: other
Platform: Other

Bugzilla Id: 27491

Description

I suggest changing the definition of a term character in QueryParser.jj
from

<#_TERM_CHAR: ( <_TERM_START_CHAR> | <_ESCAPED_CHAR> ) >

to

<#_TERM_CHAR: ( <_TERM_START_CHAR> | <_ESCAPED_CHAR> | "-" | "+" ) >

As a result, the query parser will read '-' and '+' within words (such as
tft-monitor or Sysh1-1) as part of a single term. That term is then tokenized
by the analyzer in use and ends up as a term query or a phrase query, depending
on whether the analyzer creates one or more tokens.
So with StandardAnalyzer, the query tft-monitor would become the phrase query
"tft monitor", and Sysh1-1 would become a term query for "Sysh1-1".
Searching for tft-monitor as the phrase "tft monitor" is not exact, but it is
the best approximation possible once tft-monitor has been indexed as the tokens
tft and monitor.
Currently the query parser interprets every '-' or '+' as an operator, which
means that 'tft-monitor' gets parsed as tft AND NOT monitor, which probably
isn't what the user wanted.
The effect of a '-' or '+' that does not occur within a word is unchanged, so
tft -monitor will still search for 'tft AND NOT monitor'.
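
To make the difference concrete, here is a minimal sketch in plain Java that imitates only the lexing decision described above. It is not the JavaCC-generated parser and does not use any Lucene classes; the class name TermCharSketch, the method clauses, and the flag signsAllowedInTerm are hypothetical names chosen for illustration. It ignores escaping, quoting, and every other operator, and it does not model what the analyzer does with the term afterwards.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: a hand-written approximation of how '-' and '+' are
// treated with and without the proposed _TERM_CHAR change.
class TermCharSketch {

    // Splits a query into clauses. A '-' or '+' that acts as an operator is
    // kept as a prefix of the clause that follows it.
    static List<String> clauses(String query, boolean signsAllowedInTerm) {
        List<String> clauses = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inTerm = false; // true once a term character has been read

        for (char c : query.toCharArray()) {
            boolean sign = (c == '-' || c == '+');
            if (Character.isWhitespace(c)) {
                // Whitespace always ends the current clause.
                if (current.length() > 0) {
                    clauses.add(current.toString());
                }
                current.setLength(0);
                inTerm = false;
            } else if (sign && !(inTerm && signsAllowedInTerm)) {
                // Operator: end any term in progress, then prefix the next clause.
                if (inTerm) {
                    clauses.add(current.toString());
                    current.setLength(0);
                }
                current.append(c);
                inTerm = false;
            } else {
                // Ordinary term character, or a sign inside a term (new rule).
                current.append(c);
                inTerm = true;
            }
        }
        if (current.length() > 0) {
            clauses.add(current.toString());
        }
        return clauses;
    }

    public static void main(String[] args) {
        System.out.println(clauses("tft-monitor", false));  // [tft, -monitor]
        System.out.println(clauses("tft-monitor", true));   // [tft-monitor]
        System.out.println(clauses("tft -monitor", true));  // [tft, -monitor]
        System.out.println(clauses("Sysh1-1", true));       // [Sysh1-1]
    }
}
```

With the flag off, tft-monitor splits into tft and a prohibited clause -monitor, which matches the current "tft AND NOT monitor" reading. With the flag on, it stays one term that the analyzer would then tokenize, while tft -monitor is split the same way in both modes.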

All regression tests pass with the change.

I didn't attach a patch file, because I think the change is easy to make in
QueryParser.jj by hand.

People

Assignee: Lucene Developers

Reporter: Morus Walter

Dates

Created: 06/Mar/04 16:36

Updated: 28/Aug/22 11:15

Resolved: 27/May/06 01:37