  1. Lucene - Core
  2. LUCENE-192

[PATCH] Allowing '-'/'+' in terms

Details

    • Bug
    • Status: Closed
    • Major
    • Resolution: Fixed
    • None
    • None
    • core/queryparser
    • None
    • Operating System: other
      Platform: Other

    • 27491

    Description

      I suggest changing the definition of a term character in QueryParser.jj
      from

      <#_TERM_CHAR: ( <_TERM_START_CHAR> | <_ESCAPED_CHAR> ) >
      to
      <#_TERM_CHAR: ( <_TERM_START_CHAR> | <_ESCAPED_CHAR> | "-" | "+" ) >

      As a result, the query parser will read '-' and '+' within words (such as
      tft-monitor or Sysh1-1) as part of a single term. That term is then tokenized
      by the analyzer in use and ends up as a term query or a phrase query,
      depending on whether the analyzer produces one token or several.
      So with StandardAnalyzer, the query tft-monitor would become the phrase query
      "tft monitor", and Sysh1-1 would become a term query for "Sysh1-1".
      Searching for tft-monitor as the phrase "tft monitor" is not exact, but it is
      the best approximation possible once tft-monitor has been indexed as the
      tokens tft and monitor.
      Currently the query parser interprets every '-' or '+' as an operator, which
      means that 'tft-monitor' gets parsed as tft AND NOT monitor, which probably
      isn't what the user wanted.
      The effect of a '-' or '+' that does not occur within a word is unchanged, so
      tft -monitor will still search for 'tft AND NOT monitor'.
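
      The difference between the two rules can be illustrated with a small
      stand-alone sketch. This is not Lucene code and does not use the generated
      JavaCC lexer: the class TermCharSketch, the method split and the flag
      signsAreTermChars are hypothetical names made up for this illustration, and
      escaping, quoting and all other query syntax are ignored. It only shows
      where '-' and '+' end up as operators and where they stay inside a term.

```java
import java.util.ArrayList;
import java.util.List;

public class TermCharSketch {

    // Hypothetical helper, not part of Lucene: splits a query string into
    // operator tokens ("+", "-") and term tokens.
    // signsAreTermChars == false mimics the current rule,
    // signsAreTermChars == true mimics the proposed rule.
    static List<String> split(String query, boolean signsAreTermChars) {
        List<String> tokens = new ArrayList<>();
        StringBuilder term = new StringBuilder();
        for (char c : query.toCharArray()) {
            boolean sign = (c == '-' || c == '+');
            if (Character.isWhitespace(c)) {
                // Whitespace always ends the current term.
                flush(term, tokens);
            } else if (sign && (term.length() == 0 || !signsAreTermChars)) {
                // A sign is an operator when no term has been started, and
                // everywhere when signs are not allowed as term characters.
                flush(term, tokens);
                tokens.add(String.valueOf(c));
            } else {
                // Any other character, or a sign inside a term under the
                // proposed rule, extends the current term.
                term.append(c);
            }
        }
        flush(term, tokens);
        return tokens;
    }

    // Emits the pending term, if any, and resets the buffer.
    private static void flush(StringBuilder term, List<String> tokens) {
        if (term.length() > 0) {
            tokens.add(term.toString());
            term.setLength(0);
        }
    }

    public static void main(String[] args) {
        System.out.println(split("tft-monitor", false));  // [tft, -, monitor]
        System.out.println(split("tft-monitor", true));   // [tft-monitor]
        System.out.println(split("tft -monitor", true));  // [tft, -, monitor]
    }
}
```

      In this sketch the single term tft-monitor would then be handed to the
      analyzer, which decides whether it becomes one token or several.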

      All regression tests pass with the change.

      I didn't attach a patch file, because I think it's easy to change
      QueryParser.jj by hand.

          People

            java-dev@lucene.apache.org Lucene Developers
            morus.walter@gmx.de Morus Walter