Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: 4.9, Trunk
    • Component/s: core/queryparser
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      The queryparser splits its input on whitespace, and sends each whitespace-separated term to its own independent token stream.

      This breaks the following at query time, because none of them can see across whitespace boundaries:

      • n-gram analysis
      • shingles
      • synonyms (especially multi-word for whitespace-separated languages)
      • languages where a 'word' can contain whitespace (e.g. Vietnamese)
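
      As a rough illustration of the shingle case, the sketch below uses a toy analyzer written for this example (it is not Lucene code, and `analyze`, `index_time_terms` and `query_time_terms` are hypothetical names). It assumes a lowercasing whitespace tokenizer followed by a two-word shingle filter, and compares analyzing the whole text with analyzing each whitespace-separated chunk on its own.

```python
def analyze(text):
    """Toy analyzer: lowercase whitespace tokenizer plus 2-word shingles."""
    words = text.lower().split()
    shingles = [f"{a} {b}" for a, b in zip(words, words[1:])]
    return words + shingles


def index_time_terms(text):
    # At index time the analyzer sees the whole field value at once.
    return analyze(text)


def query_time_terms(query):
    # Models the behaviour described in this issue: the text is split on
    # whitespace first, and each chunk is analyzed independently.
    terms = []
    for chunk in query.split():
        terms.extend(analyze(chunk))
    return terms


indexed = index_time_terms("New York City")
queried = query_time_terms("New York City")
print(indexed)
print(queried)
```

      In this toy model the shingles "new york" and "york city" exist only on the index side, because no single chunk at query time ever contains two words.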

      It's also rather unexpected: users think their charfilters/tokenizers/tokenfilters will do the same thing at index time and query time, but
      in many cases they can't. Ideally, the queryparser would instead split only around real 'operators'.
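
      One way to picture "parsing around only real operators" is the minimal sketch below. It is an assumption-laden toy, not a proposed patch: the operator set (`AND`, `OR`, `NOT`), the `parse` function and the shingling `analyze` helper are all hypothetical, and real query syntax (quotes, fields, `+`/`-`, grouping) is ignored.

```python
import re


def analyze(text):
    """Toy analyzer (hypothetical): lowercase tokens plus 2-word shingles."""
    words = text.lower().split()
    return words + [f"{a} {b}" for a, b in zip(words, words[1:])]


def parse(query):
    """Split only on a simplified operator set; analyze each text run whole."""
    # With one capturing group, re.split alternates text runs (even indexes)
    # and the matched operators (odd indexes).
    parts = re.split(r"\b(AND|OR|NOT)\b", query)
    clauses = []
    for i, part in enumerate(parts):
        if i % 2 == 1:
            clauses.append(("OP", part))
        elif part.strip():
            # The analyzer sees the whole run, whitespace included.
            clauses.append(("TERMS", analyze(part)))
    return clauses


parsed = parse("New York AND pizza")
print(parsed)
```

      Here the run "New York" reaches the analyzer intact, so the shingle "new york" can be produced at query time as well.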

            People

            • Assignee:
              Unassigned
            • Reporter:
              Robert Muir
            • Votes:
              23
            • Watchers:
              42