Lucene - Core
LUCENE-2068

Fix ReverseStringFilter for Unicode 4.0

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.0-ALPHA
    • Component/s: modules/analysis
    • Labels:
      None
    • Lucene Fields:
      New, Patch Available

      Description

      ReverseStringFilter is not aware of supplementary characters: when it reverses a term, it creates unpaired surrogates, which the indexer replaces with U+FFFD (the same replacement does not happen at query time).
      As a result, the wrong words conflate with each other and the right words don't match; basically the whole thing falls apart.

      This patch implements an in-place reverse using the algorithm from Apache Harmony's AbstractStringBuilder.reverse0().
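
For illustration only, here is a minimal sketch of a surrogate-aware in-place reverse. It is not the code from the attached patches and not the Harmony reverse0() algorithm itself. It is a simpler two-pass variant (reverse all UTF-16 code units, then swap each reversed pair back into high-low order). The class name SurrogateAwareReverse and its method signatures are hypothetical, and the sketch assumes the input is well-formed UTF-16 with no unpaired surrogates.

```java
// Hypothetical helper, not part of Lucene or of the attached patches.
final class SurrogateAwareReverse {

    // Reverses buf[start, start + len) in place, keeping each surrogate
    // pair in (high, low) order. Assumes well-formed UTF-16 input.
    static void reverse(char[] buf, int start, int len) {
        if (len < 2) {
            return;
        }
        int end = start + len - 1;

        // Pass 1: naive reversal of UTF-16 code units.
        for (int i = start, j = end; i < j; i++, j--) {
            char tmp = buf[i];
            buf[i] = buf[j];
            buf[j] = tmp;
        }

        // Pass 2: every original (high, low) pair is now (low, high);
        // swap the two units back so the pair is valid again.
        for (int i = start; i < end; i++) {
            if (Character.isLowSurrogate(buf[i])
                    && Character.isHighSurrogate(buf[i + 1])) {
                char tmp = buf[i];
                buf[i] = buf[i + 1];
                buf[i + 1] = tmp;
                i++; // skip the second unit of the repaired pair
            }
        }
    }

    // Convenience wrapper for whole strings.
    static String reverse(String s) {
        char[] buf = s.toCharArray();
        reverse(buf, 0, buf.length);
        return new String(buf);
    }
}
```

With this sketch, reversing "a\uD840\uDC00b" (an 'a', the supplementary character U+20000, and a 'b') gives "b\uD840\uDC00a": the pair survives intact. A naive char-by-char reverse would instead produce "b\uDC00\uD840a", which contains the unpaired surrogates described above.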

      1. LUCENE_2068.patch
        13 kB
        Simon Willnauer
      2. LUCENE_2068.patch
        6 kB
        Simon Willnauer
      3. LUCENE_2068.patch
        6 kB
        Simon Willnauer
      4. LUCENE-2068.patch
        13 kB
        Robert Muir
      5. LUCENE-2068.patch
        5 kB
        Robert Muir


            People

            • Assignee:
              Simon Willnauer
              Reporter:
              Robert Muir
