Lucene - Core / LUCENE-2068

Fix ReverseStringFilter for Unicode 4.0

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.0-ALPHA
    • Component/s: modules/analysis
    • Labels:
      None
    • Lucene Fields:
      New, Patch Available

      Description

      ReverseStringFilter is not aware of supplementary characters: when it reverses a token containing one, it creates unpaired surrogates, which the indexer replaces with U+FFFD (this replacement does not happen at query time).
      As a result, the wrong words conflate with each other and the right words do not match, so the filter effectively breaks for any text containing supplementary characters.
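      To make the failure concrete, here is a minimal sketch of a char-by-char reversal applied to a string containing U+20000. It is not the code of ReverseStringFilter; the class name NaiveReverseDemo and both of its methods are hypothetical and exist only for illustration.

```java
// Hypothetical demo class, not part of Lucene.
final class NaiveReverseDemo {

    // Reverses the UTF-16 code units one by one, ignoring surrogate pairs.
    static String naiveReverse(String s) {
        char[] buf = s.toCharArray();
        for (int i = 0, j = buf.length - 1; i < j; i++, j--) {
            char tmp = buf[i];
            buf[i] = buf[j];
            buf[j] = tmp;
        }
        return new String(buf);
    }

    // Returns true if s contains a surrogate that is not part of a
    // well-formed (high, low) pair.
    static boolean hasUnpairedSurrogate(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isHighSurrogate(c)) {
                if (i + 1 < s.length() && Character.isLowSurrogate(s.charAt(i + 1))) {
                    i++; // skip the low half of a valid pair
                    continue;
                }
                return true;
            } else if (Character.isLowSurrogate(c)) {
                return true;
            }
        }
        return false;
    }
}
```

      For the input "a\uD840\uDC00b" the naive reversal puts the low surrogate before the high one, so hasUnpairedSurrogate reports true for the output even though the input was well formed.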

      This patch implements an in-place reverse using the algorithm from Apache Harmony's AbstractStringBuilder.reverse0().
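      For readers without the patch at hand, the following is a rough sketch of a surrogate-aware in-place reverse. It is not the patch and not the Harmony algorithm: it uses a simpler two-pass approach (reverse every char, then swap reversed pairs back). The class name SurrogateAwareReverse and its method signatures are assumptions made for this example.

```java
// Hypothetical sketch, not the code attached to this issue.
final class SurrogateAwareReverse {

    // Reverses buf[start, start + len) in place while keeping each
    // surrogate pair in (high, low) order.
    static void reverse(char[] buf, int start, int len) {
        int end = start + len - 1;

        // Pass 1: plain char-by-char reversal.
        for (int i = start, j = end; i < j; i++, j--) {
            char tmp = buf[i];
            buf[i] = buf[j];
            buf[j] = tmp;
        }

        // Pass 2: a pair that was (high, low) is now (low, high); swap it back.
        for (int i = start; i < end; i++) {
            if (Character.isLowSurrogate(buf[i]) && Character.isHighSurrogate(buf[i + 1])) {
                char tmp = buf[i];
                buf[i] = buf[i + 1];
                buf[i + 1] = tmp;
                i++; // skip the half that was just restored
            }
        }
    }

    // Convenience wrapper for whole strings.
    static String reverse(String s) {
        char[] buf = s.toCharArray();
        reverse(buf, 0, buf.length);
        return new String(buf);
    }
}
```

      With this sketch, SurrogateAwareReverse.reverse("a\uD840\uDC00b") keeps U+20000 intact between the swapped 'b' and 'a'. As with java.lang.StringBuilder.reverse(), surrogates that were unpaired in the input may end up forming a pair after the reversal.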

        Attachments

        1. LUCENE-2068.patch
          5 kB
          Robert Muir
        2. LUCENE-2068.patch
          13 kB
          Robert Muir
        3. LUCENE_2068.patch
          6 kB
          Simon Willnauer
        4. LUCENE_2068.patch
          6 kB
          Simon Willnauer
        5. LUCENE_2068.patch
          13 kB
          Simon Willnauer

              People

              • Assignee:
                simonw Simon Willnauer
                Reporter:
                rcmuir Robert Muir