  1. Lucene - Core
  2. LUCENE-2068

Fix ReverseStringFilter for Unicode 4.0

Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.0-ALPHA
    • Component/s: modules/analysis
    • Labels: None
    • Lucene Fields: New, Patch Available

    Description

      ReverseStringFilter is not aware of supplementary characters: when it reverses a token it creates unpaired surrogates, which the indexer replaces with U+FFFD (but this does not happen at query time).
      As a result, the wrong words conflate with each other and the right words don't match; basically the whole thing falls apart.
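
The failure mode can be reproduced outside Lucene. The following is a minimal sketch, not the filter's actual code: `NaiveReverseDemo`, `naiveReverse`, and `hasUnpairedSurrogate` are hypothetical names used only for illustration. It reverses a string one UTF-16 code unit at a time and then checks whether the result contains a surrogate that is no longer part of a valid high/low pair.

```java
public class NaiveReverseDemo {

    // Naive reversal: swaps UTF-16 code units with no regard for surrogate pairs.
    // (Illustrative only; this is not the ReverseStringFilter source.)
    static String naiveReverse(String s) {
        char[] buf = s.toCharArray();
        for (int i = 0, j = buf.length - 1; i < j; i++, j--) {
            char t = buf[i];
            buf[i] = buf[j];
            buf[j] = t;
        }
        return new String(buf);
    }

    // Returns true if s contains a surrogate that is not part of a
    // well-formed (high surrogate, low surrogate) pair.
    static boolean hasUnpairedSurrogate(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isHighSurrogate(c)) {
                if (i + 1 < s.length() && Character.isLowSurrogate(s.charAt(i + 1))) {
                    i++; // well-formed pair: skip its low half
                } else {
                    return true;
                }
            } else if (Character.isLowSurrogate(c)) {
                return true; // low surrogate with no preceding high surrogate
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // 'a', the supplementary character U+10000 (one surrogate pair), 'b'
        String input = "a\uD800\uDC00b";
        String reversed = naiveReverse(input);
        System.out.println("input has unpaired surrogate:    " + hasUnpairedSurrogate(input));
        System.out.println("reversed has unpaired surrogate: " + hasUnpairedSurrogate(reversed));
    }
}
```

After the naive reversal the low surrogate precedes the high surrogate, so neither half forms a valid pair any more, which is the condition under which the indexer substitutes U+FFFD.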

      This patch implements an in-place reverse using the algorithm from Apache Harmony's AbstractStringBuilder.reverse0().
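
For illustration, a surrogate-aware in-place reverse can be sketched as below. This is not the attached patch and not the Harmony code; it is a simpler two-pass variant under stated assumptions: the class name `SurrogateAwareReverse` and its method signatures are hypothetical, and the input is assumed to contain only well-formed surrogate pairs. The first pass reverses code units, and the second pass swaps each resulting (low, high) sequence back into (high, low) order.

```java
public class SurrogateAwareReverse {

    // Reverses buf[start .. start+len-1] in place, keeping surrogate pairs intact.
    // Two-pass sketch (hypothetical helper, not the patch itself).
    static void reverse(char[] buf, int start, int len) {
        if (len < 2) {
            return;
        }
        int end = start + len - 1;

        // Pass 1: plain code-unit reversal. Every original (high, low) pair
        // now appears as (low, high).
        for (int i = start, j = end; i < j; i++, j--) {
            char t = buf[i];
            buf[i] = buf[j];
            buf[j] = t;
        }

        // Pass 2: restore each flipped pair to (high, low) order.
        for (int i = start; i < end; i++) {
            if (Character.isLowSurrogate(buf[i]) && Character.isHighSurrogate(buf[i + 1])) {
                char t = buf[i];
                buf[i] = buf[i + 1];
                buf[i + 1] = t;
                i++; // skip the second half of the pair just fixed
            }
        }
    }

    // Convenience wrapper for whole strings.
    static String reverse(String s) {
        char[] buf = s.toCharArray();
        reverse(buf, 0, buf.length);
        return new String(buf);
    }

    public static void main(String[] args) {
        // 'a', supplementary character U+10000, 'b'
        String input = "a\uD800\uDC00b";
        String reversed = reverse(input);
        // Compare against the JDK's own surrogate-aware reversal.
        System.out.println(reversed.equals(new StringBuilder(input).reverse().toString()));
    }
}
```

The two-pass form trades one extra scan for simpler bookkeeping; a single-pass approach such as the one the description refers to avoids the second scan at the cost of tracking surrogate state while swapping.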

      Attachments

        1. LUCENE_2068.patch
          13 kB
          Simon Willnauer
        2. LUCENE_2068.patch
          6 kB
          Simon Willnauer
        3. LUCENE_2068.patch
          6 kB
          Simon Willnauer
        4. LUCENE-2068.patch
          13 kB
          Robert Muir
        5. LUCENE-2068.patch
          5 kB
          Robert Muir


            People

              Assignee: Simon Willnauer (simonw)
              Reporter: Robert Muir (rcmuir)
              Votes: 0
              Watchers: 1