Lucene - Core / LUCENE-1390

add ASCIIFoldingFilter and deprecate ISOLatin1AccentFilter

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.9
    • Component/s: modules/analysis
    • Labels:
      None
    • Environment: any

    • Lucene Fields:
      New, Patch Available

      Description

      ISOLatin1AccentFilter removes accents from accented characters in the ISO Latin 1 character set.
      It does what it is meant to do, and there is no bug in it.

      It would be nicer, though, if there were a more comprehensive version of this code that covered not just ISO-Latin-1 (ISO-8859-1) but the entire Latin-1 Supplement and Latin Extended-A Unicode blocks.
      See: http://en.wikipedia.org/wiki/Latin-1_Supplement_unicode_block
      See: http://en.wikipedia.org/wiki/Latin_Extended-A_unicode_block

      That way, all languages using Roman characters are covered.
      A new class, ISOLatinAccentFilter, is attached. It is intended to supersede ISOLatin1AccentFilter, which should be deprecated.
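
      To illustrate the kind of folding requested here, the following is a minimal standalone sketch. It is not the attached patch and does not use any Lucene API: the class name AccentFoldingSketch and the method fold are hypothetical, and the technique (Unicode NFD decomposition followed by removal of combining marks, via the JDK's java.text.Normalizer) is swapped in as an assumption to keep the example self-contained.

```java
import java.text.Normalizer;

// Hypothetical illustration only; not the class attached to this issue.
final class AccentFoldingSketch {

    // Decomposes each character into base letter + combining marks (NFD),
    // then deletes the combining marks (Unicode general category M).
    static String fold(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        // Latin-1 Supplement: e with acute
        System.out.println(fold("\u00e9t\u00e9"));
        // Latin Extended-A: s with caron, c with acute
        System.out.println(fold("\u0161\u0107"));
    }
}
```

      Letters that have no canonical decomposition, such as U+00F8 (o with stroke), U+00DF (sharp s), or U+0142 (l with stroke), pass through this sketch unchanged, so full coverage of both blocks would need explicit per-character mappings for them.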

        Attachments

        1. ASCIIFoldingFilter.patch
          204 kB
          Andi Vajda
        2. ASCIIFoldingFilter.patch
          207 kB
          Steve Rowe
        3. ASCIIFoldingFilter.patch
          207 kB
          Steve Rowe

              People

              • Assignee: Mark Miller (markrmiller@gmail.com)
              • Reporter: Andi Vajda (vajda)
