Lucy / LUCY-191

Unicode normalization


Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.3.0 (incubating)
    • Component/s: Analysis

    Description

      As discussed on the mailing list, it would be nice to have Unicode normalization, Unicode case folding, and accent stripping as part of the analyzer chain. With the help of utf8proc, all three can be done in a single pass. I therefore propose a new analyzer, Lucy::Analyzer::Normalizer, with the interface described here:

      http://mail-archives.apache.org/mod_mbox/incubator-lucy-dev/201111.mbox/%3C4EC43816.1070107%40aevum.de%3E
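      The issue proposes doing this work in one pass with utf8proc. As a rough illustration of what the three steps do to a token, the sketch below chains them with Python's standard `unicodedata` module instead. It is not Lucy's code or API: the function `normalize` and its parameters `form`, `case_fold` and `strip_accents` are hypothetical, the steps run as separate passes (case fold, full decomposition, mark removal, recomposition) rather than utf8proc's single pass, and treating every character in general category M* as an "accent" is an assumption meant to approximate utf8proc's mark stripping.

```python
import unicodedata

_FORMS = ("NFC", "NFD", "NFKC", "NFKD")


def normalize(text: str, form: str = "NFKC", case_fold: bool = True,
              strip_accents: bool = False) -> str:
    """Hypothetical helper: normalize, case fold and optionally strip accents.

    Approximates an analyzer step like the proposed Normalizer using only
    the Python standard library (not utf8proc, not Lucy's API).
    """
    if form not in _FORMS:
        raise ValueError(f"unknown normalization form: {form!r}")

    # 1. Full Unicode case folding (e.g. "\u00df" -> "ss"), done first so that
    #    any combining marks it produces can be stripped below.
    if case_fold:
        text = text.casefold()

    # 2. Decompose so that accents become separate combining characters.
    #    Compatibility forms (NFK*) also map e.g. superscript "\u00b2" to "2".
    compat = form.startswith("NFK")
    text = unicodedata.normalize("NFKD" if compat else "NFD", text)

    # 3. Optionally drop every mark (general categories Mn, Mc, Me).
    if strip_accents:
        text = "".join(ch for ch in text
                       if not unicodedata.category(ch).startswith("M"))

    # 4. Re-apply the requested form; for NFC/NFKC this recomposes.
    return unicodedata.normalize(form, text)


if __name__ == "__main__":
    print(normalize("Caf\u00e9"))                        # café
    print(normalize("Caf\u00e9", strip_accents=True))    # cafe
    print(normalize("CAFE\u0301", strip_accents=True))   # cafe
    print(normalize("Stra\u00dfe"))                      # strasse
```

      With accent stripping enabled, a precomposed "é" and an "e" followed by U+0301 COMBINING ACUTE ACCENT both reduce to "e", so differently encoded spellings of the same word index identically. Case folding runs first because it can itself emit marks (for example "İ" folds to "i" plus U+0307), which should then be subject to stripping too.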

      Attachments

        1. LUCY-191-normalizer-v2.patch (20 kB, Nikolas Wellnhofer)
        2. LUCY-191-normalizer-v1-v2.interdiff (5 kB, Marvin Humphrey)
        3. LUCY-191-normalizer.patch (19 kB, Nikolas Wellnhofer)


          People

            Assignee: Nikolas Wellnhofer (nwellnhof)
            Reporter: Nikolas Wellnhofer (nwellnhof)
            Votes: 0
            Watchers: 0

            Dates

              Created:
              Updated:
              Resolved: