[LUCY-191] Unicode normalization - ASF JIRA

Details

Type: New Feature
Status: Resolved
Priority: Minor
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 0.3.0 (incubating)
Component/s: Analysis
Labels:
- patch

Description

As discussed on the mailing list, it would be nice to have Unicode normalization, Unicode case folding, and stripping of accents as part of the analyzer chain. With the help of utf8proc, all three can be done in a single pass over the text. I have therefore proposed a new analyzer, Lucy::Analysis::Normalizer, with the interface described here:

http://mail-archives.apache.org/mod_mbox/incubator-lucy-dev/201111.mbox/%3C4EC43816.1070107%40aevum.de%3E
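
To make the three operations concrete, here is a minimal sketch in Python using the standard unicodedata module. It is not the proposed Lucy analyzer and does not use utf8proc; the function name normalize_token and the parameters form, case_fold, and strip_accents are hypothetical names chosen for illustration, and the real interface is the one described in the linked message. Unlike the single pass that utf8proc allows, this sketch applies the steps one after another.

```python
import unicodedata


def normalize_token(text, form="NFKC", case_fold=True, strip_accents=False):
    """Illustrative only: normalize, case-fold, and optionally strip accents.

    All names here are assumptions for the sketch, not Lucy's API.
    """
    if strip_accents:
        # Decompose first so that accents become separate combining marks,
        # then drop every character with a non-zero combining class.
        decomposed = unicodedata.normalize("NFKD" if "K" in form else "NFD", text)
        text = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    if case_fold:
        # str.casefold() applies Unicode full case folding.
        text = text.casefold()
    # Normalize last, because case folding can produce unnormalized output.
    return unicodedata.normalize(form, text)
```

With accent stripping enabled, a token such as "Café" would be reduced to "cafe", so a query typed without the accent matches the indexed term.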

Attachments


- LUCY-191-normalizer.patch (19 kB), 19/Nov/11 17:08, Nikolas Wellnhofer
- LUCY-191-normalizer-v1-v2.interdiff (5 kB), 30/Nov/11 02:25, Marvin Humphrey
- LUCY-191-normalizer-v2.patch (20 kB), 22/Nov/11 19:40, Nikolas Wellnhofer

People

Assignee: Nikolas Wellnhofer

Reporter: Nikolas Wellnhofer

Votes: 0

Watchers: 0

Dates

Created: 19/Nov/11 17:04

Updated: 13/Dec/11 00:41

Resolved: 13/Dec/11 00:41