[LUCENE-6254] Dictionary-based lemmatizer - ASF JIRA

Details

Type: New Feature
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: 5.0
Component/s: modules/analysis
Labels:
- patch

Lucene Fields:

New

Description

Today, the only way to achieve lemmatization is to use the SynonymFilterFactory. The available stemmers are also inaccurate because they follow only simplistic rules.

A dictionary-based lemmatizer can be more precise because it has the opportunity to know the part of speech. It therefore offers a more accurate way to reduce words to their base forms than other dictionary-based stemmers such as Hunspell.
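
To illustrate why knowing the part of speech helps, here is a minimal sketch of a dictionary lookup keyed on both the surface form and a part-of-speech tag. It is not taken from the attached patch: the class name `LemmaDictionarySketch`, the tag strings, and the sample entries are all hypothetical, and a real implementation would load its entries from a dictionary file and plug into a Lucene TokenFilter.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration only; not the API of the LUCENE-6254 patch.
public class LemmaDictionarySketch {
    // Maps "surface form + TAB + part of speech" to a lemma.
    private final Map<String, String> entries = new HashMap<>();

    // Builds the lookup key; the surface form is lowercased so lookups are case-insensitive.
    private static String key(String surface, String pos) {
        return surface.toLowerCase(Locale.ROOT) + "\t" + pos;
    }

    // Registers one dictionary entry.
    public void add(String surface, String pos, String lemma) {
        entries.put(key(surface, pos), lemma);
    }

    // Returns the lemma for the given form and part of speech,
    // or the surface form unchanged when the dictionary has no entry.
    public String lemmatize(String surface, String pos) {
        return entries.getOrDefault(key(surface, pos), surface);
    }

    public static void main(String[] args) {
        LemmaDictionarySketch dict = new LemmaDictionarySketch();
        // Assumed sample entries: the same surface form has different lemmas per part of speech.
        dict.add("saw", "VERB", "see");
        dict.add("saw", "NOUN", "saw");
        dict.add("meeting", "VERB", "meet");
        dict.add("meeting", "NOUN", "meeting");

        System.out.println(dict.lemmatize("saw", "VERB"));     // see
        System.out.println(dict.lemmatize("saw", "NOUN"));     // saw
        System.out.println(dict.lemmatize("Meeting", "VERB")); // meet
        System.out.println(dict.lemmatize("unknown", "NOUN")); // unknown
    }
}
```

A rule-based stemmer would treat every occurrence of "saw" or "meeting" the same way, whereas the lookup above can return a different lemma depending on the tag supplied with the token.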

This is my effort to develop such a lemmatizer for Apache Lucene. The documentation is temporarily placed here:
http://folk.uio.no/erlendfg/solr/lemmatizer.html

Attachments

LUCENE-6254.patch
18/Feb/15 10:01
32 kB
Erlend Garåsen

People

Assignee: Unassigned
Reporter: Erlend Garåsen
Votes: 0
Watchers: 4

Dates

Created: 18/Feb/15 09:55
Updated: 28/Aug/22 14:25