- Type: New Feature
- Status: Open
- Priority: Major
- Resolution: Unresolved
- Affects Version/s: None
- Fix Version/s: 5.0
- Component/s: modules/analysis
- Labels:
- Lucene Fields: New

Today, the only way to achieve lemmatization is to use SynonymFilterFactory. The available stemmers are also inaccurate, since they follow only simplistic rules.
A dictionary-based lemmatizer can be more precise because it can take the part of speech into account. It therefore reduces words to their base forms more accurately than other dictionary-based stemmers such as Hunspell.
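To illustrate why part-of-speech information helps, here is a minimal sketch of a dictionary lookup keyed on both the surface form and its part of speech. It is not the proposed Lucene implementation and uses no Lucene API: the `LEMMAS` table, the tag names (`VERB`, `NOUN`, `ADJ`) and the `lemmatize` function are hypothetical names invented for this example.

```python
from typing import Dict, Optional, Tuple

# Hypothetical toy dictionary: (surface form, part of speech) -> lemma.
# A real lemmatizer would load a much larger, language-specific dictionary.
LEMMAS: Dict[Tuple[str, str], str] = {
    ("saw", "VERB"): "see",
    ("saw", "NOUN"): "saw",
    ("meeting", "VERB"): "meet",
    ("meeting", "NOUN"): "meeting",
    ("better", "ADJ"): "good",
}


def lemmatize(token: str, pos: Optional[str] = None) -> str:
    """Return the lemma of `token`, or the lowercased token if none is known."""
    form = token.lower()
    if pos is not None:
        # With a part-of-speech tag the lookup is exact.
        return LEMMAS.get((form, pos), form)
    # Without a tag, accept a lemma only if every entry for this form agrees.
    candidates = {lemma for (f, _), lemma in LEMMAS.items() if f == form}
    return candidates.pop() if len(candidates) == 1 else form


print(lemmatize("saw", "VERB"))
print(lemmatize("saw", "NOUN"))
print(lemmatize("better"))
```

With the tag supplied, "saw" resolves to "see" as a verb and stays "saw" as a noun. A purely rule-based stemmer cannot make that distinction, because it sees only the characters of the token.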
This is my effort to develop such a lemmatizer for Apache Lucene. The documentation is temporarily hosted here:
http://folk.uio.no/erlendfg/solr/lemmatizer.html