Details
- Type: New Feature
- Status: Open
- Priority: Major
- Resolution: Unresolved
Description
The "Smart" Simplified Chinese toolkit in lucene/analysis/smartcn mis-segments text in some edge cases: it fails to split certain words that were not part of its dictionary or training corpus.
This patch supplies a bigramming class to handle these occasional mistakes. The algorithm creates bigrams out of every "word" longer than two ideograms; words of one or two ideograms are left as they are.
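To illustrate the idea, here is a minimal sketch in plain Java. It is not the class from the patch: `BigramSketch` and `bigramLongWords` are hypothetical names, the input is assumed to be an already-segmented list of words, and the bigrams are assumed to overlap (the description does not say whether they do). Length is counted in Unicode code points so that ideograms outside the Basic Multilingual Plane count as one character.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the bigramming class supplied by the patch.
final class BigramSketch {

    // Returns the tokens unchanged when they have at most two code points,
    // and replaces longer tokens with their overlapping bigrams (assumption).
    static List<String> bigramLongWords(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            int[] cps = token.codePoints().toArray();
            if (cps.length <= 2) {
                out.add(token); // short word: trust the segmenter
                continue;
            }
            // Long "word": emit every adjacent pair of code points.
            for (int i = 0; i + 1 < cps.length; i++) {
                out.add(new String(cps, i, 2));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Pretend the segmenter returned these three "words".
        System.out.println(bigramLongWords(List.of("我", "中国", "北京大学")));
    }
}
```

With this sketch, a four-ideogram token such as 北京大学 becomes 北京, 京大, 大学, while 我 and 中国 pass through unchanged. A real implementation inside Lucene would more likely be a token filter that also maintains offsets and position increments, which this sketch ignores.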
Attachments
Issue Links
- depends upon: SOLR-3623 inconsistent treatment of lucene jars & third-party deps in analysis-extras & uima (in war and in lucene-libs) (Closed)