Details
- Type: New Feature
- Status: Closed
- Priority: Major
- Resolution: Fixed
Description
[paoding-analysis](http://code.google.com/p/paoding/) is a Solr/Lucene tokenizer for Chinese. It represents an alternative to the Smartcn analyzer included in the Solr Analysis Extras.
To allow the use of Paoding for processing Chinese text, the following are needed:
(1) Extension to the Stanbol Commons Solr module: essentially an OSGi bundle version of the Paoding analyzers. Because Paoding depends heavily on system environment variables and file system paths, some adaptations to its initialization are necessary.
(2) LabelTokenizer implementation for the EntityLinkingEngine: this is needed to tokenize the labels of entities.
(3) Tokenizer Enhancement Engine: this retrieves or creates the AnalyzedText content part, tokenizes the text passed to the Stanbol Enhancer, and adds Tokens for the detected words to the AnalyzedText.
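As a rough illustration of what item (3) amounts to, the sketch below splits a Chinese string into words and records the character offsets of each one, which is the kind of information a Token in the AnalyzedText would carry. It is not the Paoding algorithm and does not use the Stanbol or Lucene APIs: the class `TokenizerSketch`, the `Token` type, the `tokenize` method, and the three-word dictionary are all hypothetical, and the segmentation strategy is a plain forward maximum match, swapped in only to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class TokenizerSketch {

    /** Hypothetical stand-in for a token span; not the Stanbol Token class. */
    static final class Token {
        final int start; // inclusive char offset
        final int end;   // exclusive char offset
        final String text;

        Token(int start, int end, String text) {
            this.start = start;
            this.end = end;
            this.text = text;
        }

        @Override
        public String toString() {
            return text + "[" + start + "," + end + ")";
        }
    }

    /**
     * Forward maximum matching: at each position take the longest dictionary
     * word (up to maxWordLength chars); fall back to a single character.
     * Whitespace is skipped and produces no token.
     */
    static List<Token> tokenize(String text, Set<String> dictionary, int maxWordLength) {
        List<Token> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < text.length()) {
            if (Character.isWhitespace(text.charAt(pos))) {
                pos++;
                continue;
            }
            int end = Math.min(text.length(), pos + maxWordLength);
            // Shrink the candidate until it is a dictionary word or one char long.
            while (end > pos + 1 && !dictionary.contains(text.substring(pos, end))) {
                end--;
            }
            tokens.add(new Token(pos, end, text.substring(pos, end)));
            pos = end;
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Tiny illustrative dictionary; a real analyzer loads one from disk.
        Set<String> dictionary = new HashSet<>(Arrays.asList("中文", "分词", "工具"));
        for (Token token : tokenize("中文分词工具", dictionary, 4)) {
            System.out.println(token);
        }
    }
}
```

With the dictionary above, `main` yields three tokens covering the offsets [0,2), [2,4) and [4,6). In an actual engine the offsets would be handed to the AnalyzedText rather than printed, and the word boundaries would come from the Paoding analyzer.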
Issue Links
- is related to: STANBOL-855 Add basic language support for Chinese (Closed)