Description
The most recent Solr/Lucene versions added the Kuromoji Analyzer for Japanese. This module will make it possible to:
- index and search Entities with Japanese language labels and texts
- tokenize Japanese text
- POS-tag Japanese text
- perform NER for Persons, Organizations and Places
- lemmatize Japanese text
- tokenize labels correctly, as required for linking Japanese labels of Entities
This will require three modules:
- an extension to the commons.solr.core module that provides the Kuromoji Analyzer as a bundle
- an NLP processing Engine
- a LabelTokenizer implementation
In addition, a dedicated bundlelist that includes those three modules is needed. This bundlelist should be added by default to the Full Stanbol Launcher.
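
For illustration only: once the Kuromoji Analyzer bundle is available, a Solr field type for Japanese labels and texts might look roughly like the sketch below. It is modelled on the `text_ja` field type from the Solr example schema. The field type name and the stoptags file path are assumptions and would need to match the actual Stanbol index configuration.

```xml
<!-- Sketch of a Japanese field type; the name "text_ja" and the
     "lang/stoptags_ja.txt" path are assumptions, not part of this issue. -->
<fieldType name="text_ja" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Kuromoji morphological tokenizer; "search" mode also splits compounds -->
    <tokenizer class="solr.JapaneseTokenizerFactory" mode="search"/>
    <!-- Reduce inflected verbs and adjectives to their base form (lemma) -->
    <filter class="solr.JapaneseBaseFormFilterFactory"/>
    <!-- Drop tokens whose part-of-speech tag is listed in the stoptags file -->
    <filter class="solr.JapanesePartOfSpeechStopFilterFactory" tags="lang/stoptags_ja.txt"/>
    <!-- Normalize full-width Latin and half-width Katakana variants -->
    <filter class="solr.CJKWidthFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

The NLP processing Engine and the LabelTokenizer would have to use the same tokenization as this field type; otherwise Japanese labels of Entities would not line up with the tokens of the analysed text during linking.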