Details
Type: Improvement
Status: Resolved
Priority: Minor
Resolution: Fixed
0.12.0
None
Description
Currently the OpenNLP Sentence Detection and Tokenizer Enhancement Engines do not support OSGi Configuration Factories. Because of that, only a single instance of each engine can be configured.
However, this causes problems when one wants to configure multiple Enhancement Chains that use different NLP frameworks.
Here is an example:
Chain1:
- OpenNLP for English, German and Spanish
Chain2:
- Stanford NLP for English
- OpenNLP for German
- Freeling NLP for Spanish
Because OpenNLP supports all three of these languages, a user would want to set up the following engine configurations:
1. OpenNLP engines for sentence detection, tokenization, POS tagging and Chunking that include all three languages.
2. OpenNLP engines that only process German language texts for sentence detection, tokenization, POS tagging and Chunking
3. RESTful NLP Analysis Engine calling StanfordNLP for English language texts
4. RESTful NLP Analysis Engine calling Freeling for Spanish language texts
Chain1 would use the OpenNLP engines configured to process all languages, while Chain2 would use the engine configurations listed under points 2 to 4.
However, because the OpenNLP Tokenizer and Sentence Detection engines do not support OSGi Configuration Factories, this is currently not possible: only a single instance of each of those two engines can be configured.
As a result, English and Spanish text sent to Chain2 would be processed by two Sentence Detectors and two Tokenizers, which results in duplicate Sentence and Token annotations.
Adding support for OSGi Configuration Factories to all OpenNLP EnhancementEngines will solve this issue. Existing configurations will not be affected, as all engines already use "ConfigurationPolicy.OPTIONAL", meaning that a default instance with the default configuration is created automatically.
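To make the Chain2 scenario concrete, here is a minimal plain-Java sketch of which engines would tokenize a text of a given language, first with a single OpenNLP instance covering all three languages and then with a second, German-only instance as a Configuration Factory would allow. The classes `EngineInstance` and `ChainSimulation` and the engine names are hypothetical stand-ins for illustration only; they are not Stanbol or OSGi APIs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for one configured tokenizing engine; not a Stanbol class. */
final class EngineInstance {
    final String name;
    final Set<String> languages;

    EngineInstance(String name, String... languages) {
        this.name = name;
        this.languages = new HashSet<>(Arrays.asList(languages));
    }
}

final class ChainSimulation {
    /** Returns the names of all engines in the chain that would process the given language. */
    static List<String> tokenizersFor(String language, List<EngineInstance> chain) {
        List<String> active = new ArrayList<>();
        for (EngineInstance engine : chain) {
            if (engine.languages.contains(language)) {
                active.add(engine.name);
            }
        }
        return active;
    }

    /** Chain2 when only one OpenNLP instance (en, de, es) can exist. */
    static List<EngineInstance> chain2SingleInstance() {
        return Arrays.asList(
            new EngineInstance("stanford-en", "en"),
            new EngineInstance("opennlp-all", "en", "de", "es"),
            new EngineInstance("freeling-es", "es"));
    }

    /** Chain2 when a second, German-only OpenNLP instance can be configured. */
    static List<EngineInstance> chain2WithFactory() {
        return Arrays.asList(
            new EngineInstance("stanford-en", "en"),
            new EngineInstance("opennlp-de", "de"),
            new EngineInstance("freeling-es", "es"));
    }

    public static void main(String[] args) {
        for (String lang : new String[] {"en", "de", "es"}) {
            System.out.println(lang + " single instance: "
                + tokenizersFor(lang, chain2SingleInstance()));
            System.out.println(lang + " with factory:    "
                + tokenizersFor(lang, chain2WithFactory()));
        }
    }
}
```

In the single-instance variant, English and Spanish each match two engines, which corresponds to the duplicate Sentence and Token annotations described above. In the variant with a German-only instance, every language matches exactly one engine.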
This issue affects both trunk and the 0.12 release branch.