Stanbol (Retired) / STANBOL-1229

Convert all OpenNLP Enhancement Engines to Configuration Factories

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 0.12.0
    • Fix Version/s: 0.12.0
    • Component/s: Enhancement Engines
    • Labels: None

    Description

      Currently the OpenNLP Sentence Detection and Tokenizer Enhancement Engines do not support OSGi Configuration Factories. Because of that, only a single instance of each can be configured.

      This creates problems when one wants to configure multiple Enhancement Chains that use different NLP frameworks.

      Here is an example:

      Chain1:

      • OpenNLP for English, German and Spanish

      Chain2:

      • Stanford NLP for English
      • OpenNLP for German
      • Freeling NLP for Spanish

      As OpenNLP supports all three of these languages, a user would want to set up the following engine configurations:

      1. OpenNLP engines for sentence detection, tokenization, POS tagging and chunking that include all three languages
      2. OpenNLP engines for sentence detection, tokenization, POS tagging and chunking that only process German language texts
      3. RESTful NLP Analysis Engine calling StanfordNLP for English language texts
      4. RESTful NLP Analysis Engine calling Freeling for Spanish language texts

      Chain1 would use the OpenNLP engines configured to process all languages, while Chain2 would use the engine configurations listed under points 2 to 4.

      However, as the OpenNLP Tokenizer and Sentence Detection engines do not support OSGi Configuration Factories, this is currently not possible: only a single instance of each of those two engines can be configured.

      Because of that, English and Spanish text sent to Chain2 would be processed by two Sentence Detectors and two Tokenizers, which results in duplicate Sentence and Token annotations.

      Adding support for OSGi Configuration Factories to all OpenNLP Enhancement Engines will solve this issue. Existing configurations will not be affected, as all engines already use "ConfigurationPolicy.OPTIONAL", meaning that a default instance with the default configuration is created automatically.
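
The effect on Chain2 can be illustrated with a small, self-contained model. This is only a sketch in plain Java, not Stanbol or OSGi code: the `ChainSketch` and `Engine` types, the engine names and the language codes are all hypothetical stand-ins chosen for illustration. It compares which engines would tokenize a text when only one OpenNLP instance (covering all three languages) can exist with the case where a second, German-only instance can be configured.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class ChainSketch {

    /** Hypothetical stand-in for one configured engine instance (not the Stanbol API). */
    static final class Engine {
        final String name;
        final Set<String> languages;

        Engine(String name, String... languages) {
            this.name = name;
            this.languages = new HashSet<>(Arrays.asList(languages));
        }
    }

    /** Names of the engines in the chain that would tokenize text in the given language. */
    static List<String> tokenizersFor(List<Engine> chain, String language) {
        List<String> active = new ArrayList<>();
        for (Engine engine : chain) {
            if (engine.languages.contains(language)) {
                active.add(engine.name);
            }
        }
        return active;
    }

    /** Chain2 when only one OpenNLP instance (all three languages) can exist. */
    static List<Engine> chain2WithSingleInstance() {
        return Arrays.asList(
                new Engine("stanford-en", "en"),
                new Engine("opennlp-all", "en", "de", "es"),
                new Engine("freeling-es", "es"));
    }

    /** Chain2 when a second, German-only OpenNLP instance can be configured. */
    static List<Engine> chain2WithGermanOnlyInstance() {
        return Arrays.asList(
                new Engine("stanford-en", "en"),
                new Engine("opennlp-de", "de"),
                new Engine("freeling-es", "es"));
    }

    public static void main(String[] args) {
        for (String lang : Arrays.asList("en", "de", "es")) {
            System.out.println(lang + " single instance: "
                    + tokenizersFor(chain2WithSingleInstance(), lang));
            System.out.println(lang + " German-only instance: "
                    + tokenizersFor(chain2WithGermanOnlyInstance(), lang));
        }
    }
}
```

In this model, English and Spanish text each match two engines in the single-instance chain, which corresponds to the duplicate annotations described above. With the German-only instance, every language matches exactly one engine.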

      This issue affects both trunk and the 0.12 release branch.

          People

            Assignee: Rupert Westenthaler (rwesten)
            Reporter: Rupert Westenthaler (rwesten)