  1. Stanbol
  2. STANBOL-1422

Add support for ixa-nerc NER models

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.0.0
    • Component/s: None
    • Labels:
      None

      Description

      The ixa-pipe-nerc module [1] provides good-quality Named Entity Recognition models for English, Spanish, Dutch, German and Italian. However, using those models requires:

      • OpenNLP 1.6.0
      • OpenNLP extensions provided by the ixa-pipe-nerc module.

      OpenNLP 1.6.0 is not yet released, so we will need to go with a SNAPSHOT version for now. The ixa-pipe-nerc module does not support OSGi, so we will need to embed the required classes into a bundle and provide a bundle activator that registers the extensions as OSGi services (with the metadata expected by OpenNLP).

      NOTE: This issue only covers the extensions to Apache Stanbol that are needed so that one can use the provided models. To use the models, users will need to download the ~700 MB archive linked on [1], extract the OpenNLP models (*.bin files) and put them into the datafiles folder of Apache Stanbol.
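
The copy step described in the note can be scripted. The following is a minimal sketch, assuming the archive has already been extracted to a local directory; the function name `install_models` and both directory arguments are hypothetical and not part of Stanbol or ixa-pipe-nerc.

```python
import shutil
from pathlib import Path


def install_models(extracted_dir, datafiles_dir):
    """Copy every OpenNLP model (*.bin) found under extracted_dir into datafiles_dir.

    Both paths are supplied by the caller; nothing here is Stanbol-specific.
    Returns the file names that were copied.
    """
    src = Path(extracted_dir)
    dst = Path(datafiles_dir)
    dst.mkdir(parents=True, exist_ok=True)

    copied = []
    # rglob walks sub-directories as well, since the archive layout is not assumed.
    for model in sorted(src.rglob("*.bin")):
        shutil.copy2(model, dst / model.name)
        copied.append(model.name)
    return copied
```

Note that the sketch flattens the directory structure, so two models with the same file name in different sub-directories would overwrite each other. A call might look like `install_models("/tmp/nerc-models", "/opt/stanbol/datafiles")`, where both paths are placeholders.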

      The models use PER, ORG, LOC and MISC as types, so a configuration for the CustomNERModelEnhancementEngine such as the following should do the trick:

      # Configuration of org.apache.stanbol.enhancer.engines.opennlp.impl.CustomNERModelEnhancementEngine-ixa_nec.config
      stanbol.engines.opennlp-ner.typeMappings=["PER\ >\ http://dbpedia.org/ontology/Person","ORG\ >\ http://dbpedia.org/ontology/Organisation","LOC\ >\ http://dbpedia.org/ontology/Place","MISC\ >\ skos:Concept"]
      stanbol.enhancer.engine.name="ixa-nerc"
      stanbol.engines.opennlp-ner.nameFinderModels=["de-clusters-dictlbj-conll03.bin","en-91-18-4-class-muc7-conll03-ontonotes-4.0.bin","es-clusters-dictlbj-conll02.bin","it-clusters-evalita09.bin","nl-clusters-dictlbj-conll02.bin","eu-clusters-egunkaria.bin"]
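
To make the shape of the typeMappings entries explicit, here is a small sketch that turns them into a tag-to-URI lookup. It assumes that `\ ` in the .config file is simply an escaped space and that each entry has the form `TAG > target`; the helper `parse_type_mappings` is hypothetical and is not how Stanbol itself reads the property.

```python
def parse_type_mappings(entries):
    """Turn entries such as 'PER\\ >\\ http://...' into a {tag: target} dict."""
    mappings = {}
    for entry in entries:
        # Assumption: a backslash followed by a space is an escaped space.
        plain = entry.replace("\\ ", " ")
        # Split on the first '>' only; the part before it is the NER tag.
        tag, _, target = plain.partition(">")
        mappings[tag.strip()] = target.strip()
    return mappings


# The four entries from the configuration above, written as Python strings.
ENTRIES = [
    "PER\\ >\\ http://dbpedia.org/ontology/Person",
    "ORG\\ >\\ http://dbpedia.org/ontology/Organisation",
    "LOC\\ >\\ http://dbpedia.org/ontology/Place",
    "MISC\\ >\\ skos:Concept",
]

TYPE_MAPPINGS = parse_type_mappings(ENTRIES)
```

With these entries, `TYPE_MAPPINGS["LOC"]` is the DBpedia Place URI and `TYPE_MAPPINGS["MISC"]` is the prefixed name `skos:Concept`, which the sketch leaves unexpanded.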
      

      The values of the stanbol.engines.opennlp-ner.nameFinderModels property are the names of the OpenNLP model files. You will find those files in the NERC-Models 1.5.0 archive. See the documentation on [1] for more details and other options.
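
Because the engine refers to models by file name only, it can help to check that every configured name is actually present in the datafiles folder. The sketch below does this with plain string handling; it assumes the property is written on a single line as in the configuration above, and the helpers `read_list_property` and `missing_models` are hypothetical, not part of Stanbol.

```python
import re
from pathlib import Path

MODELS_KEY = "stanbol.engines.opennlp-ner.nameFinderModels"


def read_list_property(config_text, key):
    """Return the quoted values of a single-line `key=[...]` property."""
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith(key + "="):
            # Collect everything between double quotes after the '='.
            return re.findall(r'"([^"]*)"', line[len(key) + 1:])
    return []


def missing_models(config_text, datafiles_dir):
    """List configured model file names that are not present in datafiles_dir."""
    folder = Path(datafiles_dir)
    names = read_list_property(config_text, MODELS_KEY)
    return [name for name in names if not (folder / name).is_file()]
```

An empty result from `missing_models` means every model named in the configuration was found in the given folder; it says nothing about whether the files are valid OpenNLP models.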

      [1] https://github.com/ixa-ehu/ixa-pipe-nerc/

              People

              • Assignee:
                rwesten Rupert Westenthaler
                Reporter:
                rwesten Rupert Westenthaler
