STANBOL-1231 (project: Stanbol, retired)

Add French Treebank+ Tagset to OpenNLP POS tagging engine

Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.12.0
    • Component/s: None
    • Labels: None

    Description

      Nicolas Hernandez has made OpenNLP models for French available at [1]. These models seem to use the tagset published by Crabbé & Candito (2008), which is best described on page 8 of [2]. Information on the main categories can be found at [3].

      To use these models with the OpenNLP POS Tagging Engine, the PosTagSetRegistry should be extended with a TagSet mapping for this tagset.
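
      A minimal sketch of what such a mapping amounts to, written in plain Java without the Stanbol API. The class FrenchTagSetSketch, the Category enum, and the lookup method are hypothetical names used for illustration only; the real mapping would be registered through the engine's own tag set classes. Only a subset of the tags is shown, and the tag meanings given in the comments should be checked against page 8 of [2].

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

/** Illustrative mapping from a subset of French Treebank+ POS tags to coarse categories. */
final class FrenchTagSetSketch {

    /** Hypothetical coarse categories; a stand-in for the engine's own lexical categories. */
    enum Category {
        NOUN, VERB, ADJECTIVE, ADVERB, PRONOUN, DETERMINER,
        ADPOSITION, CONJUNCTION, PUNCTUATION
    }

    private static final Map<String, Category> TAGS = new LinkedHashMap<>();

    static {
        // Subset only; verify tags and meanings against the tagset description in [2].
        TAGS.put("NC", Category.NOUN);          // common noun
        TAGS.put("NPP", Category.NOUN);         // proper noun
        TAGS.put("V", Category.VERB);           // finite verb
        TAGS.put("VINF", Category.VERB);        // infinitive
        TAGS.put("VPP", Category.VERB);         // past participle
        TAGS.put("ADJ", Category.ADJECTIVE);    // adjective
        TAGS.put("ADV", Category.ADVERB);       // adverb
        TAGS.put("CC", Category.CONJUNCTION);   // coordinating conjunction
        TAGS.put("CS", Category.CONJUNCTION);   // subordinating conjunction
        TAGS.put("DET", Category.DETERMINER);   // determiner
        TAGS.put("P", Category.ADPOSITION);     // preposition
        TAGS.put("PRO", Category.PRONOUN);      // pronoun
        TAGS.put("CLS", Category.PRONOUN);      // subject clitic pronoun
        TAGS.put("PONCT", Category.PUNCTUATION); // punctuation
    }

    /** Returns the category for a tag, or an empty Optional for tags not in the map. */
    static Optional<Category> lookup(String tag) {
        return Optional.ofNullable(TAGS.get(tag));
    }

    public static void main(String[] args) {
        // Unknown tags yield Optional.empty rather than a guessed category.
        for (String tag : new String[] {"NC", "VINF", "PONCT", "XYZ"}) {
            System.out.println(tag + " -> " + lookup(tag));
        }
    }
}
```

      Returning an empty Optional for unmapped tags keeps the sketch honest about tags that the subset does not cover, so a caller can fall back to the raw tag string.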

      NOTE: Users who want to use these models need to download them from [1], extract the archive, rename the files to fr-sent.bin, fr-token.bin, fr-pos-maxent.bin, and fr-chunker.bin, and copy them to the Stanbol datafiles directory (by default "stanbol/datafiles").
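
      The rename-and-copy step could be scripted roughly as follows. This is a sketch under assumptions: FrenchModelInstaller, plan, and install are hypothetical names, and the file names inside the downloaded archive are not stated in this issue, so the caller has to supply them. Only the four target names and the default datafiles location come from the note above.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper that copies extracted model files under the expected names. */
final class FrenchModelInstaller {

    /** Target file names listed in the issue description. */
    static final List<String> TARGET_NAMES = Arrays.asList(
            "fr-sent.bin", "fr-token.bin", "fr-pos-maxent.bin", "fr-chunker.bin");

    /**
     * Builds a source-to-target copy plan without touching the file system.
     * Keys of renames are file names inside the extracted archive (caller-supplied);
     * values must be one of TARGET_NAMES.
     */
    static Map<Path, Path> plan(Path extractedDir, Map<String, String> renames, Path datafilesDir) {
        Map<Path, Path> plan = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : renames.entrySet()) {
            if (!TARGET_NAMES.contains(e.getValue())) {
                throw new IllegalArgumentException("Unexpected target name: " + e.getValue());
            }
            plan.put(extractedDir.resolve(e.getKey()), datafilesDir.resolve(e.getValue()));
        }
        return plan;
    }

    /** Executes the plan, creating the target directory if needed and overwriting old copies. */
    static void install(Map<Path, Path> plan) throws IOException {
        for (Map.Entry<Path, Path> e : plan.entrySet()) {
            Path parent = e.getValue().getParent();
            if (parent != null) {
                Files.createDirectories(parent);
            }
            Files.copy(e.getKey(), e.getValue(), StandardCopyOption.REPLACE_EXISTING);
        }
    }
}
```

      A caller would fill the renames map with the actual names found in the extracted archive, pass the extraction directory and "stanbol/datafiles" to plan, inspect the result, and then hand it to install. Separating planning from copying makes it easy to check the mapping before any file is written.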

      [1] http://enicolashernandez.blogspot.co.at/2012/12/apache-opennlp-fr-models.html
      [2] http://alpage.inria.fr/statgram/frdep/Publications/crabbecandi-taln2008-final.pdf
      [3] http://www.llf.cnrs.fr/Gens/Abeille/French-Treebank-en.php

          People

            Assignee: Rupert Westenthaler (rwesten)
            Reporter: Rupert Westenthaler (rwesten)