OpenNLP / OPENNLP-287

Extend POS Tagger documentation with more information about the tag dictionary

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Documentation, POS Tagger
    • Labels:
      None

      Description

      Extend the tag dictionary section of the POS Tagger chapter in the documentation.

      Attachments

      1. TaggerDictionaryTest.java (3 kB, Łukasz Dróżdż)
      2. en-pos.train (0.8 kB, Łukasz Dróżdż)
      3. dictionary.xml (3 kB, Łukasz Dróżdż)

        Activity

        ldrozdz Łukasz Dróżdż added a comment:

        Sure. Let me write it up and get back to you.

        joern Joern Kottmann added a comment:

        We use DocBook for the OpenNLP documentation. Please consider sending us a patch for the POS Tagger chapter.

        ldrozdz Łukasz Dróżdż added a comment (edited):

        Hi,

        Here's my attempt at providing a sample POS dictionary file, as well as test code showing how to read the dictionary in, write it back out, and use it to train a POS tagger. See the attached files for details.
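
        As we understand OpenNLP's tagger, the tag dictionary acts as a constraint during tagging: for a word that has an entry, only the tags listed for it are allowed, while a word without an entry may receive any tag. The sketch below illustrates that rule with plain Java collections only; TagFilter and allowedTags are hypothetical names, not part of the OpenNLP API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical illustration of how a tag dictionary restricts candidate tags. */
public class TagFilter {

    /**
     * Keeps only the candidate tags that the dictionary lists for the token.
     * Tokens without a dictionary entry are left unrestricted.
     */
    public static List<String> allowedTags(Map<String, Set<String>> dict,
                                           String token, List<String> candidates) {
        Set<String> listed = dict.get(token);
        if (listed == null) {
            return candidates; // unknown word: no constraint applied
        }
        List<String> allowed = new ArrayList<>();
        for (String tag : candidates) {
            if (listed.contains(tag)) {
                allowed.add(tag);
            }
        }
        return allowed;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> dict = Map.of(
                "token1", Set.of("tag1", "tag2"),
                "token2", Set.of("tag1"));
        List<String> candidates = List.of("tag1", "tag2", "tag3");
        System.out.println(allowedTags(dict, "token2", candidates)); // [tag1]
        System.out.println(allowedTags(dict, "token3", candidates)); // [tag1, tag2, tag3]
    }
}
```

        Filtering rather than rejecting keeps the candidate order intact, so a scoring step (the model's probabilities, in a real tagger) can still pick the best of the remaining tags.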

        The XML structure of a POS dictionary is:

        <?xml version="1.0" encoding="UTF-8"?>
        <dictionary>
          <entry tags="tag1 tag2">
            <token>token1</token>
          </entry>
          <entry tags="tag1">
            <token>token2</token>
          </entry>
        </dictionary>
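
        A minimal sketch of reading this format, using only the JDK's DOM parser rather than OpenNLP's own dictionary classes. The class and method names (TagDictionaryReader, parse) are hypothetical, and the code assumes each entry's tags attribute is a whitespace-separated tag list, as in the structure above.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical JDK-only reader for the <dictionary>/<entry>/<token> layout. */
public class TagDictionaryReader {

    /** Maps each token to the set of tags listed in its entry's "tags" attribute. */
    public static Map<String, Set<String>> parse(InputStream in) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in);
        Map<String, Set<String>> dict = new LinkedHashMap<>();
        NodeList entries = doc.getElementsByTagName("entry");
        for (int i = 0; i < entries.getLength(); i++) {
            Element entry = (Element) entries.item(i);
            String tagAttr = entry.getAttribute("tags").trim();
            if (tagAttr.isEmpty()) {
                continue; // an entry without tags carries no constraint
            }
            // The tags attribute is a whitespace-separated list, e.g. "tag1 tag2".
            String[] tags = tagAttr.split("\\s+");
            NodeList tokens = entry.getElementsByTagName("token");
            for (int j = 0; j < tokens.getLength(); j++) {
                String token = tokens.item(j).getTextContent().trim();
                dict.computeIfAbsent(token, k -> new LinkedHashSet<>())
                    .addAll(Arrays.asList(tags));
            }
        }
        return dict;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<dictionary>"
                + "<entry tags=\"tag1 tag2\"><token>token1</token></entry>"
                + "<entry tags=\"tag1\"><token>token2</token></entry>"
                + "</dictionary>";
        Map<String, Set<String>> dict =
                parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        System.out.println(dict); // {token1=[tag1, tag2], token2=[tag1]}
    }
}
```

        Insertion-ordered collections are used only so the printed result follows the file order; a plain HashMap would work equally well for lookups.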

        Hope that helps.

        ldrozdz Łukasz Dróżdż added a comment (edited):

        Attached code examples and sample files for reading and serializing the POS dictionary and for training a model with it.
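
        Writing a dictionary back out can be sketched in the same JDK-only way. The hypothetical TagDictionaryWriter below builds the <dictionary>/<entry>/<token> layout from a token-to-tags map with the DOM and javax.xml.transform APIs, so escaping of special characters is left to the transformer. It writes one token per entry and is an illustration under those assumptions, not OpenNLP's own serializer.

```java
import java.io.StringWriter;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

/** Hypothetical JDK-only writer for the tag dictionary XML layout. */
public class TagDictionaryWriter {

    /** Serializes each token as <entry tags="..."><token>...</token></entry>. */
    public static String serialize(Map<String, Set<String>> dict) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement("dictionary");
        doc.appendChild(root);
        for (Map.Entry<String, Set<String>> e : dict.entrySet()) {
            Element entry = doc.createElement("entry");
            // Tags are joined into the whitespace-separated attribute value.
            entry.setAttribute("tags", String.join(" ", e.getValue()));
            Element token = doc.createElement("token");
            token.setTextContent(e.getKey());
            entry.appendChild(token);
            root.appendChild(entry);
        }
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        t.setOutputProperty(OutputKeys.INDENT, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        Map<String, Set<String>> dict = new LinkedHashMap<>();
        dict.put("token1", new LinkedHashSet<>(List.of("tag1", "tag2")));
        dict.put("token2", new LinkedHashSet<>(List.of("tag1")));
        System.out.println(serialize(dict));
    }
}
```

        Grouping several tokens that share the same tag set into a single entry would also be valid under the structure described earlier; one token per entry just keeps the writer simple.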


          People

          • Assignee: Unassigned
          • Reporter: joern Joern Kottmann
          • Votes: 2
          • Watchers: 4
