Details

    • Type: Improvement
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Formats
    • Labels:

      Description

      Add format support for the MASC corpus. The corpus contains annotations for most of the components in OpenNLP and would be a great source of freely available data for training and testing.

      The corpus can be found here:
      http://www.anc.org/MASC/About.html#format

        Activity

        jds John Stewart added a comment -

        Is it worth investing some time into this corpus? It seems small but quite complete.

        joern Joern Kottmann added a comment -

        It says there is no license, which would probably let us place it somewhere inside the OpenNLP repository easily. That would be really nice for having unit tests that use it. This alone would make it worth the effort to add support for it.


          People

          • Assignee:
            Unassigned
            Reporter:
            joern Joern Kottmann
          • Votes:
            0
            Watchers:
            2
