OpenNLP / OPENNLP-1075

Add support to train the sentence detector and tokenizer on the UD corpus

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.8.1
    • Component/s: Formats
    • Labels: None

    Description

      The UD (Universal Dependencies) corpus stores the original text of each sentence in a comment field, which can be used to produce training data for the tokenizer and sentence detector.
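
      As an illustration of the idea only (not the code that was committed for this issue), here is a minimal Python sketch. It assumes CoNLL-U input in which each sentence carries a `# text = ...` comment, and it emits tokenizer training lines using the `<SPLIT>` marker from OpenNLP's tokenizer training format. The function names `read_sentences` and `tokenizer_line` are hypothetical. Each recovered sentence text on its own line would serve as sentence detector training data.

```python
SPLIT = "<SPLIT>"  # marker used in OpenNLP tokenizer training data


def read_sentences(conllu):
    """Yield (original_text, surface_forms) per sentence of a CoNLL-U string."""
    text, forms, skip_until = None, [], 0
    # The trailing empty string flushes the last sentence block.
    for line in conllu.splitlines() + [""]:
        if not line.strip():
            if text is not None and forms:
                yield text, forms
            text, forms, skip_until = None, [], 0
        elif line.startswith("# text = "):
            text = line[len("# text = "):]
        elif line.startswith("#"):
            continue  # other comments (sent_id, newdoc, ...) are ignored
        else:
            cols = line.split("\t")
            tok_id, form = cols[0], cols[1]
            if "." in tok_id:
                continue  # empty node, not part of the surface text
            if "-" in tok_id:
                # Multiword token: keep the surface form, skip its parts.
                forms.append(form)
                skip_until = int(tok_id.split("-")[1])
            elif int(tok_id) > skip_until:
                forms.append(form)


def tokenizer_line(text, forms):
    """Align forms with the original text; mark boundaries without whitespace."""
    out, pos = [], 0
    for form in forms:
        start = text.find(form, pos)
        if start < 0:
            raise ValueError(f"token {form!r} not found in {text!r}")
        if out:
            out.append(" " if start > pos else SPLIT)
        out.append(form)
        pos = start + len(form)
    return "".join(out)


if __name__ == "__main__":
    sample = (
        "# text = Hello, world!\n"
        "1\tHello\t_\n2\t,\t_\n3\tworld\t_\n4\t!\t_\n"
    )
    for text, forms in read_sentences(sample):
        print(text)                         # sentence detector line
        print(tokenizer_line(text, forms))  # Hello<SPLIT>, world<SPLIT>!
```

      The sketch treats any gap between two aligned tokens as whitespace and does not handle tokens that themselves contain spaces; a real converter would need to decide how to treat those cases.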

            People

              Assignee: Jörn Kottmann (joern)
              Reporter: Jörn Kottmann (joern)
              Votes: 0
              Watchers: 2
