  1. OpenNLP
  2. OPENNLP-568

Doccat command line tagger should assume whitespace tokenized input


    Details

      Description

      The DoccatTool should read input in the doccat default format from stdin. That format is whitespace tokenized, but the DoccatTool currently tokenizes the input text with the Simple Tokenizer.

      To fix this issue, use the Whitespace Tokenizer instead of the Simple Tokenizer.
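
      The difference between the two tokenization strategies can be illustrated with a small sketch. It does not call OpenNLP; the class and method names (TokenizerSketch, whitespaceTokenize, characterClassTokenize) are hypothetical, and the character-class splitter is only a rough approximation of how a tokenizer like the Simple Tokenizer behaves, not its actual implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in classes: this sketch does not use the OpenNLP API.
public class TokenizerSketch {

    // Splits only on runs of whitespace, so input that is already
    // whitespace tokenized passes through unchanged.
    public static List<String> whitespaceTokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Assumption: approximates a character-class tokenizer by also
    // splitting wherever the class (letter, digit, other) changes.
    public static List<String> characterClassTokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String chunk : whitespaceTokenize(text)) {
            int start = 0;
            for (int i = 1; i <= chunk.length(); i++) {
                boolean atEnd = i == chunk.length();
                if (atEnd || charClass(chunk.charAt(i)) != charClass(chunk.charAt(i - 1))) {
                    tokens.add(chunk.substring(start, i));
                    start = i;
                }
            }
        }
        return tokens;
    }

    // 0 = letter, 1 = digit, 2 = anything else (punctuation, symbols).
    private static int charClass(char c) {
        if (Character.isLetter(c)) {
            return 0;
        }
        if (Character.isDigit(c)) {
            return 1;
        }
        return 2;
    }

    public static void main(String[] args) {
        String line = "don't re-tokenize 3.5";
        System.out.println(whitespaceTokenize(line));
        System.out.println(characterClassTokenize(line));
    }
}
```

      In this sketch the whitespace splitter keeps "don't", "re-tokenize" and "3.5" intact, while the character-class splitter breaks each of them into smaller pieces. Input that is already whitespace tokenized would therefore be re-split if it went through the second strategy.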

            People

            • Assignee:
              joern Jörn Kottmann
              Reporter:
              joern Jörn Kottmann
