OPENNLP-568 (project: OpenNLP)

Doccat command line tagger should assume whitespace tokenized input

Details

    Description

      The DoccatTool should read the doccat default format from stdin. The default format is whitespace tokenized, but the DoccatTool uses the Simple Tokenizer to tokenize the input text.

      To fix this issue, use the Whitespace Tokenizer instead of the Simple Tokenizer.
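
      The difference between the two tokenization strategies can be illustrated with a small standalone sketch. It does not use OpenNLP itself: the class DoccatTokenizationSketch and both methods are hypothetical stand-ins, and characterClassTokenize is only a rough approximation of a tokenizer that splits wherever the character class changes (letter, digit, other), which is assumed here to be how the Simple Tokenizer behaves.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative stand-ins only; these are NOT OpenNLP's tokenizer classes.
class DoccatTokenizationSketch {

    private static final int WHITESPACE = 0, LETTER = 1, DIGIT = 2, OTHER = 3;

    // Whitespace tokenization: split on runs of whitespace and nothing else,
    // so tokens from already-tokenized input are kept exactly as given.
    static List<String> whitespaceTokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // Approximation of a character-class tokenizer: a new token starts
    // whenever the class of the current character differs from the previous one.
    static List<String> characterClassTokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int previousClass = -1;
        for (char c : text.toCharArray()) {
            int currentClass = charClass(c);
            if (currentClass != previousClass && current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
            if (currentClass != WHITESPACE) {
                current.append(c);
            }
            previousClass = currentClass;
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    private static int charClass(char c) {
        if (Character.isWhitespace(c)) return WHITESPACE;
        if (Character.isLetter(c)) return LETTER;
        if (Character.isDigit(c)) return DIGIT;
        return OTHER;
    }

    public static void main(String[] args) {
        // A line that is already whitespace tokenized, as the doccat format expects.
        String line = "I don't like re-tokenized input .";
        System.out.println(whitespaceTokenize(line));
        System.out.println(characterClassTokenize(line));
    }
}
```

      For the sample line, the whitespace variant returns the six tokens as they appear in the input, while the character-class variant breaks "don't" and "re-tokenized" apart and returns ten tokens. Features computed at classification time would then be built from different tokens than the ones in whitespace tokenized input.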

          People

            joern Jörn Kottmann