OpenNLP / OPENNLP-993

DocumentCategorizerME does not load tokenizer specified in model manifest

Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Won't Fix
    • Affects Version/s: 1.7.1
    • Fix Version/s: 1.8.0
    • Component/s: Doccat
    • Labels: None

    Description

      DocumentCategorizerME no longer loads the tokenizer specified in the model manifest. Instead it always uses a WhitespaceTokenizer.

      This appears to be due to a change in 1.7.1, where the constructors for the DoccatFactory were modified to create a WhitespaceTokenizer.

      This means the logic in the DoccatFactory.getTokenizer() method never tries to load the tokenizer named in the model's manifest, because "tokenizer" is already non-null when getTokenizer() is first called.
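
      The pattern described above can be illustrated with a minimal, self-contained sketch. This is not the OpenNLP source: the Factory and Tokenizer classes, the eagerDefault flag, and the com.example.CustomTokenizer manifest entry are all hypothetical stand-ins. They are assumed only to show how a default assigned in a constructor can short-circuit a lazy, null-guarded lookup.

```java
import java.util.HashMap;
import java.util.Map;

public class LazyTokenizerSketch {

    /** Hypothetical stand-in for a tokenizer; only its name matters here. */
    static class Tokenizer {
        final String name;

        Tokenizer(String name) {
            this.name = name;
        }
    }

    /** Hypothetical factory that mimics the pattern described in the report. */
    static class Factory {
        private final Map<String, String> manifest;
        private Tokenizer tokenizer;

        Factory(Map<String, String> manifest, boolean eagerDefault) {
            this.manifest = manifest;
            if (eagerDefault) {
                // Assumption: models the constructor change the report attributes to 1.7.1.
                this.tokenizer = new Tokenizer("WhitespaceTokenizer");
            }
        }

        Tokenizer getTokenizer() {
            // The manifest is consulted only while the field is still null.
            if (tokenizer == null) {
                String configured = manifest.get("tokenizer");
                tokenizer = new Tokenizer(
                        configured != null ? configured : "WhitespaceTokenizer");
            }
            return tokenizer;
        }
    }

    /** Builds a factory over a manifest that names a custom tokenizer. */
    static String resolve(boolean eagerDefault) {
        Map<String, String> manifest = new HashMap<>();
        manifest.put("tokenizer", "com.example.CustomTokenizer"); // hypothetical entry
        return new Factory(manifest, eagerDefault).getTokenizer().name;
    }

    public static void main(String[] args) {
        System.out.println("eager default: " + resolve(true));
        System.out.println("lazy only:     " + resolve(false));
    }
}
```

      In this sketch, resolve(true) returns "WhitespaceTokenizer" even though the manifest names a different tokenizer, while resolve(false) returns the manifest entry. The null guard is only useful if nothing assigns the field before the first call.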

          People

            Assignee: Suneel Marthi (smarthi)
            Reporter: Jonathan Ackerman (rabidgremlin)