[OPENNLP-993] DocumentCategorizerME does not load tokenizer specified in model manifest

Details

Type: Bug
Status: Closed
Priority: Critical
Resolution: Won't Fix
Affects Version/s: 1.7.1
Fix Version/s: 1.8.0
Component/s: Doccat
Labels: None

Description

DocumentCategorizerME no longer loads the tokenizer specified in the model manifest. Instead it always uses a WhitespaceTokenizer.

This appears to be due to a change in 1.7.1 where the constructors for the DoccatFactory were modified to create a WhitespaceTokenizer.

As a result, the logic in the DoccatFactory.getTokenizer() method never tries to load the tokenizer named in the model's manifest, because "tokenizer" is already non-null when getTokenizer() is first called.
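The failure mode described above can be sketched with a simplified stand-in. This is not the OpenNLP source: SketchFactory, the Tokenizer interface, the eagerDefault flag, and the "CustomTokenizer" manifest value are all hypothetical names chosen for illustration. The sketch only assumes that getTokenizer() lazily reads the manifest when its field is null, and that the constructor may assign a default first.

```java
import java.util.Map;

// Hypothetical, simplified model of the reported behaviour; not OpenNLP code.
class LazyTokenizerSketch {

    // Stand-in for a tokenizer; only its name matters for this illustration.
    interface Tokenizer {
        String name();
    }

    static class SketchFactory {
        private Tokenizer tokenizer;
        private final Map<String, String> manifest;

        // eagerDefault = true mimics a constructor that assigns a
        // whitespace tokenizer up front (the behaviour the report describes).
        SketchFactory(Map<String, String> manifest, boolean eagerDefault) {
            this.manifest = manifest;
            if (eagerDefault) {
                this.tokenizer = () -> "WhitespaceTokenizer";
            }
        }

        // Lazy lookup: the manifest is consulted only while the field is null.
        Tokenizer getTokenizer() {
            if (tokenizer == null) {
                String configured = manifest.get("tokenizer");
                String chosen = (configured != null) ? configured : "WhitespaceTokenizer";
                tokenizer = () -> chosen;
            }
            return tokenizer;
        }
    }

    // Builds a factory whose manifest names a custom tokenizer and reports
    // which tokenizer getTokenizer() actually returns.
    static String resolve(boolean eagerDefault) {
        Map<String, String> manifest = Map.of("tokenizer", "CustomTokenizer");
        return new SketchFactory(manifest, eagerDefault).getTokenizer().name();
    }

    public static void main(String[] args) {
        System.out.println("lazy field:  " + resolve(false));
        System.out.println("eager field: " + resolve(true));
    }
}
```

With the field left null, the manifest entry wins; once the constructor pre-populates the field, the null check in getTokenizer() is skipped and the manifest entry is never read.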

People

Assignee: Suneel Marthi

Reporter: Jonathan Ackerman

Votes: 0

Watchers: 2

Dates

Created: 21/Feb/17 19:57

Updated: 27/Feb/17 21:53

Resolved: 23/Feb/17 10:06