Description
From https://github.com/apache/opennlp/pull/516#issuecomment-1455015772
At the moment, our tests only verify that the tokenizer objects are created correctly (i.e. they test getters, setters, constructors, etc.), without verifying the actual behavior when those objects are used in conjunction with other classes (factories, tokenizers, trainers, etc.).
It would be best to test the patterns used in the factories for different languages against some interesting sample data (for example, text from Project Gutenberg or open-source news sites).
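As a rough illustration of the kind of check meant here, the sketch below runs per-language patterns against short sample strings. Everything in it is an assumption made for illustration: the class name `LanguagePatternCheck`, the helper `allTokensMatch`, the character classes, and the sample text are hypothetical and are not taken from the OpenNLP code base. A real test would obtain the pattern from the factory under test and run the actual tokenizer instead of splitting on whitespace.

```java
import java.util.Map;
import java.util.regex.Pattern;

public class LanguagePatternCheck {

    // Hypothetical per-language alphanumeric patterns (assumption, not OpenNLP's own).
    // Accented letters are written as Unicode escapes so the source stays ASCII-only.
    static final Map<String, Pattern> PATTERNS = Map.of(
        "en", Pattern.compile("^[A-Za-z0-9]+$"),
        // Spanish: acute vowels, u with diaeresis, n with tilde (both cases)
        "es", Pattern.compile("^[0-9A-Za-z\u00c1\u00c9\u00cd\u00d3\u00da\u00dc\u00d1"
            + "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1]+$"),
        // Italian: grave vowels plus acute e (both cases)
        "it", Pattern.compile("^[0-9A-Za-z\u00c0\u00c8\u00c9\u00cc\u00d2\u00d9"
            + "\u00e0\u00e8\u00e9\u00ec\u00f2\u00f9]+$"));

    // Returns true if every whitespace-separated token of the sample
    // matches the pattern registered for the given language code.
    static boolean allTokensMatch(String lang, String sample) {
        Pattern pattern = PATTERNS.get(lang);
        if (pattern == null) {
            throw new IllegalArgumentException("No pattern for language: " + lang);
        }
        for (String token : sample.trim().split("\\s+")) {
            if (!pattern.matcher(token).matches()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String spanish = "El ni\u00f1o comi\u00f3 ping\u00fcinos"; // sample with n-tilde, o-acute, u-diaeresis
        String italian = "La citt\u00e0 perch\u00e9";              // sample with a-grave, e-acute

        System.out.println("es pattern on Spanish sample: " + allTokensMatch("es", spanish));
        System.out.println("en pattern on Spanish sample: " + allTokensMatch("en", spanish));
        System.out.println("it pattern on Italian sample: " + allTokensMatch("it", italian));
    }
}
```

The useful part is the negative case: the English-only pattern rejects the Spanish sample, which is the kind of behavioral difference that getter and setter tests cannot catch.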
Issue Links
- relates to OPENNLP-1474 Create tokenizer factories for other langs (Spanish, Italian, ...) (Closed)