Details
Type: Improvement
Status: Open
Priority: Minor
Resolution: Unresolved
future enhancement
None
Description
The PTB tokenizer currently used by cTAKES has some inconsistencies; see https://issues.apache.org/jira/browse/CTAKES-371. It also does not seem to incorporate some of the clinical rules set out in the treebank guidelines (http://clear.colorado.edu/compsem/documents/treebank_guidelines.pdf).
Some major refactoring is also in order, and numerous test cases need to be added.