Description
TextMarker cannot be used for very large artifacts, e.g., documents with 500k to 1M tokens.
Adapt or replace the rule language so that users can process such texts:
- reduce the memory footprint of the TextMarkerBasic inference annotations, or make it configurable.
- add the concept of simple rules that match a single regular expression and add annotations without creating inference annotations (related to UIMA-2331).
- allow the user to skip seeding at engine startup and instead apply the seeders to specific annotations during rule inference.
- introduce language concepts that enable the user to split documents into multiple CASs.
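To illustrate the "simple rules" item, here is a minimal sketch in plain Java of a rule that matches a single regular expression and emits annotation spans directly, without creating any per-token inference annotations. It does not use the UIMA or TextMarker API; the class SimpleRegexRule and the Span record are hypothetical stand-ins for a rule and an annotation (begin, end, type).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: not TextMarker syntax or the UIMA API.
public class SimpleRegexRule {
    // Hypothetical span type standing in for an annotation (begin, end, type name).
    record Span(int begin, int end, String type) {}

    private final Pattern pattern;
    private final String type;

    public SimpleRegexRule(String regex, String type) {
        this.pattern = Pattern.compile(regex);
        this.type = type;
    }

    // Scans the text once and emits one span per match; no per-token
    // inference annotations are created or consulted.
    public List<Span> apply(CharSequence text) {
        List<Span> result = new ArrayList<>();
        Matcher m = pattern.matcher(text);
        while (m.find()) {
            result.add(new Span(m.start(), m.end(), type));
        }
        return result;
    }

    public static void main(String[] args) {
        SimpleRegexRule rule = new SimpleRegexRule("\\d{4}-\\d{2}-\\d{2}", "Date");
        String text = "Released 2012-03-01, patched 2012-04-15.";
        for (Span s : rule.apply(text)) {
            // Prints the type, the [begin,end) offsets and the covered text.
            System.out.println(s.type() + " [" + s.begin() + "," + s.end() + ") "
                    + text.substring(s.begin(), s.end()));
        }
    }
}
```

Because the work is a single linear scan per rule, memory use grows with the number of matches rather than with the number of tokens, which is the property the issue asks for on documents of this size.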