[UIMA-2397] TextMarker: Improve overall functionality in use cases with very large artifacts - ASF JIRA

Details

Type: Improvement
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 2.0.0TextMarker
Fix Version/s: 2.0.1ruta
Component/s: Ruta
Labels: None

Description

TextMarker cannot currently be applied in use cases with very large artifacts, e.g., documents with 500k to 1M tokens.
Adapt or replace the rule language so that the user can handle such texts:

- Reduce the memory footprint of the TextMarkerBasic inference annotations, or make it configurable.
- Add the concept of simple rules that match only on a single regular expression, so that annotations can be added without inference annotations (related to ~~UIMA-2331~~).
- Allow the user to skip seeding at engine startup and to apply the seeders to selected annotations during rule inference.
- Introduce language concepts that enable the user to split documents into multiple CASs.
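
The second item can be illustrated with a minimal sketch. It is not the TextMarker/Ruta implementation, only an assumption of what a "simple rule" could look like: one regular expression is run over the raw document text and each match yields a typed span directly, with no per-token helper annotations in between. The names `SimpleRegexRule` and `Span` are hypothetical and use only `java.util.regex`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SimpleRegexRule {

    /** Hypothetical stand-in for an annotation: a type name plus character offsets. */
    public static final class Span {
        public final String type;
        public final int begin;
        public final int end;

        public Span(String type, int begin, int end) {
            this.type = type;
            this.begin = begin;
            this.end = end;
        }

        @Override
        public String toString() {
            return type + "[" + begin + "," + end + ")";
        }
    }

    private final Pattern pattern;
    private final String type;

    public SimpleRegexRule(String regex, String type) {
        // Compile once; the same rule can be applied to many documents.
        this.pattern = Pattern.compile(regex);
        this.type = type;
    }

    /** Scans the text once and creates one span per regex match. */
    public List<Span> apply(CharSequence text) {
        List<Span> spans = new ArrayList<>();
        Matcher matcher = pattern.matcher(text);
        while (matcher.find()) {
            spans.add(new Span(type, matcher.start(), matcher.end()));
        }
        return spans;
    }

    public static void main(String[] args) {
        SimpleRegexRule rule = new SimpleRegexRule("\\b\\d{4}\\b", "Year");
        for (Span span : rule.apply("Created in 2012, resolved in 2013.")) {
            System.out.println(span);
        }
    }
}
```

In this sketch the extra memory grows with the number of matches rather than with the number of tokens, which is the property the item asks for on documents of this size.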

Sub-Tasks

There are no Sub-Tasks for this issue.

People

Assignee: Peter Klügl

Reporter: Peter Klügl

Votes: 0

Watchers: 1

Dates

Created: 04/May/12 12:00

Updated: 06/May/13 12:24

Resolved: 21/Mar/13 13:45