  UIMA / UIMA-2397

TextMarker: Improve overall functionality in use cases with very large artifacts

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.0TextMarker
    • Fix Version/s: 2.0.1ruta
    • Component/s: Ruta
    • Labels: None

    Description

      TextMarker cannot currently be applied in use cases with very large artifacts, e.g., documents with 500k to 1M tokens.
      Adapt or replace the rule language so that users can handle such texts:

      • reduce the memory footprint of the TextMarkerBasic inference annotations, or make it configurable.
      • add the concept of simple rules that match only on a single regular expression and add annotations without creating inference annotations (related to UIMA-2331).
      • allow the user to skip seeding at engine startup and to apply the seeders to specific annotations during rule inference.
      • introduce language concepts that enable the user to split documents into multiple CASs.
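
      The regular-expression and document-splitting ideas above can be illustrated with a rough, language-independent sketch. This is not TextMarker/Ruta code and does not use the UIMA API; the names Span, split_at_whitespace, and annotate, as well as the max_chars parameter, are hypothetical and chosen only for this illustration. It assumes that annotations can be represented as plain (type, begin, end) triples and that a large text may be cut at whitespace boundaries.

```python
import re
from typing import Iterator, List, NamedTuple, Tuple


class Span(NamedTuple):
    """Lightweight annotation (hypothetical): type name plus absolute offsets."""
    type_name: str
    begin: int
    end: int


def split_at_whitespace(text: str, max_chars: int) -> Iterator[Tuple[int, str]]:
    """Yield (offset, chunk) pairs of at most max_chars characters.

    Chunks are cut after the last space or newline inside the window when
    one exists, so tokens are not split unless a token exceeds max_chars.
    """
    start = 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        if end < len(text):
            cut = max(text.rfind(" ", start, end), text.rfind("\n", start, end))
            if cut > start:
                end = cut + 1  # keep the whitespace with the current chunk
        yield start, text[start:end]
        start = end


def annotate(text: str, type_name: str, pattern: str,
             max_chars: int = 10_000) -> List[Span]:
    """Apply a single regular expression chunk by chunk.

    Only the resulting spans are stored; no per-token bookkeeping
    structures are built for the document.
    """
    regex = re.compile(pattern)
    spans: List[Span] = []
    for offset, chunk in split_at_whitespace(text, max_chars):
        for match in regex.finditer(chunk):
            # Translate chunk-relative offsets back to document offsets.
            spans.append(Span(type_name, offset + match.start(), offset + match.end()))
    return spans
```

      A call such as annotate(text, "Phone", r"\d{3}-\d{4}", max_chars=12) returns spans whose offsets refer to the full text. One limitation of this sketch is that a match containing whitespace can straddle a chunk boundary and be missed, so a real splitting mechanism would need to choose boundaries with more care.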

          People

            Assignee: Peter Klügl (pkluegl)
            Reporter: Peter Klügl (pkluegl)