UIMA / UIMA-2455

Make ordering of getNextAnnotations result configurable


Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.0TextMarker
    • Fix Version/s: 2.0.0TextMarker
    • Component/s: Ruta
    • Labels: None

    Description

      Example rule:
      A B C{NOT(PARTOF(D))->MARK(D,3)};

      Example text:
      aText bText cText cMoreText

      where the following correspondences between annotations and tokens hold:
      A = aText
      B = bText
      C = cText
      C = cText cMoreText

      The rule produces:
      D = cText

      However, I expect:
      D = cText cMoreText

      The reason for the actual behaviour is the implementation of org.apache.uima.textmarker.rule.AnnotationComparator#compare: it orders a shorter annotation before a longer one. As a result, the sequence 'aText bText cText' is matched first and marked as D, while the sequence 'aText bText cText cMoreText' is considered later and then fails the NOT(PARTOF(D)) condition.
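
      To make the effect concrete, here is a small stand-alone Java simulation of the matching order described above. It is a sketch, not TextMarker code: `Span`, `SHORTER_FIRST` and `markPassing` are hypothetical names, the comparator only captures the "shorter first" aspect, and it assumes (as the description implies) that PARTOF(D) holds as soon as a candidate's begin offset lies inside a D created earlier.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class AnchorOrderDemo {

    // Stand-in for an annotation: character offsets [begin, end) in the document.
    static final class Span {
        final int begin;
        final int end;

        Span(int begin, int end) {
            this.begin = begin;
            this.end = end;
        }
    }

    // Hypothetical: only the length aspect of the ordering described in the issue.
    static final Comparator<Span> SHORTER_FIRST =
            Comparator.comparingInt((Span s) -> s.end - s.begin);

    // Tries each C candidate in the given order and marks a D for every candidate
    // that passes NOT(PARTOF(D)). Assumption: PARTOF(D) holds when the candidate's
    // begin offset lies inside a D created earlier.
    static List<Span> markPassing(List<Span> candidates, Comparator<Span> order) {
        List<Span> sorted = new ArrayList<>(candidates);
        sorted.sort(order);
        List<Span> marked = new ArrayList<>();
        for (Span c : sorted) {
            boolean partOfD = false;
            for (Span d : marked) {
                if (c.begin >= d.begin && c.begin < d.end) {
                    partOfD = true;
                    break;
                }
            }
            if (!partOfD) {
                marked.add(c); // MARK(D,3): D covers the matched C
            }
        }
        return marked;
    }

    public static void main(String[] args) {
        String doc = "aText bText cText cMoreText";
        // Both C annotations from the example start at "cText" (offset 12).
        List<Span> cs = Arrays.asList(new Span(12, 27), new Span(12, 17));
        for (Span d : markPassing(cs, SHORTER_FIRST)) {
            System.out.println("D = " + doc.substring(d.begin, d.end));
        }
    }
}
```

      Running it prints "D = cText", matching the reported behaviour; passing SHORTER_FIRST.reversed() instead makes the simulation mark "cText cMoreText".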

      I noticed this after migrating to the latest TextMarker sources (from the ASF repository). Previously, we used the version from SourceForge.net. In the old (SourceForge) version, this problem did not arise because a TextMarkerBasic could keep only one annotation per type as its begin anchor. In the example, this means that the TextMarkerBasic for 'cText' held only one C annotation as its begin anchor.

      In the current version (rev. 1371274), TextMarkerBasic keeps a set of begin and end anchors per type, which is a genuine improvement.
      However, I suggest making the ordering of the anchored annotations returned by the TextMarkerRuleElement#getNextAnnotations(boolean, AnnotationFS, TextMarkerStream) method more controllable,
      for example, by adding a parameter to TextMarkerEngine or to the script that selects the AnnotationComparator#compare implementation.

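      Also, returning longer annotations before shorter ones seems more consistent with UIMA's default annotation index, which sorts by begin ascending and then by end descending, so that among annotations with the same begin the longer one comes first. See http://uima.apache.org/d/uimaj-2.4.0/references.html#ugr.ref.cas.index.built_in_indexes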
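
      A minimal sketch of the proposed option, assuming a hypothetical enum-valued setting: `AnchorOrdering`, its values and `comparatorFor` are illustrative names, not part of TextMarker or UIMA. The `LONGER_FIRST` variant follows the begin-ascending, end-descending order of the UIMA built-in annotation index; that index's final type-priority tie-break is omitted here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class ConfigurableAnchorOrder {

    // Hypothetical setting; neither the name nor the values come from TextMarker.
    enum AnchorOrdering { SHORTER_FIRST, LONGER_FIRST }

    // Stand-in for an annotation: character offsets [begin, end).
    static final class Span {
        final int begin;
        final int end;

        Span(int begin, int end) {
            this.begin = begin;
            this.end = end;
        }
    }

    // Maps the configured value to the comparator used to sort anchored candidates.
    static Comparator<Span> comparatorFor(AnchorOrdering ordering) {
        Comparator<Span> byBegin = Comparator.comparingInt((Span s) -> s.begin);
        switch (ordering) {
            case LONGER_FIRST:
                // Like the UIMA annotation index: begin ascending, end descending.
                return byBegin.thenComparing(Comparator.comparingInt((Span s) -> s.end).reversed());
            case SHORTER_FIRST:
            default:
                // Same begin: the shorter span (smaller end) comes first.
                return byBegin.thenComparingInt(s -> s.end);
        }
    }

    static List<Span> ordered(List<Span> candidates, AnchorOrdering ordering) {
        List<Span> sorted = new ArrayList<>(candidates);
        sorted.sort(comparatorFor(ordering));
        return sorted;
    }

    public static void main(String[] args) {
        // Hypothetical: the value would come from an engine or script parameter.
        AnchorOrdering configured = AnchorOrdering.valueOf("LONGER_FIRST");
        // The two C annotations from the example ("cText" and "cText cMoreText").
        List<Span> cs = Arrays.asList(new Span(12, 17), new Span(12, 27));
        Span first = ordered(cs, configured).get(0);
        System.out.println("first candidate: [" + first.begin + ", " + first.end + ")");
    }
}
```

      With LONGER_FIRST, the C annotation covering "cText cMoreText" is tried first, so under the matching model assumed in this sketch it is the one that would be marked as D.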

    People

      Peter Klügl (pkluegl)
      Rinat Gareyev (rinaldv)
      Votes: 0
      Watchers: 2
