UIMA-3075

Unambiguous non-strict subiterator may return annotations outside the given annotation's range


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Duplicate
    • Affects Version/s: 2.4.0SDK
    • None
    • None
    • None

    Description

      REPRO: using a tokenizer that matches "[^ ]+" on the text "aaa bbb ccc ddd", I get four token annotations:

      "aaa" 0-3
      "bbb" 4-7
      "ccc" 8-11
      "ddd" 12-15

      I then iterate over the token annotations, printing each one's covered text, begin, and end. For each token I create an unambiguous, non-strict subiterator and iterate over it, printing the covered text, begin, and end of every annotation it returns, indented by one tab.

      Iterator<Annotation> iter = jcas.getAnnotationIndex(Token.type).iterator();
      while (iter.hasNext()) {
        Annotation a = iter.next();
        System.out.println("\"" + a.getCoveredText() + "\"" + " [" + a.getBegin() + ", " + a.getEnd() + ")");
        // subiterator(annot, ambiguous, strict): unambiguous and non-strict here
        Iterator<Annotation> featIter = jcas.getAnnotationIndex().subiterator(a, false, false);
        while (featIter.hasNext()) {
          Annotation b = featIter.next();
          System.out.println("\t\"" + b.getCoveredText() + "\"" + " [" + b.getBegin() + ", " + b.getEnd() + ")");
        }
      }

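      The output is shown below. Each indented line is an annotation returned by the subiterator, and each one lies outside the range of the token printed above it.

      "aaa" [0, 3)
          "bbb" [4, 7)
      "bbb" [4, 7)
          "ccc" [8, 11)
      "ccc" [8, 11)
          "ddd" [12, 15)
      "ddd" [12, 15)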

      I think this can be fixed by adding an extra check at Subiterator.java, line 127.

      NOW
      while (it.isValid() && ((start > annot.getBegin()) || (strict && annot.getEnd() > end))) {
        it.moveToNext();
      }

      POSSIBLE FIX
      while (it.isValid() && ((start > annot.getBegin() && annot.getBegin() <= end) || (strict && annot.getEnd() > end))) {
        it.moveToNext();
      }
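
      For reference, the behaviour this report expects can be sketched in plain Java without any UIMA dependency. This is only an illustration under stated assumptions: Span, sampleTokens, and covered are hypothetical names (not UIMA API), "non-strict" is assumed to mean that an annotation only has to begin inside the covering range, and the covering annotation itself is simply skipped. It is not the Subiterator implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java sketch. Span and covered(...) are hypothetical stand-ins, not UIMA classes.
class SubiteratorRangeSketch {

    static final class Span {
        final String text;
        final int begin;
        final int end;

        Span(String text, int begin, int end) {
            this.text = text;
            this.begin = begin;
            this.end = end;
        }

        @Override
        public String toString() {
            return "\"" + text + "\" [" + begin + ", " + end + ")";
        }
    }

    // The four tokens from the report, already sorted by begin offset.
    static List<Span> sampleTokens() {
        return Arrays.asList(
            new Span("aaa", 0, 3),
            new Span("bbb", 4, 7),
            new Span("ccc", 8, 11),
            new Span("ddd", 12, 15));
    }

    // Assumed semantics: return non-overlapping spans that begin inside
    // [cover.begin, cover.end); when strict, they must also end by cover.end.
    static List<Span> covered(List<Span> sortedByBegin, Span cover, boolean strict) {
        List<Span> result = new ArrayList<>();
        int lastEnd = cover.begin;
        for (Span s : sortedByBegin) {
            if (s == cover) continue;                   // skip the covering span itself
            if (s.begin < cover.begin) continue;        // starts before the range
            if (s.begin >= cover.end) break;            // sorted input: nothing later can be in range
            if (strict && s.end > cover.end) continue;  // strict: must not overlap to the right
            if (s.begin < lastEnd) continue;            // unambiguous: drop overlapping spans
            result.add(s);
            lastEnd = s.end;
        }
        return result;
    }

    public static void main(String[] args) {
        List<Span> tokens = sampleTokens();
        for (Span token : tokens) {
            System.out.println(token);
            for (Span inner : covered(tokens, token, false)) {
                System.out.println("\t" + inner);
            }
        }
    }
}
```

      Under these assumptions no token has another token beginning inside its range, so the sketch prints no indented lines for the input above. The upper-bound check on the begin offset is the part the reported output suggests is missing.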

            People

              Assignee: Richard Eckart de Castilho
              Reporter: Alexander N Thomas