Tika / TIKA-733

[PATCH] RTF TextExtractor processGroupEnd() NoSuchElementException

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.0
    • Fix Version/s: 1.0
    • Component/s: parser
    • Labels:

      Description

      Parsing some RTF documents causes TextExtractor.processGroupEnd() to call removeLast() on the groupStates list when the list is empty, which throws a NoSuchElementException. The attached patch adds a check that skips this logic when the list is empty, so the group state is not restored in that case. With the patch, text extraction completes without further downstream errors.

      Unable to include sample file due to sensitive nature of file contents.

      Stack trace (Tika 0.9)

      Caused by: java.util.NoSuchElementException
      at java.util.LinkedList.remove(LinkedList.java:788)
      at java.util.LinkedList.removeLast(LinkedList.java:144)
      at org.apache.tika.parser.rtf.TextExtractor.processGroupEnd(TextExtractor.java:1010)
      at org.apache.tika.parser.rtf.TextExtractor.extract(TextExtractor.java:352)
      at org.apache.tika.parser.rtf.RTFParser.parse(RTFParser.java:53)
      at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
      ... 45 more
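
The guard described above can be illustrated with a minimal, self-contained sketch. This is not the attached patch or Tika's actual TextExtractor code: the class GroupStateGuardSketch, the GroupState stand-in, and the boolean return value are assumptions made only for illustration. It shows the general idea of checking that the saved-state list is non-empty before calling LinkedList.removeLast().

```java
import java.util.LinkedList;

// Hypothetical sketch: class and member names are illustrative, not Tika's code.
class GroupStateGuardSketch {

    // Stand-in for the formatting state saved at each RTF group start.
    static final class GroupState {
        final boolean bold;
        GroupState(boolean bold) { this.bold = bold; }
    }

    private final LinkedList<GroupState> groupStates = new LinkedList<>();
    private GroupState current = new GroupState(false);

    // On '{': save the current state so the matching '}' can restore it.
    void processGroupStart() {
        groupStates.add(current);
        current = new GroupState(current.bold);
    }

    // On '}': restore the saved state, unless there is nothing to restore.
    // Returns true if a state was restored, false if the '}' was unmatched.
    boolean processGroupEnd() {
        if (groupStates.isEmpty()) {
            // Without this check, removeLast() throws NoSuchElementException.
            return false;
        }
        current = groupStates.removeLast();
        return true;
    }

    // Number of saved states, i.e. the current group nesting depth.
    int depth() {
        return groupStates.size();
    }

    public static void main(String[] args) {
        GroupStateGuardSketch extractor = new GroupStateGuardSketch();
        extractor.processGroupStart();
        System.out.println("matched end restored state: " + extractor.processGroupEnd());
        System.out.println("unmatched end restored state: " + extractor.processGroupEnd());
    }
}
```

Skipping the restore keeps extraction running on documents with unbalanced group delimiters, at the cost of leaving the current state unchanged for the unmatched group end.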

        Activity

        Jeremy Anderson created issue
        Jeremy Anderson made changes
          Field        Original Value   New Value
          Attachment   (none)           TIKA-733-rtf_TextExtractor_processGroupEnd-NoSuchElementException.patch [ 12496831 ]
        Michael McCandless made changes
          Field        Original Value   New Value
          Assignee     (none)           Michael McCandless [ mikemccand ]
        Michael McCandless made changes
          Field        Original Value   New Value
          Status       Open [ 1 ]       Resolved [ 5 ]
          Resolution   (none)           Fixed [ 1 ]

          People

          • Assignee: Michael McCandless
          • Reporter: Jeremy Anderson
          • Votes: 0
          • Watchers: 0
