PDFBox / PDFBOX-202

Error on text extraction: java.lang.IndexOutOfBoundsException

Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.5.0
    • Component/s: Parsing
    • Labels: None

    Description

      [imported from SourceForge]
      http://sourceforge.net/tracker/index.php?group_id=78314&atid=552832&aid=1565617
      Originally submitted by gagravarr on 2006-09-26 03:30.

      I'm trying to extract text from a pdf file
      (http://www.cifor.cgiar.org/mla/download/publication/mozambique.pdf),
      but I'm getting an IndexOutOfBoundsException on it:

      Exception in thread "main" java.lang.IndexOutOfBoundsException: Index: 4, Size: 4
          at java.util.ArrayList.RangeCheck(ArrayList.java:546)
          at java.util.ArrayList.get(ArrayList.java:321)
          at org.pdfbox.util.operator.Concatenate.process(Concatenate.java:69)
          at org.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:494)
          at org.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:207)
          at org.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:160)
          at org.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:355)
          at org.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:268)
          at org.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:220)
          at org.pdfbox.ExtractText.main(ExtractText.java:237)

      I've tried both 0.7.2 and 0.7.3-dev-20060920, and I
      get the same exception from each version.

      Nick
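
The "Index: 4, Size: 4" message in the trace above suggests that the operand list handed to Concatenate.process held four entries when the code asked for a fifth. The PDF `cm` (concatenate matrix) operator takes six numeric operands, so a content stream that supplies fewer would produce exactly this failure, assuming Concatenate is the handler for `cm`. The following is a minimal sketch of a defensive operand check, not the fix that was actually committed to PDFBox; the class OperandGuardSketch and the method readMatrixOperands are hypothetical names introduced only for illustration, and no PDFBox classes are used.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only: not PDFBox code and not the committed fix.
final class OperandGuardSketch {

    // The PDF "cm" operator is defined with six numeric operands: a b c d e f.
    static final int MATRIX_OPERAND_COUNT = 6;

    // Returns the six matrix values, or null when the content stream
    // supplied too few operands, so the caller can skip the operator
    // instead of failing with an IndexOutOfBoundsException.
    static float[] readMatrixOperands(List<? extends Number> arguments) {
        if (arguments == null || arguments.size() < MATRIX_OPERAND_COUNT) {
            return null;
        }
        float[] values = new float[MATRIX_OPERAND_COUNT];
        for (int i = 0; i < MATRIX_OPERAND_COUNT; i++) {
            values[i] = arguments.get(i).floatValue();
        }
        return values;
    }

    public static void main(String[] args) {
        // Four operands, matching the "Size: 4" in the reported exception.
        List<Integer> truncated = Arrays.asList(1, 0, 0, 1);
        System.out.println(readMatrixOperands(truncated) == null ? "skipped" : "applied");

        // A well-formed operand list with all six values.
        List<Integer> complete = Arrays.asList(1, 0, 0, 1, 72, 144);
        System.out.println(readMatrixOperands(complete) == null ? "skipped" : "applied");
    }
}
```

Whether a malformed operator should be skipped silently, logged, or reported as an error is a design choice; skipping lets text extraction continue on the rest of the page, at the cost of possibly misplacing the text that follows.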

      Attachments

        1. mozambique.pdf
          6.41 MB
          Jukka Zitting

          People

            Assignee: Adam Nichols (adamnichols)
            Reporter: Anonymous