PDFBox: PDFBOX-992

IndexOutOfBoundsException while parsing a few PDFs

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.5.0
    • Fix Version/s: 1.8.0
    • Component/s: Text extraction
    • Environment:
      Windows XP, RAD 7.5, Websphere 6.1, pdfbox-app-1.5.0.jar, fontbox-1.5.0.jar

      Description

      Hi Team, text extraction works fine with most PDFs, but it fails for a couple of them with the error below. The PDF can be found here: http://cid-a3aa7f7d9888874d.office.live.com/self.aspx/Public/getting%5E_started%5E_with%5E_Flex3.pdf . Let me know if this is a bug or an issue with the PDF.

      java.lang.IndexOutOfBoundsException: Index: 2,Size: 2
      at java.util.SubList.rangeCheck(AbstractList.java:864)
      at java.util.SubList.get(AbstractList.java:737)
      at org.apache.fontbox.cff.CharStringConverter.drawCurve(CharStringConverter.java:415)
      at org.apache.fontbox.cff.CharStringConverter.handleType2Command(CharStringConverter.java:277)
      at org.apache.fontbox.cff.CharStringConverter.handleCommand(CharStringConverter.java:81)
      at org.apache.fontbox.cff.CharStringHandler.handleSequence(CharStringHandler.java:53)
      at org.apache.fontbox.cff.CharStringConverter.handleType2Command(CharStringConverter.java:307)
      at org.apache.fontbox.cff.CharStringConverter.handleCommand(CharStringConverter.java:81)
      at org.apache.fontbox.cff.CharStringHandler.handleSequence(CharStringHandler.java:53)
      at org.apache.fontbox.cff.CharStringConverter.convert(CharStringConverter.java:64)
      at org.apache.fontbox.cff.CFFFont$Mapping.toType1Sequence(CFFFont.java:374)
      at org.apache.fontbox.cff.AFMFormatter.renderFont(AFMFormatter.java:126)
      at org.apache.fontbox.cff.AFMFormatter.printFontMetrics(AFMFormatter.java:64)
      at org.apache.fontbox.cff.AFMFormatter.printFont(AFMFormatter.java:57)
      at org.apache.fontbox.cff.AFMFormatter.format(AFMFormatter.java:50)
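
      The top frames show `SubList.get` rejecting index 2 on a view of size 2, which is what happens when code reads more operands from a sublist than it contains. The following is a minimal JDK-only sketch of that failure mode, not the FontBox code: the class `SubListOverrun` and the helpers `consume` and `hasEnough` are hypothetical names chosen for illustration.

```java
import java.util.Arrays;
import java.util.List;

public class SubListOverrun {

    /**
     * Hypothetical stand-in for an operator that expects a fixed number
     * of operands: it reads the first `needed` entries and sums them.
     * Throws IndexOutOfBoundsException when the view is shorter than `needed`.
     */
    public static int consume(List<Integer> operands, int needed) {
        int sum = 0;
        for (int i = 0; i < needed; i++) {
            sum += operands.get(i);
        }
        return sum;
    }

    /** Guard that a caller could run before consuming operands. */
    public static boolean hasEnough(List<Integer> operands, int needed) {
        return operands.size() >= needed;
    }

    public static void main(String[] args) {
        List<Integer> stack = Arrays.asList(10, 20, 30, 40);
        // The view exposes only two operands, like the "Size: 2" in the trace.
        List<Integer> view = stack.subList(0, 2);

        System.out.println("enough operands for 3? " + hasEnough(view, 3));
        try {
            consume(view, 3); // third read asks for index 2 of a size-2 view
        } catch (IndexOutOfBoundsException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```

      Checking the operand count before reading turns a malformed charstring into a condition the caller can handle instead of an unchecked exception that aborts text extraction.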

          Activity

          Markus Horehled added a comment -

          I get the same problem on Win7 with Tomcat 6.0.26, pdfbox-app-1.5.0.jar, and fontbox-1.5.0.jar.

          Lars Torunski added a comment -

          I'm getting an IllegalArgumentException and an IndexOutOfBoundsException during text extraction:

          java.lang.IllegalArgumentException: fromIndex(0) > toIndex(-2)
          at java.util.SubList.<init>(AbstractList.java:604)
          at java.util.RandomAccessSubList.<init>(AbstractList.java:758)
          at java.util.RandomAccessSubList.subList(AbstractList.java:762)
          at org.apache.fontbox.cff.CharStringConverter.handleType2Command(CharStringConverter.java:259)
          at org.apache.fontbox.cff.CharStringConverter.handleCommand(CharStringConverter.java:81)
          at org.apache.fontbox.cff.CharStringHandler.handleSequence(CharStringHandler.java:53)
          at org.apache.fontbox.cff.CharStringConverter.convert(CharStringConverter.java:64)
          at org.apache.fontbox.cff.CFFFont$Mapping.toType1Sequence(CFFFont.java:374)
          at org.apache.fontbox.cff.AFMFormatter.renderFont(AFMFormatter.java:126)
          at org.apache.fontbox.cff.AFMFormatter.printFontMetrics(AFMFormatter.java:64)
          at org.apache.fontbox.cff.AFMFormatter.printFont(AFMFormatter.java:57)
          at org.apache.fontbox.cff.AFMFormatter.format(AFMFormatter.java:50)
          at org.apache.pdfbox.pdmodel.font.PDType1CFont.prepareFontMetric(PDType1CFont.java:502)
          at org.apache.pdfbox.pdmodel.font.PDType1CFont.load(PDType1CFont.java:381)
          at org.apache.pdfbox.pdmodel.font.PDType1CFont.<init>(PDType1CFont.java:104)
          at org.apache.pdfbox.pdmodel.font.PDType1Font.<init>(PDType1Font.java:162)
          at org.apache.pdfbox.pdmodel.font.PDFontFactory.createFont(PDFontFactory.java:108)
          at org.apache.pdfbox.pdmodel.font.PDFontFactory.createFont(PDFontFactory.java:75)
          at org.apache.pdfbox.pdmodel.PDResources.getFonts(PDResources.java:115)
          at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:243)
          at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:225)
          at org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:442)
          at org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:366)
          at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:322)

          java.lang.IndexOutOfBoundsException: Index: 2,Size: 2
          at java.util.SubList.rangeCheck(AbstractList.java:746)
          at java.util.SubList.get(AbstractList.java:619)
          at org.apache.fontbox.cff.CharStringConverter.drawAlternatingCurve(CharStringConverter.java:397)
          at org.apache.fontbox.cff.CharStringConverter.handleType1Command(CharStringConverter.java:142)
          at org.apache.fontbox.cff.CharStringConverter.handleCommand(CharStringConverter.java:77)
          at org.apache.fontbox.cff.CharStringHandler.handleSequence(CharStringHandler.java:53)
          at org.apache.fontbox.cff.CharStringConverter.handleType2Command(CharStringConverter.java:307)
          at org.apache.fontbox.cff.CharStringConverter.handleCommand(CharStringConverter.java:81)
          at org.apache.fontbox.cff.CharStringHandler.handleSequence(CharStringHandler.java:53)
          at org.apache.fontbox.cff.CharStringConverter.convert(CharStringConverter.java:64)
          at org.apache.fontbox.cff.CFFFont$Mapping.toType1Sequence(CFFFont.java:374)
          at org.apache.fontbox.cff.AFMFormatter.renderFont(AFMFormatter.java:126)
          at org.apache.fontbox.cff.AFMFormatter.printFontMetrics(AFMFormatter.java:64)
          at org.apache.fontbox.cff.AFMFormatter.printFont(AFMFormatter.java:57)
          at org.apache.fontbox.cff.AFMFormatter.format(AFMFormatter.java:50)
          at org.apache.pdfbox.pdmodel.font.PDType1CFont.prepareFontMetric(PDType1CFont.java:502)
          at org.apache.pdfbox.pdmodel.font.PDType1CFont.load(PDType1CFont.java:381)
          at org.apache.pdfbox.pdmodel.font.PDType1CFont.<init>(PDType1CFont.java:104)
          at org.apache.pdfbox.pdmodel.font.PDType1Font.<init>(PDType1Font.java:162)
          at org.apache.pdfbox.pdmodel.font.PDFontFactory.createFont(PDFontFactory.java:108)
          at org.apache.pdfbox.pdmodel.font.PDFontFactory.createFont(PDFontFactory.java:75)
          at org.apache.pdfbox.pdmodel.PDResources.getFonts(PDResources.java:115)
          at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:243)
          at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:225)
          at org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:442)
          at org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:366)
          at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:322)
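
          The IllegalArgumentException trace above reports `fromIndex(0) > toIndex(-2)`, which is consistent with an end index computed as "size minus 2" on a list that holds no operands. The sketch below reproduces that arithmetic with plain JDK lists; it is an assumption about the failure mode, not the actual `CharStringConverter` logic, and `NegativeSubList`, `dropTrailing`, and `dropTrailingSafe` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class NegativeSubList {

    /**
     * Returns the operands without the last `trailing` entries.
     * With fewer than `trailing` operands the end index goes negative
     * and List.subList throws IllegalArgumentException.
     */
    public static List<Integer> dropTrailing(List<Integer> operands, int trailing) {
        return operands.subList(0, operands.size() - trailing);
    }

    /** Same operation, but the end index is clamped so it never drops below 0. */
    public static List<Integer> dropTrailingSafe(List<Integer> operands, int trailing) {
        int end = Math.max(0, operands.size() - trailing);
        return operands.subList(0, end);
    }

    public static void main(String[] args) {
        List<Integer> empty = new ArrayList<>();
        try {
            dropTrailing(empty, 2); // subList(0, -2)
        } catch (IllegalArgumentException e) {
            System.out.println("caught: " + e.getMessage());
        }
        System.out.println("clamped size: " + dropTrailingSafe(empty, 2).size());
        System.out.println("normal case: " + dropTrailing(Arrays.asList(1, 2, 3), 2));
    }
}
```

          Clamping only hides the symptom; whether an empty result is acceptable depends on what the operator is supposed to draw.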

          Lars Torunski added a comment -

          I sent Andreas links to the PDFs that cause these two different exceptions by email.

          Andreas Lehmkühler added a comment -

          After improving the Type1C/OpenType support, the IndexOutOfBoundsException is gone (see revision 1425954 or PDFBOX-1473). The attached PDF now works, as do the documents Lars sent me some time ago.

          HOREHLED Markus (sage) added a comment -

          Thank you for your mail!

          I am out of office until Wednesday, 2nd of January 2013 and am currently unable to read and/or respond to your mail.

          In my absence, please feel free to contact Mr. Gerfried Aigner (g.aigner@sage.at).

          Thank you for your understanding.

          Best regards
          Markus Horehled

            People

            • Assignee:
              Andreas Lehmkühler
              Reporter:
              Nilesh Naik
            • Votes:
              2
              Watchers:
              4
