Details
- Type: Bug
- Status: Closed
- Priority: Major
- Resolution: Fixed
- Versions: 1.5.0, 1.6.0
Description
The following exception is thrown when extracting text from some PDFs if a dup line does not contain a font index (I can send a sample PDF file):
java.lang.NumberFormatException: For input string: "8#40"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
at java.lang.Integer.parseInt(Integer.java:458)
at java.lang.Integer.parseInt(Integer.java:499)
at org.apache.pdfbox.pdmodel.font.PDType1Font.getEncodingFromFont(PDType1Font.java:341)
at org.apache.pdfbox.pdmodel.font.PDType1Font.determineEncoding(PDType1Font.java:276)
at org.apache.pdfbox.pdmodel.font.PDFont.<init>(PDFont.java:181)
at org.apache.pdfbox.pdmodel.font.PDSimpleFont.<init>(PDSimpleFont.java:83)
at org.apache.pdfbox.pdmodel.font.PDType1Font.<init>(PDType1Font.java:152)
at org.apache.pdfbox.pdmodel.font.PDFontFactory.createFont(PDFontFactory.java:108)
at org.apache.pdfbox.pdmodel.font.PDFontFactory.createFont(PDFontFactory.java:75)
at org.apache.pdfbox.pdmodel.PDResources.getFonts(PDResources.java:115)
at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:243)
at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:225)
at org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:442)
at org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:366)
at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:322)
at org.apache.pdfbox.util.PDFTextStripper.getText(PDFTextStripper.java:242)
at org.apache.pdfbox.util.PDFTextStripper.getText(PDFTextStripper.java:255)
Suggested correction:
In org.apache.pdfbox.pdmodel.font.PDType1Font.java, in method getEncodingFromFont, add a try/catch block around line 341 to avoid the java.lang.NumberFormatException when a dup line does not contain a font index.
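The suggested try/catch can be sketched roughly as follows. This is not the actual PDFBox code: the class `DupLineParser` and the method `parseCharCode` are hypothetical names, and the sketch assumes the encoding lines have the form `dup <index> /<name> put`. The failing token `8#40` looks like PostScript radix notation (base 8), which `Integer.parseInt` does not accept; here such a line is skipped rather than interpreted.

```java
import java.util.OptionalInt;
import java.util.StringTokenizer;

// Hypothetical helper, not part of PDFBox: illustrates guarding the
// integer parse of a Type 1 encoding "dup" line.
final class DupLineParser {

    // Returns the index from a line such as "dup 65 /A put",
    // or an empty result if the line has no plain decimal index.
    static OptionalInt parseCharCode(String line) {
        StringTokenizer tokens = new StringTokenizer(line);
        if (!tokens.hasMoreTokens() || !"dup".equals(tokens.nextToken())) {
            return OptionalInt.empty(); // not a dup line
        }
        if (!tokens.hasMoreTokens()) {
            return OptionalInt.empty(); // "dup" with nothing after it
        }
        String indexToken = tokens.nextToken();
        try {
            return OptionalInt.of(Integer.parseInt(indexToken));
        } catch (NumberFormatException e) {
            // e.g. "8#40": skip this entry instead of failing the whole font
            return OptionalInt.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(parseCharCode("dup 65 /A put").isPresent());       // true
        System.out.println(parseCharCode("dup 8#40 /space put").isPresent()); // false
    }
}
```

With a guard like this, a caller can ignore entries that yield an empty result and continue building the encoding from the remaining lines, so a single unparsable dup line no longer aborts text extraction.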
Issue Links
- duplicates PDFBOX-1481 Ignore postscript code when parsing a type1 font (Closed)
- is related to PDFBOX-2227 java.io.IOException: Found Token[kind=NAME, text= ] but expected LITERAL for type1 font (Closed)