[PDFBOX-3677] NullPointerException in Type1Parser.read - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 2.0.3, 2.0.4
Fix Version/s: 2.0.5, 3.0.0 PDFBox
Component/s: FontBox
Labels:
- type1
- type1font
Environment:
Windows 10, java version "1.8.0_25"

Description

Text extraction from certain PDFs fails: PDFBox throws a NullPointerException. Text extraction from the same PDFs works with version 1.8.13.

The issue was originally discovered while using the latest Apache Tika library, 1.14. I cannot downgrade to PDFBox 1.8.13 with Apache Tika 1.14.

Unfortunately, I cannot provide the failing PDFs. However, I did some testing and found that “Token token = lexer.nextToken();” returns null.

Feb 07, 2017 12:17:40 PM org.apache.pdfbox.pdmodel.font.PDType1Font <init>
SEVERE: Can't read the embedded Type1 font AAAAAB+Arial-BoldMT
java.io.IOException: Found token=null but expected NAME

Caused by: java.io.EOFException
at org.apache.pdfbox.io.ScratchFileBuffer.seek(ScratchFileBuffer.java:302)
at org.apache.pdfbox.pdfparser.COSParser.checkXRefOffset(COSParser.java:1177)
at org.apache.pdfbox.pdfparser.COSParser.parseXref(COSParser.java:202)
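
The failure mode described above, a lexer that returns null at the end of its input and a parser that uses the result without checking it, can be illustrated with a minimal, self-contained sketch. This is not FontBox's actual code: the classes NullTokenSketch, Lexer, Token and Kind are hypothetical stand-ins, and the only assumption taken from the report is that nextToken() may return null. The guard turns the would-be NullPointerException into an IOException whose message follows the wording of the log line above.

```java
import java.io.IOException;
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

// Hypothetical illustration only; none of these types are PDFBox/FontBox classes.
public class NullTokenSketch {
    enum Kind { NAME, LITERAL }

    static final class Token {
        final Kind kind;
        final String text;

        Token(Kind kind, String text) {
            this.kind = kind;
            this.text = text;
        }
    }

    // Stand-in lexer: hands out queued tokens, then null once the input is exhausted.
    static final class Lexer {
        private final Deque<Token> tokens;

        Lexer(Token... tokens) {
            this.tokens = new ArrayDeque<>(Arrays.asList(tokens));
        }

        Token nextToken() {
            return tokens.poll(); // poll() returns null on an empty queue
        }
    }

    // Reads one token of the expected kind. The null check comes first, so a
    // truncated input produces an IOException instead of a NullPointerException.
    static Token read(Lexer lexer, Kind expected) throws IOException {
        Token token = lexer.nextToken();
        if (token == null || token.kind != expected) {
            String found = (token == null) ? "null" : token.kind.toString();
            throw new IOException("Found token=" + found + " but expected " + expected);
        }
        return token;
    }

    public static void main(String[] args) {
        Lexer lexer = new Lexer(new Token(Kind.NAME, "FontName"));
        try {
            System.out.println(read(lexer, Kind.NAME).text); // prints "FontName"
            read(lexer, Kind.NAME);                          // input is now exhausted
        } catch (IOException e) {
            System.out.println(e.getMessage()); // prints "Found token=null but expected NAME"
        }
    }
}
```

Without the null check in read(), the second call would dereference a null token, which is the kind of NullPointerException named in the issue title.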

Attachments

- StackTrace.txt, 07/Feb/17 11:20, 2 kB, ManuelG
- F1.txt, 08/Feb/17 09:09, 1 kB, ManuelG
- F2.txt, 08/Feb/17 09:09, 2 kB, ManuelG
- F1.PFB, 08/Feb/17 16:33, 14 kB, ManuelG
- F2.PFB, 08/Feb/17 16:33, 18 kB, ManuelG
- Resources_ScreenShot.GIF, 08/Feb/17 16:33, 33 kB, ManuelG

Issue Links

relates to:
- PDFBOX-3112 Avoid crazy /Length1 values in font descriptor (Closed)
- PDFBOX-2350 Type1 Parser hangs indefinitely (Closed)

People

Assignee: Tilman Hausherr

Reporter: ManuelG

Votes: 0

Watchers: 4

Dates

Created: 07/Feb/17 11:20

Updated: 25/Mar/17 18:13

Resolved: 09/Feb/17 17:20