[PDFBOX-897] NullPointerException PDFFont#getEncodingFromFont with a PDF book because Type1Encoding is null - ASF JIRA

A NullPointerException was thrown while extracting text from a PDF ebook. The exception was thrown at the following line in PDFFont#getEncodingFromFont:

[snip]
encoding.addCharacterEncoding(index, name.replace("/", ""));
[snip]

encoding was null. The line being scanned was "/Encoding 256 array 0 1 255 {1 index exch /.notdef put} for". The array check, however, only tests line.endsWith("array"), which does not match this line because "array" appears in the middle of it. The NPE no longer occurred when line.contains("array") was used instead.
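The difference between the two checks can be reproduced in isolation. The following is a minimal sketch, not the PDFBox source: the class name ArrayCheckDemo and the helper methods originalCheck and patchedCheck are hypothetical, and only the scanned line and the two String calls are taken from this report.

```java
public class ArrayCheckDemo {
    // The scanned line quoted in this report, joined into a single line.
    static final String LINE =
        "/Encoding 256 array 0 1 255 {1 index exch /.notdef put} for";

    // Hypothetical helper: the check as described in the report.
    static boolean originalCheck(String line) {
        return line.endsWith("array");
    }

    // Hypothetical helper: the check proposed in the report.
    static boolean patchedCheck(String line) {
        return line.contains("array");
    }

    public static void main(String[] args) {
        // "array" sits in the middle of the line, so endsWith does not match.
        System.out.println("endsWith: " + originalCheck(LINE)); // false
        // contains matches regardless of what follows "array".
        System.out.println("contains: " + patchedCheck(LINE));  // true
    }
}
```

Note that contains("array") is a looser test than endsWith("array") and would also match any other line that happens to include the word, so this sketch only illustrates why the reported line was missed.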

I have attached a patch. The PDF is a copyrighted book, so it cannot be attached as an example. The document's metadata was:

Acrobat Distiller 7.0 (Windows)
PScript5.dll Version 5.2.2
PDF-1.6