[PDFBOX-1606] NonSequentialPDFParser produces garbage text in document info - ASF JIRA


Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 1.8.1
Fix Version/s: 1.8.3, 2.0.0
Component/s: Parsing
Labels:
None
Environment:
Windows 7, JRE 1.7.0_15-b03

Description

For some documents, NonSequentialPDFParser produces a PDDocumentInformation whose fields (title, author, producer, etc.) contain binary garbage. For those documents, calling the PDDocumentInformation.getXXXDate() methods fails with "IOException: Error converting date".

The classic PDFParser handles the same documents without these problems.
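The date getters have to parse the stored string, because PDF dates are text of the form D:YYYYMMDDHHmmSSOHH'mm' (ISO 32000-1, section 7.9.4). Once a field holds binary garbage, that parse cannot succeed, which matches the "Error converting date" failure. The sketch below illustrates that parsing step using only the Java standard library. It is not PDFBox's own date conversion code. The class `PdfDateSketch` and its `parse` method are hypothetical names, and two choices are assumptions: a missing "D:" prefix is tolerated, and a date with no offset is treated as UTC.

```java
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.SimpleTimeZone;
import java.util.TimeZone;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper, not part of PDFBox: parses PDF date strings. */
public final class PdfDateSketch {

    // D:YYYYMMDDHHmmSSOHH'mm' (ISO 32000-1, 7.9.4). Every field after the
    // year is optional; O is 'Z', '+' or '-'. Assumption: a missing "D:"
    // prefix is tolerated, since some writers omit it.
    private static final Pattern PDF_DATE = Pattern.compile(
            "(?:D:)?(\\d{4})(\\d{2})?(\\d{2})?(\\d{2})?(\\d{2})?(\\d{2})?"
            + "(?:([Zz])|([+-])(\\d{2})'?(?:(\\d{2})'?)?)?");

    private PdfDateSketch() {}

    private static int intOr(String group, int fallback) {
        return group == null ? fallback : Integer.parseInt(group);
    }

    /** Returns the parsed date, or null if the string is not a valid PDF date. */
    public static Calendar parse(String s) {
        if (s == null) {
            return null;
        }
        Matcher m = PDF_DATE.matcher(s.trim());
        if (!m.matches()) {
            return null; // e.g. binary garbage instead of a date string
        }
        // Assumption: with no offset given, treat the time as UTC.
        TimeZone tz = TimeZone.getTimeZone("UTC");
        if (m.group(8) != null) {
            int offHours = Integer.parseInt(m.group(9));
            int offMinutes = intOr(m.group(10), 0);
            if (offHours > 23 || offMinutes > 59) {
                return null;
            }
            int offsetMillis = (offHours * 60 + offMinutes) * 60_000;
            if ("-".equals(m.group(8))) {
                offsetMillis = -offsetMillis;
            }
            tz = new SimpleTimeZone(offsetMillis, "PDF");
        }
        Calendar cal = new GregorianCalendar(tz);
        cal.clear();
        cal.setLenient(false); // reject impossible dates such as 30 February
        cal.set(Integer.parseInt(m.group(1)), intOr(m.group(2), 1) - 1,
                intOr(m.group(3), 1), intOr(m.group(4), 0),
                intOr(m.group(5), 0), intOr(m.group(6), 0));
        try {
            cal.getTimeInMillis(); // forces validation of the fields
        } catch (IllegalArgumentException e) {
            return null;
        }
        return cal;
    }

    public static void main(String[] args) {
        Calendar ok = parse("D:20130522072600+02'00'");
        System.out.println(ok.getTime());
        System.out.println(parse("\u00fe\u00ff\u0012garbage")); // prints null
    }
}
```

Returning null instead of throwing is a choice made for this sketch only. As described above, PDFBox 1.8 reports the same kind of failure as an IOException.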

Attachments


- PDFBOX-1606.patch (2 kB), Sebastian Nagel, 06/Sep/13 15:10
- 00-214 EU Data Protection Directive Update 12-1.pdf (30 kB), Alex Alishevskikh, 22/May/13 07:27

Issue Links

is related to:
- PDFBOX-1930 TimesNewRoman font should be substituted (Closed)


People

Assignee: Andreas Lehmkühler
Reporter: Alex Alishevskikh
Votes: 1
Watchers: 4

Dates

Created: 22/May/13 07:26
Updated: 20/Feb/14 20:39
Resolved: 09/Sep/13 17:18