[TIKA-389] Garbled metadata when dealing with encrypted PDF files. - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Minor
Resolution: Fixed
Affects Version/s: 0.6
Fix Version/s: 0.9
Component/s: metadata, parser
Labels: None
Environment: Windows 7 64-bit

Description

The following simple code exhibits the issue when parsing an encrypted PDF file:

InputStream input = new FileInputStream(file);
ContentHandler textHandler = new BodyContentHandler();
tikaParser.parse(input, textHandler, metadata);
input.close();
System.out.println(metadata);

The output:
title=?a?▬÷&▼♂?ŢjK???ž?↑M?A→<═]1
=╬\bK Author=═g?═?♦ Content-Type=application/pdf creator=?k?═?♦Ý`;Ý?)/¶?Ě?3n
Î☼46ËO

Apart from the garbled metadata, the extracted text is 100% correct.
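
One way to notice this symptom programmatically is to scan metadata values for control characters, since glyphs such as ♦, ▼ and ¶ in the output above are typically how a Windows console renders them. The following is a minimal sketch in plain Java and is not part of Tika or PDFBox: the class `MetadataSanity`, the method `looksGarbled`, and the sample values are hypothetical, and the heuristic will miss garbled values that happen to contain only printable characters.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class MetadataSanity {

    /**
     * Hypothetical heuristic: a value "looks garbled" if it contains
     * U+FFFD (the Unicode replacement character) or any ISO control
     * character other than tab, line feed, or carriage return.
     */
    static boolean looksGarbled(String value) {
        for (int i = 0; i < value.length(); ) {
            int cp = value.codePointAt(i);
            i += Character.charCount(cp);
            if (cp == 0xFFFD) {
                return true;
            }
            boolean allowedWhitespace = cp == '\t' || cp == '\n' || cp == '\r';
            if (Character.isISOControl(cp) && !allowedWhitespace) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Sample values standing in for a parsed metadata object (assumed data).
        Map<String, String> metadata = new LinkedHashMap<>();
        metadata.put("Content-Type", "application/pdf");
        metadata.put("Author", "\u0003g\u001f\u0004");

        for (Map.Entry<String, String> entry : metadata.entrySet()) {
            System.out.println(entry.getKey() + " garbled=" + looksGarbled(entry.getValue()));
        }
    }
}
```

A check like this only flags suspicious values. It does not recover the original strings, which depends on the fix tracked in the blocking PDFBox issue.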

Issue Links

is blocked by: PDFBOX-814 unreadable document information (Closed)

People

Assignee: Unassigned

Reporter: Gabriel Miklos

Votes: 1

Watchers: 1

Dates

Created: 23/Mar/10 00:26

Updated: 02/Aug/12 09:33

Resolved: 09/Dec/10 02:07