[PDFBOX-3445] Can not read PDF correctly - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Not A Problem
Affects Version/s: 2.0.2
Fix Version/s: None
Component/s: FontBox, Text extraction
Labels: None

Description

Hi Team,
I have two PDFs in the Gujarati language that use different fonts: the first uses the Shruti font and the second uses the LMG-RUPE font. The Tika parser reads the Shruti PDF correctly and gives me the correct output, but for the LMG-RUPE PDF it gives me wrong output. The metadata is the same for both PDFs.
1) https://drive.google.com/open?id=0B4Sse_x7pvrqRnRETzNsUk1BY0k (Shruti font)
2) https://drive.google.com/open?id=0B4Sse_x7pvrqVC0zb2NqTzNvYVU (LMG-RUPE font)

Attachments

PDFBOX-3445-rupen-debugger.png, 01/Aug/16 17:24, 118 kB, Tilman Hausherr
PDFBOX-3445-rupen.pdf, 01/Aug/16 17:24, 48 kB, Tilman Hausherr

Issue Links

relates to: TIKA-2046 Can not read PDF correctly (Resolved)
links to: Tika can not read text correctly from PDF file

People

Assignee: Unassigned
Reporter: gopalbhalala
Votes: 0
Watchers: 2

Dates

Created: 01/Aug/16 16:13
Updated: 02/Aug/16 17:10
Resolved: 02/Aug/16 16:49