[PDFBOX-5035] Missing character in text extraction - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Not A Bug
Affects Version/s: 2.0.21
Fix Version/s: None
Component/s: Text extraction
Labels: None

Description

When PDFTextStripper is applied to the attached PDF, the highlighted text (see the attached screenshot) is read as "8,0000" instead of "48,0000", so the character "4" seems to get lost.

Is this a bug, or is it related to the internal PDF structure?

Attachments

- FT_FTDGT-03770_20.pdf, 07/Dec/20 08:50, 83 kB, Marco Barbi
- FT_FTDGT-03770_20.txt, 07/Dec/20 09:14, 0.8 kB, Tilman Hausherr
- image-2020-12-07-09-47-40-046.png, 07/Dec/20 08:47, 12 kB, Marco Barbi

People

Assignee: Unassigned

Reporter: Marco Barbi

Votes: 0

Watchers: 2

Dates

Created: 07/Dec/20 08:50

Updated: 10/Dec/20 07:44

Resolved: 10/Dec/20 07:44