Details
- Type: Bug
- Status: Closed
- Priority: Minor
- Resolution: Fixed
- Affects Version: 1.8.7
Description
On a small number of test files in a 50k sample of PDFs from govdocs1, it appears that some characters are no longer being extracted correctly in 1.8.7 when compared to 1.8.6. I ran PDFBox's app.jar with ExtractText and got the following output:
764929.pdf
1.8.6: Lang, Astrophysical Data: Planets and Stars
1.8.7: Lang, AefdaphyeiUSl DSfS: PlSnefe Snd EfSde,
and
312888.pdf
1.8.6: Self-Assessment & Capability Description
1.8.7: Seff-Ammemmmehn & Cajabcfcns Demclcjncih
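A comparison like the one described above could be scripted roughly as follows. This is only a minimal sketch, not the procedure the reporter actually used: the jar paths passed to `extract_text`, the helper names, the UTF-8 decoding of the console output, and the 0.9 similarity threshold are all assumptions made for illustration. It relies on the `ExtractText` tool's `-console` option to write the extracted text to standard output.

```python
import difflib
import subprocess


def extract_text(jar_path, pdf_path):
    """Run PDFBox's ExtractText tool and return what it prints to stdout.

    jar_path is a hypothetical path to a pdfbox-app jar for one version.
    """
    result = subprocess.run(
        ["java", "-jar", jar_path, "ExtractText", "-console", pdf_path],
        capture_output=True,
        check=True,
    )
    # Assumption: console output is decoded as UTF-8; bad bytes are replaced.
    return result.stdout.decode("utf-8", errors="replace")


def similarity(old_text, new_text):
    """Return a ratio in [0, 1]; 1.0 means the two extractions are identical."""
    return difflib.SequenceMatcher(None, old_text, new_text).ratio()


def is_regression(old_text, new_text, threshold=0.9):
    """Flag a file when the newer extraction falls below the threshold.

    The threshold is an arbitrary illustrative value, not a PDFBox setting.
    """
    return similarity(old_text, new_text) < threshold
```

Applied to the first sample above, `is_regression("Lang, Astrophysical Data: Planets and Stars", "Lang, AefdaphyeiUSl DSfS: PlSnefe Snd EfSde,")` flags the file, while comparing a string with itself does not. In a batch run, `extract_text` would be called once per jar for each PDF and the two results passed to `is_regression`.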
Attachments
Issue Links
- breaks
  - PDFBOX-2449 Character missing in text extraction (Closed)
- relates to
  - PDFBOX-2247 Regression in text extraction between 1.8.5 and 1.8.6 (Closed)
  - TIKA-1442 Upgrade to PDFBox 1.8.8 (Closed)
  - TIKA-1419 Upgrade to PDFBox 1.8.7 (Closed)