Details
- Task
- Status: Closed
- Trivial
- Resolution: Fixed
- 2.0.28, 3.0.0 PDFBox
- None
Description
I'm not sure whether this is a regression, and apologies if you already dealt with this before the release of 2.0.28. Also, as a warning: this file is corrupt.
We used to get more text out of this file in 2.0.27 than we do now in 2.0.28: https://corpora.tika.apache.org/base/docs/bug_trackers/evince/evince-395-0.zip-0.pdf
This file was derived from the evince bug tracker, whose original report now ultimately links to this poppler issue:
https://gitlab.freedesktop.org/poppler/poppler/-/issues/323
This image from the poppler issue shows what we get with PDFBox 2.0.28 on the left, and 2.0.27 on the right.
If the decision is "the file is corrupt -> not going to fix", I completely understand.
Issue Links
- relates to: PDFBOX-5178 Parsing differences between 2.0.23 and 2.0.24/3.0 (Closed)