Details
- Type: Bug
- Status: Closed
- Priority: Major
- Resolution: Fixed
- Affects Version/s: 2.0.23, 3.0.0 PDFBox
- None
- Environment: Java 8, Windows 10 and Ubuntu 22
- Patch
Description
We have an online service where our customers post their PDF files so that we can render them.
One of our customers recently noticed that one of its signed documents did not show the image associated with the signature. They gave me permission to share this document, and it is attached (PDFBOX-issue-rendering-signature.pdf).
The problem is on the last page, page 9. It can easily be reproduced with pdfbox-app-2.0*.jar PDFToImage.
Result with pdfbox 2.0.22 is:
Result with pdfbox 2.0.23 or later is:
The regression was introduced by commit f34a33824c4363b9b683245cb582328dc92b79ca (seen in git), dated 2021-03-02 07:12:11+0000, which belongs to ticket PDFBOX-5112.
The issue is in the constructor of ObjectNumbers in PDFXrefStreamParser: it assumes that the COSInteger values in the COSArray (the xref stream's /Index array) are sorted in increasing order. In the attached PDF they are not, so the parser stops iterating over the array too early.
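The ordering problem can be illustrated without PDFBox. The /Index array of an xref stream holds (first object number, count) pairs, one per subsection. The PDF specification asks for these pairs in ascending order, but real files (such as the attached one) do not always comply. The following is a minimal sketch, not PDFBox's actual ObjectNumbers class and not the attached patch; the class name IndexObjectNumbers and the flattened long[] input are assumptions made for illustration. It walks the pairs in the order they appear, since the entries in the stream data are laid out subsection by subsection in /Index order, and it tracks completion per pair so it never compares one subsection with another.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/**
 * Hypothetical sketch (not PDFBox code): yields the object numbers described
 * by an xref stream /Index array, given as flattened (start, count) pairs,
 * in the order the pairs appear. Unsorted pairs are handled naturally.
 */
public class IndexObjectNumbers implements Iterator<Long> {
    private final long[] starts;
    private final long[] counts;
    private int subsection = 0; // index of the current (start, count) pair
    private long offset = 0;    // position inside the current subsection

    public IndexObjectNumbers(long[] indexPairs) {
        if (indexPairs.length % 2 != 0) {
            throw new IllegalArgumentException("/Index must contain (start, count) pairs");
        }
        starts = new long[indexPairs.length / 2];
        counts = new long[indexPairs.length / 2];
        for (int i = 0; i < starts.length; i++) {
            long start = indexPairs[2 * i];
            long count = indexPairs[2 * i + 1];
            if (start < 0 || count < 0) {
                throw new IllegalArgumentException("negative value in /Index");
            }
            starts[i] = start;
            counts[i] = count;
        }
        skipExhausted();
    }

    // Move past subsections with no remaining entries (including count == 0).
    private void skipExhausted() {
        while (subsection < starts.length && offset >= counts[subsection]) {
            subsection++;
            offset = 0;
        }
    }

    @Override
    public boolean hasNext() {
        // Done only when every pair has been consumed; no ordering between
        // pairs is assumed.
        return subsection < starts.length;
    }

    @Override
    public Long next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        long objectNumber = starts[subsection] + offset;
        offset++;
        skipExhausted();
        return objectNumber;
    }

    public static void main(String[] args) {
        // Unsorted /Index [10 2 3 1]: subsection 10..11 comes before object 3.
        List<Long> numbers = new ArrayList<>();
        new IndexObjectNumbers(new long[] {10, 2, 3, 1}).forEachRemaining(numbers::add);
        System.out.println(numbers); // [10, 11, 3]
    }
}
```

Because the iterator keeps the /Index order instead of sorting, the n-th entry read from the stream data is paired with the n-th object number listed, which is the pairing the stream layout implies.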
I have a patch for this against the 2.0 branch: Fixing_the_problem_when_the_COSArray_is_not_sorted_in_increasing_order_.patch
With this patch, the image is created successfully. However, the following warnings now appear, which did not occur with version 2.0.22:
Jun 16, 2023 5:18:29 PM org.apache.pdfbox.pdfparser.COSParser findObjectKey
WARNING: found wrong object number. expected [6789] found [6791]
Jun 16, 2023 5:18:29 PM org.apache.pdfbox.pdfparser.COSParser findObjectKey
WARNING: found wrong object number. expected [6790] found [5327]
Jun 16, 2023 5:18:29 PM org.apache.pdfbox.pdfparser.COSParser findObjectKey
WARNING: found wrong object number. expected [6791] found [6485]
Jun 16, 2023 5:18:29 PM org.apache.pdfbox.pdfparser.COSParser findObjectKey
WARNING: found wrong object number. expected [6485] found [6789]
Additional fixes may be needed to fully support this PDF. I did not have time to investigate, and my knowledge of the codebase is fairly limited, so help would be appreciated.
Thanks.
Attachments
Issue Links
- relates to
  - PDFBOX-5112 Add more checks to PDFXrefStreamParser and reduce memory footprint (Closed)