Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version: 2.0.19
Description
I'm running into an issue when flattening form fields with PDFBox 2.0.19. When calling PDAcroForm.flatten(), all annotations on pages without form fields are removed.
I created a sample document to illustrate the issue. It contains 2 pages:
- page 1: a text field and a link annotation
- page 2: only a link annotation
When you flatten this document, the link annotation on the 2nd page is removed, although it should be retained.
PDF Documents and Java files to reproduce this are attached:
- CreateDocument.java creates flatten.pdf
- FlattenDocument.java flattens flatten.pdf and creates flattened.pdf
After debugging, I think I found the cause. In the PDAcroForm class, flatten(...) calls the buildPagesWidgetsMap(...) method, which iterates over the form fields and builds a map of pages and their widget annotations. Because the 2nd page doesn't contain form fields, this page is not added to the map. Then flatten() iterates over the pages and looks up the widgets for each page in the resulting pagesWidgetsMap. Because the 2nd page has no form fields and therefore was never added to the map, the lookup returns null and widgetsForPageMap is null for that page.
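To make the lookup behaviour concrete, here is a minimal, self-contained sketch of a map that is built by iterating over widgets only. It does not use PDFBox: the `Widget` class, the page indices, and the string names are hypothetical stand-ins, and the real buildPagesWidgetsMap(...) works on COS objects and pages rather than on these simplified types.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class PagesWidgetsMapSketch {

    // Hypothetical stand-in for a form field widget: a name plus the index of its page.
    static final class Widget {
        final String name;
        final int pageIndex;

        Widget(String name, int pageIndex) {
            this.name = name;
            this.pageIndex = pageIndex;
        }
    }

    // Builds page -> widget names by iterating over the widgets only.
    // A page that has no widgets never gets an entry in the map.
    static Map<Integer, Set<String>> buildPagesWidgetsMap(List<Widget> widgets) {
        Map<Integer, Set<String>> map = new HashMap<>();
        for (Widget widget : widgets) {
            map.computeIfAbsent(widget.pageIndex, k -> new HashSet<>()).add(widget.name);
        }
        return map;
    }

    public static void main(String[] args) {
        // Sample document from the report: only page 1 (index 0) has a form field.
        List<Widget> widgets = Arrays.asList(new Widget("textField", 0));
        Map<Integer, Set<String>> pagesWidgetsMap = buildPagesWidgetsMap(widgets);

        System.out.println(pagesWidgetsMap.get(0)); // prints [textField]
        System.out.println(pagesWidgetsMap.get(1)); // prints null: page 2 has no entry
    }
}
```

The second lookup is the case that matters here: a missing key yields null rather than an empty set, so any later check has to handle null explicitly.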
Next, for every annotation on this page, the following check is performed:
if (widgetsForPageMap != null && !widgetsForPageMap.contains(annotation.getCOSObject())) { annotations.add(annotation); }
Because widgetsForPageMap is null, the annotation is not added to the annotations list and therefore not retained. The first page does contain a field and is thus added to pagesWidgetsMap, so widgetsForPageMap is not null there, the link annotation is added to the annotations list, and it is retained.
I think this is a regression from https://svn.apache.org/r1828871 and could be solved by using:
if (widgetsForPageMap == null || !widgetsForPageMap.contains(annotation.getCOSObject())) { annotations.add(annotation); }
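The difference between the two conditions can be checked in isolation with a small sketch. It models annotations and widgets as plain strings instead of PDFBox objects, and the class and method names (`RetainAnnotationsSketch`, `retainWithCurrentCheck`, `retainWithProposedCheck`) are made up for illustration; only the two boolean conditions are taken from the snippets above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;

public class RetainAnnotationsSketch {

    // Condition as currently quoted: a null widget set causes every annotation to be dropped.
    static List<String> retainWithCurrentCheck(List<String> pageAnnotations, Set<String> widgetsForPage) {
        List<String> retained = new ArrayList<>();
        for (String annotation : pageAnnotations) {
            if (widgetsForPage != null && !widgetsForPage.contains(annotation)) {
                retained.add(annotation);
            }
        }
        return retained;
    }

    // Proposed condition: a null widget set means the page has no widgets, so everything is kept.
    static List<String> retainWithProposedCheck(List<String> pageAnnotations, Set<String> widgetsForPage) {
        List<String> retained = new ArrayList<>();
        for (String annotation : pageAnnotations) {
            if (widgetsForPage == null || !widgetsForPage.contains(annotation)) {
                retained.add(annotation);
            }
        }
        return retained;
    }

    public static void main(String[] args) {
        // Page 1: a text field widget and a link; the widget is known to the map.
        List<String> page1 = Arrays.asList("textFieldWidget", "link1");
        Set<String> page1Widgets = Collections.singleton("textFieldWidget");

        // Page 2: only a link; the page is missing from the map, so the lookup gives null.
        List<String> page2 = Arrays.asList("link2");
        Set<String> page2Widgets = null;

        System.out.println(retainWithCurrentCheck(page1, page1Widgets));  // prints [link1]
        System.out.println(retainWithCurrentCheck(page2, page2Widgets));  // prints []
        System.out.println(retainWithProposedCheck(page1, page1Widgets)); // prints [link1]
        System.out.println(retainWithProposedCheck(page2, page2Widgets)); // prints [link2]
    }
}
```

On pages that do have widgets both conditions behave the same, so the proposed change should only affect pages that are absent from the map.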
Please let me know if you have any questions!