Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
2.0.26
None
Description
TL;DR:
When using either of the methods "getThreads" or "setThreads" in class PDDocumentCatalog and saving the resulting document, Adobe Preflight reports an issue with the resulting "Threads" array in the document catalog. It states that the array should have been an indirect object reference instead of a direct object.
My claim: The COSWriter should be able to create indirect objects for COSArrays when required.
Checking PDF-32000-1:
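In table 28 "Entries in the catalog dictionary" we find the following definition for the "Threads" entry:
Threads | array | (Optional; shall be an indirect reference; PDF 1.1) An array of thread dictionaries that shall represent the document's article threads (see 12.4.3, "Articles").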
Contributing causes:
1. The mentioned get and set methods create a COSArray for the "Threads" entry of the catalog dictionary.
2. The COSWriter assumes that COSArrays should preferably always be written as direct substructures of a dictionary.
This may well be true for other arrays, but in this case it causes a syntax error in the resulting documents. (It is plausible, but has not been checked, that this causes issues for other structures as well.)
The COSWriter provides the means to write COSDictionaries as indirect objects. As far as I can see, however, it does not provide a way to flag a COSArray for the same handling.
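To make the decision described in point 2 concrete, here is a heavily simplified Java model of it. `Node`, `Dict`, `Arr`, `Name` and `CurrentWriter` are hypothetical stand-ins for COSBase, COSDictionary, COSArray, COSName and COSWriter, not the PDFBox API. The model only reflects the behaviour as described above (dictionaries respect a `direct` flag, arrays are always inlined); it is not the actual COSWriter code.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for COSBase, COSDictionary, COSArray and COSName (not the PDFBox API).
abstract class Node {
    boolean direct; // false by default: such dictionaries are written as indirect objects
}

class Dict extends Node {
    final Map<String, Node> entries = new LinkedHashMap<>();
}

class Arr extends Node {
    final List<Node> items = new ArrayList<>();
}

class Name extends Node {
    final String value;
    Name(String value) { this.value = value; }
}

// Models only the direct/indirect decision; queued objects are numbered but not emitted.
class CurrentWriter {
    private final Map<Node, Integer> objectNumbers = new IdentityHashMap<>();

    String writeValue(Node value) {
        if (value instanceof Dict) {
            // Dictionaries respect the flag: inline when direct, otherwise "n 0 R".
            return value.direct ? writeDict((Dict) value) : reference(value);
        }
        if (value instanceof Arr) {
            // Arrays are always inlined; no branch can make them indirect.
            StringBuilder sb = new StringBuilder("[");
            for (Node item : ((Arr) value).items) {
                sb.append(' ').append(writeValue(item));
            }
            return sb.append(" ]").toString();
        }
        return "/" + ((Name) value).value;
    }

    String writeDict(Dict dict) {
        StringBuilder sb = new StringBuilder("<<");
        for (Map.Entry<String, Node> e : dict.entries.entrySet()) {
            sb.append(" /").append(e.getKey()).append(' ').append(writeValue(e.getValue()));
        }
        return sb.append(" >>").toString();
    }

    // Assigns an object number the first time a node is referenced.
    private String reference(Node node) {
        Integer number = objectNumbers.get(node);
        if (number == null) {
            number = objectNumbers.size() + 1;
            objectNumbers.put(node, number);
        }
        return number + " 0 R";
    }
}

class CurrentWriterDemo {
    public static void main(String[] args) {
        Dict catalog = new Dict();
        catalog.entries.put("Type", new Name("Catalog"));
        Dict thread = new Dict();
        thread.entries.put("Type", new Name("Thread"));
        Arr threads = new Arr();
        threads.items.add(thread);
        catalog.entries.put("Threads", threads);
        // prints: << /Type /Catalog /Threads [ 1 0 R ] >>
        System.out.println(new CurrentWriter().writeDict(catalog));
    }
}
```

The thread dictionary itself ends up as a reference, but the array holding it sits inline in the catalog, and setting a flag on the array changes nothing. That inline `/Threads [ ... ]` form is what Preflight objects to.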
Possible solutions:
As far as I can see, the COSWriter would be entirely capable of creating COSObjects for any of the COSBase types. The only things missing are the ability to mark a COSArray to be written indirectly and the matching handling in the COSWriter.
Adding something like:
at the right places in the COSWriter (similar to the handling of indirect COSDictionaries) seems to do the trick and resolves the issue.
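The snippet itself is not reproduced here, so the following is only a sketch of the idea, using the same kind of hypothetical model rather than the real COSWriter. `Arr` gets an opt-in flag named after the `isDirectArray`/`setDirectArray` methods mentioned below. It defaults to direct, and the writer gives a flagged array the same treatment as an indirect dictionary. Object numbering and the single-line `n 0 obj ... endobj` strings are simplifications, not PDFBox's actual output format.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model; names are illustrative and not the PDFBox API.
abstract class Node {
    boolean direct; // used for dictionaries only, as before
}

class Dict extends Node {
    final Map<String, Node> entries = new LinkedHashMap<>();
}

class Arr extends Node {
    final List<Node> items = new ArrayList<>();
    // Separate flag in the spirit of the isDirectArray/setDirectArray hotfix;
    // it defaults to true so arrays stay inline unless a caller opts out.
    private boolean directArray = true;
    boolean isDirectArray() { return directArray; }
    void setDirectArray(boolean directArray) { this.directArray = directArray; }
}

class Name extends Node {
    final String value;
    Name(String value) { this.value = value; }
}

class PatchedWriter {
    private final Map<Node, Integer> objectNumbers = new IdentityHashMap<>();
    final List<String> objects = new ArrayList<>(); // "n 0 obj ... endobj" bodies

    String writeValue(Node value) {
        if (value instanceof Dict) {
            return value.direct ? writeDict((Dict) value) : reference(value);
        }
        if (value instanceof Arr) {
            Arr array = (Arr) value;
            // New branch: an array flagged as indirect is handled like an indirect dictionary.
            return array.isDirectArray() ? writeArray(array) : reference(array);
        }
        return "/" + ((Name) value).value;
    }

    String writeDict(Dict dict) {
        StringBuilder sb = new StringBuilder("<<");
        for (Map.Entry<String, Node> e : dict.entries.entrySet()) {
            sb.append(" /").append(e.getKey()).append(' ').append(writeValue(e.getValue()));
        }
        return sb.append(" >>").toString();
    }

    String writeArray(Arr array) {
        StringBuilder sb = new StringBuilder("[");
        for (Node item : array.items) {
            sb.append(' ').append(writeValue(item));
        }
        return sb.append(" ]").toString();
    }

    // Numbers the node on first use and records its body as a separate indirect object.
    private String reference(Node node) {
        Integer number = objectNumbers.get(node);
        if (number == null) {
            number = objectNumbers.size() + 1;
            objectNumbers.put(node, number); // registered first, so self-references terminate
            String body = node instanceof Arr ? writeArray((Arr) node) : writeDict((Dict) node);
            objects.add(number + " 0 obj " + body + " endobj");
        }
        return number + " 0 R";
    }
}

class PatchedWriterDemo {
    public static void main(String[] args) {
        Dict catalog = new Dict();
        catalog.entries.put("Type", new Name("Catalog"));
        Dict thread = new Dict();
        thread.entries.put("Type", new Name("Thread"));
        Arr threads = new Arr();
        threads.items.add(thread);
        threads.setDirectArray(false); // opt in for the array the spec requires to be indirect
        catalog.entries.put("Threads", threads);
        PatchedWriter writer = new PatchedWriter();
        System.out.println(writer.writeDict(catalog)); // << /Type /Catalog /Threads 1 0 R >>
        writer.objects.forEach(System.out::println);
    }
}
```

With the flag cleared, the catalog entry becomes `/Threads 1 0 R` and `objects` holds `1 0 obj [ 2 0 R ] endobj` for the array and `2 0 obj << /Type /Thread >> endobj` for the thread dictionary. Arrays that are never flagged keep their inline output, so the change stays opt-in.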
Important issue?
I fixed this on our end, so it is not a pressing issue. Also, "Threads" is not as important or common as other structures, so most documents and users won't encounter this issue at all.
However, it would be nice if this were fixed.
Concerning a possible patch:
I could provide a patch making the required changes, but I would have to adapt it to the current PDFBox 2.0.27-SNAPSHOT, as I developed it as a hotfix for our mirror of the library.
And concerning that patch I should mention:
As may be assumed, "isDirectArray" and "setDirectArray" methods have been added to COSArray. This is a quick and dirty solution; it would be preferable for COSArray to use the already existing "direct" field that other COSBase types (COSDictionary) already use.
As stated, the solution is quick and dirty, and a final solution in the PDFBox library should take a cleaner approach. Hence I have not provided that patch for now.
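For comparison, here is a sketch of the cleaner variant suggested above: a single `isDirect`/`setDirect` flag on the common base type, consulted by the writer for arrays and dictionaries alike. All names are again hypothetical. One assumption deserves calling out: in this model the flag defaults to false (indirect), so `Arr` marks itself direct in its constructor. Otherwise, reusing the shared flag would move every array out into its own object. `CatalogModel.setThreads` is a hypothetical stand-in for the code that creates the "Threads" entry and marks it indirect, as the specification requires.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model: one "direct" flag on the common base type, used by every container.
abstract class Node {
    private boolean direct; // false by default
    boolean isDirect() { return direct; }
    void setDirect(boolean direct) { this.direct = direct; }
}

class Dict extends Node {
    final Map<String, Node> entries = new LinkedHashMap<>();
}

class Arr extends Node {
    final List<Node> items = new ArrayList<>();
    // Assumption: arrays start out direct so output for all other arrays is unchanged.
    Arr() { setDirect(true); }
}

class Name extends Node {
    final String value;
    Name(String value) { this.value = value; }
}

class UniformWriter {
    private final Map<Node, Integer> objectNumbers = new IdentityHashMap<>();
    final List<String> objects = new ArrayList<>(); // "n 0 obj ... endobj" bodies

    // One rule for every container: inline when direct, otherwise an indirect reference.
    String writeValue(Node value) {
        if (value instanceof Name) {
            return "/" + ((Name) value).value;
        }
        return value.isDirect() ? writeInline(value) : reference(value);
    }

    String writeInline(Node node) {
        StringBuilder sb = new StringBuilder();
        if (node instanceof Arr) {
            sb.append('[');
            for (Node item : ((Arr) node).items) {
                sb.append(' ').append(writeValue(item));
            }
            return sb.append(" ]").toString();
        }
        sb.append("<<");
        for (Map.Entry<String, Node> e : ((Dict) node).entries.entrySet()) {
            sb.append(" /").append(e.getKey()).append(' ').append(writeValue(e.getValue()));
        }
        return sb.append(" >>").toString();
    }

    private String reference(Node node) {
        Integer number = objectNumbers.get(node);
        if (number == null) {
            number = objectNumbers.size() + 1;
            objectNumbers.put(node, number); // registered before writing the body
            objects.add(number + " 0 obj " + writeInline(node) + " endobj");
        }
        return number + " 0 R";
    }
}

// Hypothetical helper standing in for the code that creates the catalog's "Threads" entry.
class CatalogModel {
    static void setThreads(Dict catalog, Arr threads) {
        threads.setDirect(false); // the specification requires an indirect reference here
        catalog.entries.put("Threads", threads);
    }
}

class UniformWriterDemo {
    public static void main(String[] args) {
        Dict catalog = new Dict();
        catalog.entries.put("Type", new Name("Catalog"));
        Dict thread = new Dict();
        thread.entries.put("Type", new Name("Thread"));
        Arr threads = new Arr();
        threads.items.add(thread);
        CatalogModel.setThreads(catalog, threads);
        UniformWriter writer = new UniformWriter();
        System.out.println(writer.writeInline(catalog)); // << /Type /Catalog /Threads 1 0 R >>
        writer.objects.forEach(System.out::println);
    }
}
```

The writer no longer needs a type-specific flag. The trade-off is that the default for arrays has to be chosen deliberately, and the code that builds a spec-mandated indirect entry is responsible for clearing it.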