  1. PDFBox
  2. PDFBOX-5518

"Threads" array in Document Catalog should be an indirect reference

Details

    • Bug
    • Status: Closed
    • Major
    • Resolution: Fixed
    • 2.0.26
    • 3.0.0 PDFBox
    • PDModel
    • None

    Description

      TL;DR:
      When either of the methods "getThreads" or "setThreads" in class PDDocumentCatalog is used and the resulting document is saved, Adobe Preflight reports an issue with the "Threads" array in the document catalog and claims it should have been an indirect object reference instead of a direct object.

      My claim: The COSWriter should be able to create indirect objects for COSArrays when required.

      Checking PDF-32000-1:
      In table 28 "Entries in the catalog dictionary" we can find the definition of the "Threads" entry, which requires the array to be an indirect reference.

      Determining reasons:
      1. The mentioned get and set methods create a COSArray for the entry "Threads" of the catalog dictionary.
      2. The COSWriter assumes that COSArrays should always be written as a direct substructure of a dictionary.
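
      The difference between the two serializations can be shown without PDFBox at all. The following self-contained sketch builds both forms of a catalog with an empty "Threads" array as plain strings. The class name ThreadsSyntaxSketch, the object numbers and the catalog contents are illustrative assumptions, not output captured from the COSWriter.

```java
// Illustrative only: shows the two PDF syntax forms as plain strings.
// Nothing here is a PDFBox class; object numbers are made up.
final class ThreadsSyntaxSketch {

    // Direct form: the array is embedded in the catalog dictionary.
    // This is the shape that Preflight is reported to reject for "Threads".
    static String directCatalog() {
        return "1 0 obj\n"
             + "<< /Type /Catalog /Pages 2 0 R /Threads [ ] >>\n"
             + "endobj\n";
    }

    // Indirect form: the catalog only holds a reference ("N 0 R"),
    // and the array itself is written as a separate object.
    static String indirectCatalog(int arrayObjectNumber) {
        return "1 0 obj\n"
             + "<< /Type /Catalog /Pages 2 0 R /Threads " + arrayObjectNumber + " 0 R >>\n"
             + "endobj\n"
             + arrayObjectNumber + " 0 obj\n"
             + "[ ]\n"
             + "endobj\n";
    }

    public static void main(String[] args) {
        System.out.print(directCatalog());
        System.out.println("---");
        System.out.print(indirectCatalog(5));
    }
}
```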

      This may be entirely appropriate for other arrays, but in this case it causes a syntactical error in the resulting documents. (It is plausible, but has not been checked, that this causes issues for other structures as well.)

      The COSWriter provides the means to create indirect objects for COSDictionaries; however, as far as I can see, it does not provide the means to flag a COSArray for the same handling.

      Possible solutions:
      As far as I can see, the COSWriter would be entirely capable of creating COSObjects for any of the COSBase types. The only things missing are the ability to mark a COSArray to be written indirectly and the matching handling by the COSWriter.
      Adding something like:

      at the right places in the COSWriter (similar to the handling of indirect COSDictionaries) seems to do the trick and resolves the issue.
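
      The general idea can be sketched as follows, assuming a boolean flag on the array and a writer that consults it when it serializes a dictionary entry. This is a simplified stand-alone model, not the reporter's patch and not PDFBox code: SketchArray and SketchWriter are hypothetical classes, and only the method names "isDirectArray" and "setDirectArray" are taken from this report.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for an array type that can be flagged.
final class SketchArray {
    private final List<String> items = new ArrayList<>();
    private boolean directArray = true; // assumed default: written inline

    void add(String item) { items.add(item); }
    boolean isDirectArray() { return directArray; }
    void setDirectArray(boolean directArray) { this.directArray = directArray; }

    String body() {
        return items.isEmpty() ? "[ ]" : "[ " + String.join(" ", items) + " ]";
    }
}

// Hypothetical stand-in for a writer; it only handles dictionaries of arrays.
final class SketchWriter {
    private final StringBuilder out = new StringBuilder();
    private int nextObjectNumber;

    SketchWriter(int firstFreeObjectNumber) {
        this.nextObjectNumber = firstFreeObjectNumber;
    }

    void writeDictionary(int objectNumber, Map<String, SketchArray> entries) {
        // Arrays flagged as indirect are collected and written after the dictionary.
        Map<Integer, SketchArray> deferred = new LinkedHashMap<>();
        out.append(objectNumber).append(" 0 obj\n<<");
        for (Map.Entry<String, SketchArray> entry : entries.entrySet()) {
            out.append(" /").append(entry.getKey()).append(' ');
            SketchArray array = entry.getValue();
            if (array.isDirectArray()) {
                out.append(array.body());            // inline, as before
            } else {
                int number = nextObjectNumber++;     // reserve an object number
                deferred.put(number, array);
                out.append(number).append(" 0 R");   // write only the reference
            }
        }
        out.append(" >>\nendobj\n");
        for (Map.Entry<Integer, SketchArray> d : deferred.entrySet()) {
            out.append(d.getKey()).append(" 0 obj\n")
               .append(d.getValue().body()).append("\nendobj\n");
        }
    }

    String result() { return out.toString(); }
}
```

      In this model, calling setDirectArray(false) on the array stored under "Threads" before writing makes the catalog contain a reference, while all unflagged arrays keep their inline form.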

      Important issue?:
      I fixed this on our end, so it is not a pressing issue. Also, "Threads" is not as important or common as other structures, so most documents and users won't encounter this issue at all.

      However, it would be nice if this were fixed.

      Concerning a possible patch:
      I could provide a patch making the required changes, but I would have to adapt it for the current PDFBox 2.0.27-SNAPSHOT, as I developed it as a hotfix for our mirror of the library.

      And concerning that patch I should mention:
      As can be assumed, an "isDirectArray" and a "setDirectArray" method have been added to COSArray. This is a quick and dirty solution, as it would be preferable for COSArray to use the already existing "direct" field that other COSBase types (COSDictionaries) already use.

      As stated, the solution is quick and dirty, and for a final solution in the PDFBox library a cleaner approach would be preferable. Hence I did not provide that patch for now.
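
      For comparison, the cleaner variant with one shared flag could look roughly like the sketch below. It is a stand-alone model under stated assumptions: FlaggedBase, FlaggedArray, FlaggedDictionary and SharedFlagSketch are hypothetical names, the shared flag is assumed to default to false, and the array constructor that sets it to true is one possible way to keep arrays inline by default. It is not a description of how PDFBox resolves this.

```java
// Hypothetical model; the class names below are not PDFBox types.
abstract class FlaggedBase {
    private boolean direct; // shared flag, false unless set

    boolean isDirect() { return direct; }
    void setDirect(boolean direct) { this.direct = direct; }
}

final class FlaggedDictionary extends FlaggedBase { }

final class FlaggedArray extends FlaggedBase {
    FlaggedArray() {
        // Assumption: arrays stay "inline by default", so output for
        // unflagged arrays does not change when the flag is shared.
        setDirect(true);
    }
}

final class SharedFlagSketch {
    // Single decision point for the writer, independent of the concrete type.
    static boolean needsOwnObject(FlaggedBase value) {
        return !value.isDirect();
    }
}
```

      With a shared flag there is no array-specific accessor pair, and the writer needs one check instead of one per type. The cost is that the default per type has to be chosen deliberately.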

      Attachments

        1. image-2022-09-23-09-50-30-766.png
          37 kB
          Christian Appl
        2. image-2022-09-23-10-03-15-070.png
          30 kB
          Christian Appl
        3. image-2022-09-23-10-54-31-618.png
          177 kB
          Christian Appl
        4. image-2023-07-17-10-57-12-609.png
          45 kB
          Christian Appl
        5. threads-out.pdf
          0.4 kB
          Christian Appl

          People

            Assignee: Andreas Lehmkühler (lehmi)
            Reporter: Christian Appl (capSVD)
            Votes: 0
            Watchers: 3
