  1. PDFBox
  2. PDFBOX-4007

Merged documents don't retain tags

Details

    • Bug
    • Status: Closed
    • Minor
    • Resolution: Duplicate
    • 2.0.8
    • None
    • Utilities
    • Patch

    Description

      Certain combinations of documents don't retain their tags when merged. The document Tagged.pdf is a basic one-word PDF created and tagged with Pro DC. If you try to merge it with the government General Forbearance form, the output crashes DC when you try to view the tags. If you use a flattened version of the General Forbearance form instead, the tags are just munged.

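          import java.io.File;
          import org.apache.pdfbox.multipdf.PDFMergerUtility;
          import org.apache.pdfbox.pdmodel.PDDocument;

          public class MergeTagged {
              public static void main(String[] args) throws Exception {
                  PDFMergerUtility pdfMergerUtility = new PDFMergerUtility();
                  // Keep src open until dest is saved; merged content may still reference it.
                  try (PDDocument src = PDDocument.load(new File("Tagged.pdf"));
                       PDDocument dest = PDDocument.load(new File("GeneralForbearance.pdf"))) {
                      // Append the tagged one-word document to the form.
                      pdfMergerUtility.appendDocument(dest, src);
                      dest.save(new File("BrokenTags.pdf"));
                  }
              }
          }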
      

      The included patch appears to make tagging more reliable, but I'm still relying heavily on cloning, which can apparently cause other issues. The documents I get out with this code seem to present correctly in Adobe readers for all combinations of documents that I tested against.
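
      As an illustration of why plain cloning is not enough for tags, here is a simplified model of one remapping step that a tag-preserving merge has to handle. It is not taken from the attached patch and does not use the PDFBox API; the class and method names are hypothetical. In a tagged PDF each page carries a /StructParents key that indexes into the structure tree root's /ParentTree, and /ParentTreeNextKey records the next free key. The sketch assumes that appended pages need their keys shifted past the destination's existing ones, and it ignores annotation /StructParent keys and the real number-tree layout.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy model (not PDFBox API): a tagged document reduced to its page keys and parent tree. */
final class TaggedDocModel {
    // One /StructParents key per page.
    final List<Integer> pageStructParents = new ArrayList<>();
    // /ParentTree: key -> label standing in for that page's structure elements.
    final Map<Integer, String> parentTree = new TreeMap<>();
    // /ParentTreeNextKey: first key that is not used yet.
    int parentTreeNextKey = 0;

    /** Adds a page and registers its structure elements under a fresh key. */
    void addPage(String elements) {
        pageStructParents.add(parentTreeNextKey);
        parentTree.put(parentTreeNextKey, elements);
        parentTreeNextKey++;
    }

    /** Appends src's pages, shifting every key by dest's next free key so no entry collides. */
    static void append(TaggedDocModel dest, TaggedDocModel src) {
        int offset = dest.parentTreeNextKey;
        for (int key : src.pageStructParents) {
            dest.pageStructParents.add(key + offset);
            dest.parentTree.put(key + offset, src.parentTree.get(key));
        }
        dest.parentTreeNextKey = offset + src.parentTreeNextKey;
    }

    public static void main(String[] args) {
        TaggedDocModel dest = new TaggedDocModel();
        dest.addPage("form page 1");
        dest.addPage("form page 2");
        TaggedDocModel src = new TaggedDocModel();
        src.addPage("tagged word");
        append(dest, src);
        // Print the merged page keys and the merged parent tree.
        System.out.println(dest.pageStructParents + " " + dest.parentTree);
    }
}
```

      Without the offset, the appended page would reuse key 0 and point at the first form page's structure elements, which is one plausible way for tags to end up munged after a merge.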

      My patch is made and tested against yesterday's production head, and it includes my changes from PDFBOX-3999 since they are in the exact same place in the code.

      This is a blocker for 508 compliance of merged documents, but I guessed it to be more of a minor issue in the overall scheme of things. Please correct me if I am mistaken.

      Thanks!

      Attachments

        1. Tagged.pdf (10 kB, Dave Hill)
        2. PDFMergeUtility.patch (3 kB, Dave Hill)
        3. Tagged+GeneralForbearance-Merged.pdf (689 kB, Tilman Hausherr)
        4. HelloWorldTagged.pdf (2 kB, Dave Hill)
        5. PDFMergeUtility-2.patch (6 kB, Dave Hill)
        6. FourFontsTagged.pdf (3 kB, Dave Hill)
        7. Tagged-GeneralForbearance-merged-21.12.2018.pdf (689 kB, Tilman Hausherr)

            People

              Assignee: Unassigned
              Reporter: Dave Hill (DavesPlanet)
              Votes: 0
              Watchers: 2

              Dates

                Created:
                Updated:
                Resolved: