PDFBox / PDFBOX-4066

Merging documents with nested fields duplicates child fields

    Details

      Description

      I have a PDF with many AcroForm fields. I do some manipulation on it, which results in a new PDF. So I have PDF-1 (the original) and PDF-2 (a duplicate of PDF-1), and I want to merge them. Both PDFs contain AcroForm fields, for example field_a, field_2, ...

      Before merging, I flatten PDF-1 because I only want the AcroForm fields from PDF-2. When I check the merged PDF, there are no visible fields on the pages from PDF-1, and there are fields on the pages from PDF-2. At first glance this looks fine, but when I inspect the fields I see that the merger has renamed all the fields from PDF-2, e.g. field_a_dummy123, field_b_dummy232, ...

      It seems to me that flattening does not remove the fields, and that is why the PDFMergerUtility from PDFBox renames the fields from PDF-2, because AcroForm field names need to be unique. Another guess is that there is a bug in mergeAcroForm().
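
      The suspicion above can be illustrated with a toy model. This is a minimal sketch, not PDFBox code: the class FieldNameCollisionSketch, the method merge and the "_dummy" suffix scheme are hypothetical stand-ins for whatever the merger does internally. It only shows that a merger which enforces unique field names leaves the PDF-2 names untouched when PDF-1 contributes no fields, and renames them when the PDF-1 fields are still present.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class FieldNameCollisionSketch {

    // Hypothetical model of a merge that must keep field names unique.
    // It is NOT the PDFBox implementation; the suffix scheme is assumed.
    public static List<String> merge(List<String> destFieldNames, List<String> srcFieldNames) {
        Set<String> used = new LinkedHashSet<>(destFieldNames);
        int counter = 0;
        for (String name : srcFieldNames) {
            String candidate = name;
            // Rename only when the name is already taken in the destination.
            while (used.contains(candidate)) {
                candidate = name + "_dummy" + (counter++);
            }
            used.add(candidate);
        }
        return new ArrayList<>(used);
    }

    public static void main(String[] args) {
        List<String> form = Arrays.asList("field_a", "field_b");

        // Case 1: flattening removed every field from PDF-1, so nothing collides.
        System.out.println(merge(Collections.<String>emptyList(), form));

        // Case 2: the fields survived flattening, so every PDF-2 field is renamed.
        System.out.println(merge(form, form));
    }
}
```

      If the real merged file looks like case 2, the field entries of PDF-1 were most likely still present in its AcroForm when the merge ran.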

       

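      import java.io.ByteArrayInputStream;
      import java.io.File;
      import java.io.IOException;
      import java.nio.file.Files;
      import java.nio.file.Path;
      import org.apache.pdfbox.io.MemoryUsageSetting;
      import org.apache.pdfbox.multipdf.PDFMergerUtility;
      import org.apache.pdfbox.pdmodel.PDDocument;
      import org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm;
      import org.junit.Test;

      @Test
      public void flattenAndMerge() throws IOException {
          // Load the test form from the classpath.
          File testForm = new File(getClass().getClassLoader().getResource("./TestForm.pdf").getFile());
          byte[] testFormAsByte = Files.readAllBytes(testForm.toPath());

          // PDF-1: flatten the form and save the result to a temporary file.
          Path flattenedPdf = Files.createTempFile("flatten", ".pdf");
          try (PDDocument pdf1 = PDDocument.load(testFormAsByte)) {
              PDAcroForm acroform = pdf1.getDocumentCatalog().getAcroForm();
              acroform.flatten();
              pdf1.save(flattenedPdf.toFile());
          }

          // Merge the flattened PDF-1 with the untouched PDF-2 (same bytes as the original).
          PDFMergerUtility merger = new PDFMergerUtility();
          merger.addSource(new ByteArrayInputStream(Files.readAllBytes(flattenedPdf)));
          merger.addSource(new ByteArrayInputStream(testFormAsByte));
          merger.setDestinationFileName("./build/flattenAndMerge.pdf");
          merger.mergeDocuments(MemoryUsageSetting.setupMainMemoryOnly());
      }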
      

      Here is my Stack Overflow question:

      https://stackoverflow.com/questions/48271924/pdfbox-flatten-pdf-does-not-remove-acroform-elements?noredirect=1#comment83544858_48271924

       

       

        Attachments

        1. TestForm-merged.pdf
          101 kB
          Maruan Sahyoun
        2. TestForm-flattened.pdf
          35 kB
          Maruan Sahyoun
        3. TestForm.pdf
          37 kB
          Al Phaba
        4. flattenAndMerge.pdf
          68 kB
          Al Phaba

            People

            • Assignee:
              msahyoun Maruan Sahyoun
              Reporter:
              AlPhaba Al Phaba
            • Votes:
              1
              Watchers:
              6
