PDFBox / PDFBOX-5809

PDDocument#importPage slowed down by factor 1300

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.31, 3.0.2 PDFBox
    • Fix Version/s: 2.0.32, 3.0.3 PDFBox, 4.0.0
    • Component/s: Utilities
    • Labels: None

    Description

      We use the PDDocument#importPage method in our own splitter, which copies pages from a source document to a target document. To do so, we first fetch the page with the following code:

      final PDPage sourcePage = sourceDocument.getPage(pageNumber);
      

      Immediately afterwards we call:

      final PDPage targetPage = targetDocument.importPage(sourcePage);
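
A minimal sketch of how the average time per page of the two calls above could be measured. It is not the reporter's actual benchmark: the class `ImportTiming`, the helper `averageMillis`, and the stand-in workload are hypothetical names introduced for illustration. The harness uses only the JDK; the PDFBox calls appear only in a comment, because `importPage` throws `IOException` and would need a try/catch inside the lambda.

```java
import java.util.function.IntConsumer;

public class ImportTiming {

    /**
     * Calls the action once per page index (0 .. pageCount-1) and returns
     * the mean wall-clock time per call in milliseconds.
     */
    static double averageMillis(int pageCount, IntConsumer perPageAction) {
        if (pageCount <= 0) {
            throw new IllegalArgumentException("pageCount must be positive");
        }
        long start = System.nanoTime();
        for (int i = 0; i < pageCount; i++) {
            perPageAction.accept(i);
        }
        long elapsedNanos = System.nanoTime() - start;
        return elapsedNanos / 1_000_000.0 / pageCount;
    }

    public static void main(String[] args) {
        // Stand-in workload so the sketch runs without PDFBox on the classpath.
        // In the splitter, the lambda body would instead be (wrapped in try/catch):
        //   targetDocument.importPage(sourceDocument.getPage(pageNumber));
        double[] sink = new double[1];
        double avg = averageMillis(1000, i -> sink[0] += Math.sqrt(i));
        System.out.println("average ms per call: " + avg);
    }
}
```

Running the same harness against two PDFBox versions with the same input document would give directly comparable averages per page.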
      

      This approach worked fine with PDFBox 2.0.26.
      We decided to upgrade to version 3.0.2 because it tackles a lot of problems.

      Unfortunately, the PDDocument#importPage method became roughly 1300 times slower. In version 2.0.26 it took 15 ms on average; with the latest 3.0.2 it takes 20000 ms on average. That is a deal breaker, as we usually have to split documents with several thousand pages.
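
To put the reported numbers in perspective, the arithmetic can be sketched as follows. The timings of 15 ms and 20000 ms come from the report; the page count of 3000 is a hypothetical stand-in for "several thousand pages", and the estimate assumes the cost per page stays constant across the document.

```java
import java.time.Duration;

public class SlowdownEstimate {

    /** Ratio of the new time per page to the old time per page. */
    static double slowdownFactor(double oldMillis, double newMillis) {
        return newMillis / oldMillis;
    }

    /** Total time for a split, assuming a constant cost per page. */
    static Duration totalTime(long pages, long millisPerPage) {
        return Duration.ofMillis(pages * millisPerPage);
    }

    public static void main(String[] args) {
        long pages = 3000; // hypothetical page count, not taken from the report

        System.out.printf("slowdown factor: %.0f%n", slowdownFactor(15, 20000));
        System.out.println("2.0.26 total: " + totalTime(pages, 15));    // PT45S
        System.out.println("3.0.2 total:  " + totalTime(pages, 20000)); // PT16H40M
    }
}
```

Under these assumptions, a split of 3000 pages goes from about 45 seconds to 16 hours 40 minutes, and the factor of 20000 / 15 ≈ 1333 matches the "factor 1300" in the issue title.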

      Note: The same applies when using PDDocument#addPage.
      Note: The problem does not appear in 3.0.1, but we can't use that version since it has other major problems which break our application.

      I have prepared an example document with which you can reproduce the issue. Due to the file size limit, I had to provide a WeTransfer link instead: https://we.tl/t-lfN2wz7cAs

      Attachments

        1. image-2024-04-27-18-50-19-199.png
          126 kB
          Tilman Hausherr

            People

              Assignee: Andreas Lehmkühler (lehmi)
              Reporter: Marcus Korinth (mko91)
