PDFBox / PDFBOX-5852

High CPU and memory usage when converting a PDF with type 4 shading

Details

    • Type: Wish
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.28
    • Fix Version/s: 2.0.33, 3.0.3 PDFBox, 4.0.0
    • Component/s: Rendering
    • Labels: None

    Description

      We've observed excessive CPU and memory consumption when converting a PDF to images when the PDF contains type 4 shading.  This is especially noticeable when the conversion is done with a high DPI.  Can this be improved?
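
As a rough illustration of why DPI matters here, the sketch below estimates the size of the output raster alone at several resolutions, including the 650 DPI value mentioned in the mailing-list message quoted below. The page size (US Letter, 612 x 792 points) and the 4 bytes per pixel are assumptions for illustration only; the ticket does not state either, and the class and method names are hypothetical. It does not use PDFBox and says nothing about how PDFBox evaluates the shading itself.

```java
public class RasterSizeEstimate {
    // PDF default user space uses 72 units (points) per inch.
    static final double POINTS_PER_INCH = 72.0;

    /** Number of pixels along one page side when rendered at the given DPI. */
    public static long pixels(double lengthInPoints, double dpi) {
        return Math.round(lengthInPoints / POINTS_PER_INCH * dpi);
    }

    /** Bytes needed for one uncompressed raster of the page. */
    public static long rasterBytes(double widthPt, double heightPt,
                                   double dpi, int bytesPerPixel) {
        return pixels(widthPt, dpi) * pixels(heightPt, dpi) * bytesPerPixel;
    }

    public static void main(String[] args) {
        // Assumptions (not from the ticket): US Letter page, 4 bytes per pixel.
        double widthPt = 612;
        double heightPt = 792;
        int bytesPerPixel = 4;

        for (double dpi : new double[] {72, 150, 300, 650}) {
            long bytes = rasterBytes(widthPt, heightPt, dpi, bytesPerPixel);
            System.out.printf("%4.0f DPI: %d x %d px, about %.1f MB%n",
                    dpi, pixels(widthPt, dpi), pixels(heightPt, dpi), bytes / 1e6);
        }
    }
}
```

Under these assumptions a 650 DPI render is 5525 x 7150 pixels, or about 158 million bytes for the raster. The pixel count grows with the square of the DPI, so 650 DPI has roughly 81 times as many pixels as 72 DPI. The raster alone is far below the 20GB reported in the message, so this estimate is only a lower bound on the memory used during rendering.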

       

      The conversation from the PDFBox users mailing list follows.

      Initial email:

      Subject: High CPU and memory usage when converting a PDF with type 4 shading

      Hello PDFBox users and maintainers,

      We have a PDF that causes performance problems when we use PDFBox to
      convert it to an image with renderImageWithDPI(). We're calling
      renderImageWithDPI() with 650 DPI. I realize this is a very high value;
      we're using it for high-fidelity original images that will later be
      downsampled. On my work laptop, which has fairly strong hardware, the
      conversion takes 25 minutes and consumes 20GB of memory. CPU and memory
      usage is reduced if we use a lower DPI.

      The PDF is 1 page long. It contains type 4 shading (free-form Gouraud-shaded
      triangle meshes). We've been aware of some performance issues with type 4
      shading for a little while now, but the PDFs that contained the type 4
      shading belonged to our customers and we were not authorized to share
      them. We finally found a problem input document that is non-sensitive and
      that we are authorized to share. I've attached a copy of the problem PDF
      to this email.

      I searched the archives of the users and developers mailing lists and
      didn't find anything specifically about this issue. I also searched the
      PDFBox Jira tickets and found a couple that looked similar: PDFBOX-2901
      and PDFBOX-4491. PDFBOX-2901 seems to most closely describe what we're
      seeing, but it was closed in PDFBox 2.0.0, and our issue still
      reproduces with PDFBox 2.0.28.

      Should I refer this issue over to the developers mailing list or create a
      PDFBox Jira ticket for this?

      Thanks and Regards,
      Larry Lynn

      Response:

      Hi,

      Yes, shading can be very slow, especially at high DPI. The attachment
      didn't get through; please upload it to a file-sharing host or create a
      ticket. If you need to register, then add a meaningful text, e.g. the
      subject of this post, so we know you're not a spammer. Also retry with
      2.0.31 and 3.0.2 just to be sure. However, I'm pessimistic that this can
      be fixed.

      Tilman

       

      Attachments

        1. minimal.pdf (17 kB, Larry Lynn)
        2. CIB-coonsmesh.pdf (7 kB, Tilman Hausherr)

          People

            Assignee: Andreas Lehmkühler (lehmi)
            Reporter: Larry Lynn (larry.lynn@workiva.com)
            Votes: 0
            Watchers: 4

            Dates

              Created:
              Updated:
              Resolved: