PDFBOX-2313

ExtractImages finds never-rendered images


    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.0
    • Fix Version/s: 2.0.0
    • Component/s: Utilities
    • Labels:
      None

      Description

      The file from PDFBOX-2101 still causes unexpectedly high memory use in ExtractImages compared to PDFToImage. Since PDFToImage uses the same caching strategy, this is not really a caching issue, though we might still want to think about caching.

      The PDF contains 55 images on the first page that are never rendered, and ExtractImages runs out of memory trying to extract them all. Given that PDFs often contain junk like this, I suggest that ExtractImages only extract images which are actually drawn to the page at some point.
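
      The suggestion can be illustrated with a small, library-free sketch. This is not the PDFBox API or the actual fix: `DrawnImageFilter`, `drawnImages`, and the flat token list are hypothetical names and a simplified model chosen for illustration. It assumes an image is "drawn" when its resource name is the operand of the `Do` operator in the page content stream, and it ignores inline images, nested form XObjects, and clipping.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Simplified model (assumption, not PDFBox code): a content stream is a flat
// list of tokens, and an XObject is painted by the sequence "/Name Do".
final class DrawnImageFilter {

    // Returns the image resource names that are painted at least once,
    // in order of first use. Resources never referenced by "Do" are skipped.
    static List<String> drawnImages(Set<String> imageResources, List<String> contentTokens) {
        Set<String> drawn = new LinkedHashSet<>();
        for (int i = 1; i < contentTokens.size(); i++) {
            if (!"Do".equals(contentTokens.get(i))) {
                continue;
            }
            // The operand of "Do" is the name token immediately before it.
            String operand = contentTokens.get(i - 1);
            if (operand.startsWith("/")) {
                String name = operand.substring(1);
                if (imageResources.contains(name)) {
                    drawn.add(name);
                }
            }
        }
        return new ArrayList<>(drawn);
    }
}
```

      With resources `Im0`, `Im1`, `Im2` and a content stream that only paints `/Im1`, this sketch would select `Im1` alone, so the unreferenced images are never decoded.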

            People

            • Assignee: John Hewson (jahewson)
            • Reporter: John Hewson (jahewson)
