Uploaded image for project: 'PDFBox'
  1. PDFBox
  2. PDFBOX-3852

Overlay a pdf file which is 750 pages ends up in OutOfMemoryError

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Closed
    • Major
    • Resolution: Fixed
    • 2.0.6
    • 2.0.7, 3.0.0 PDFBox
    • PDModel
    • Unbuntu, jetty
    • Patch, Important

    Description

      We found an issue and solution to fix it, you guys might would be interested to have a look and see whether it is worth applying the attached patch to benefit more pdfbox users. And a bit more detail this error happens based on jetty running time memory setting, and pdf file size.

      • Application platform:
        Unbuntu, jetty
      • The test case to produce this issue:
        Add simple overlay to all pages (in this case it is 750 pages). The processPages function eats up the JVM memories while applying the overlay to the file.
      • sample code for using pdfbox overlay:
         PDDocument document = PDDocument.load( pdf );
         HashMap<Integer, String> overlayGuide = new HashMap();
         for (int i = 0; i < pagenunber; i++)
         {
          // "watermarked.pdf" meat to be a file which contains watermarks on the page
           overlayGuide.put(i+1, "watermarked.pdf");
         }
         Overlay overlay = new Overlay();
         overlay.setInputPDF( document );
         overlay.setOverlayPosition( Overlay.Position.FOREGROUND );
         PDDocument overlayResult = overlay.overlay( overlayGuide );
        
      • Error log:
        INFO   | jvm 1    | main    | 2017/07/03 13:06:23 | java.lang.OutOfMemoryError: Java heap space
        STATUS | wrapper  | main    | 2017/07/03 13:06:23 | Filter trigger matched.  Restarting JVM.
        INFO   | jvm 1    | main    | 2017/07/03 13:06:23 | 	at org.apache.pdfbox.io.ScratchFile.<init>(ScratchFile.java:128)
        INFO   | jvm 1    | main    | 2017/07/03 13:06:23 | 	at org.apache.pdfbox.io.ScratchFile.getMainMemoryOnlyInstance(ScratchFile.java:143)
        INFO   | jvm 1    | main    | 2017/07/03 13:06:23 | 	at org.apache.pdfbox.cos.COSStream.<init>(COSStream.java:55)
        INFO   | jvm 1    | main    | 2017/07/03 13:06:23 | 	at org.apache.pdfbox.multipdf.Overlay.createStream(Overlay.java:***)
        INFO   | jvm 1    | main    | 2017/07/03 13:06:23 | 	at org.apache.pdfbox.multipdf.Overlay.processPages(Overlay.java:364)
        INFO   | jvm 1    | main    | 2017/07/03 13:06:23 | 	at org.apache.pdfbox.multipdf.Overlay.overlay(Overlay.java:128)
        
      • Solution
        Apply MemoryUsageSetting to Overlay, allows Overlay to use file as temp output.
      • Update for the Overlay usage:
         PDDocument document = PDDocument.load( pdf );
         HashMap<Integer, String> overlayGuide = new HashMap();
         for (int i = 0; i < pagenunber; i++)
         {
           overlayGuide.put(i+1, "watermarked.pdf");
         }
         Overlay overlay = new Overlay();
         overlay.setInputPDF( document );
         overlay.setOverlayPosition( Overlay.Position.FOREGROUND );
         // set overlay to use temp file as out rather than memory
         MemoryUsageSetting memoryUsageSetting = MemoryUsageSetting.setupTempFileOnly(  );
         memoryUsageSetting.setTempDir( new File ( "someTempWorkingDir" ) );
         overlay.setMemoryUsageSetting( memoryUsageSetting );
         PDDocument overlayResult = overlay.overlay( overlayGuide );
        

      Attachments

        1. 750-pages.pdf
          502 kB
          ryuukei
        2. Overlay.patch
          5 kB
          ryuukei
        3. McAfee.pdf
          132 kB
          Tilman Hausherr
        4. watermarked.pdf
          196 kB
          ryuukei

        Issue Links

          Activity

            People

              tilman Tilman Hausherr
              ryuukei ryuukei
              Votes:
              1 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: