PDFBox / PDFBOX-5485

StackOverflowError when writing out a subset of PDF pages (COSWriterObjectStream)


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0.0 PDFBox
    • Fix Version/s: 3.0.0 PDFBox
    • Component/s: Writing
    • Labels: None
    • Environment: MacOS, but likely not OS specific.

    Description

      Version:  org.apache.pdfbox:pdfbox:3.0.0-alpha3

       

      For a subset of the PDFs I process, I cannot extract a range of pages and write them out to a new PDF (this is part of test code).

      Here is the Kotlin code I use:

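      import org.apache.pdfbox.Loader
      import org.apache.pdfbox.multipdf.PageExtractor
      import java.nio.file.Path
      import java.nio.file.Paths

      // Extracts pages fromPage..toPage (1-based, inclusive) into a new PDF next to the source file.
      fun extractPages(documentName: String, fromPage: Int, toPage: Int): Path {
         val pdfFile = Paths.get("data", "input", "PDFS", "${documentName}.pdf")
         val pdfPagesFile = Paths.get("data", "input", "PDFS", "${documentName}_Page_$fromPage-$toPage.pdf")
         Loader.loadPDF(pdfFile.toFile()).use { pdfDoc ->
            // use {} closes both documents even if save() throws.
            PageExtractor(pdfDoc, fromPage, toPage).extract().use { pdfPages ->
               pdfPages.save(pdfPagesFile.toFile())
            }
         }
         return pdfPagesFile
      }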

      It does not occur for all PDFs, only for maybe 10-20% of the ones I use.

       

      A slice of the stack trace:

      java.lang.StackOverflowError
          at java.base/java.util.HashMap.tableSizeFor(HashMap.java:380)
          at java.base/java.util.HashMap.<init>(HashMap.java:453)
          at java.base/java.util.LinkedHashMap.<init>(LinkedHashMap.java:347)
          at java.base/java.util.HashSet.<init>(HashSet.java:162)
          at java.base/java.util.LinkedHashSet.<init>(LinkedHashSet.java:154)
          at org.apache.pdfbox.util.SmallMap.entrySet(SmallMap.java:380)
          at org.apache.pdfbox.cos.COSDictionary.entrySet(COSDictionary.java:1225)
          at org.apache.pdfbox.pdfwriter.compress.COSWriterObjectStream.writeCOSDictionary(COSWriterObjectStream.java:336)
          at org.apache.pdfbox.pdfwriter.compress.COSWriterObjectStream.writeObject(COSWriterObjectStream.java:230)
          at org.apache.pdfbox.pdfwriter.compress.COSWriterObjectStream.writeCOSArray(COSWriterObjectStream.java:319)
          at org.apache.pdfbox.pdfwriter.compress.COSWriterObjectStream.writeObject(COSWriterObjectStream.java:226)
          at org.apache.pdfbox.pdfwriter.compress.COSWriterObjectStream.writeCOSDictionary(COSWriterObjectStream.java:341)
          at org.apache.pdfbox.pdfwriter.compress.COSWriterObjectStream.writeObject(COSWriterObjectStream.java:230)
          at org.apache.pdfbox.pdfwriter.compress.COSWriterObjectStream.writeCOSDictionary(COSWriterObjectStream.java:341)
          at org.apache.pdfbox.pdfwriter.compress.COSWriterObjectStream.writeObject(COSWriterObjectStream.java:230)
          at org.apache.pdfbox.pdfwriter.compress.COSWriterObjectStream.writeCOSArray(COSWriterObjectStream.java:319)
          at org.apache.pdfbox.pdfwriter.compress.COSWriterObjectStream.writeObject(COSWriterObjectStream.java:226)
          at org.apache.pdfbox.pdfwriter.compress.COSWriterObjectStream.writeCOSDictionary(COSWriterObjectStream.java:341)
          at org.apache.pdfbox.pdfwriter.compress.COSWriterObjectStream.writeObject(COSWriterObjectStream.java:230)
          at org.apache.pdfbox.pdfwriter.compress.COSWriterObjectStream.writeCOSDictionary(COSWriterObjectStream.java:341)
          at org.apache.pdfbox.pdfwriter.compress.COSWriterObjectStream.writeObject(COSWriterObjectStream.java:230)
          at org.apache.pdfbox.pdfwriter.compress.COSWriterObjectStream.writeCOSArray(COSWriterObjectStream.java:319)
          at org.apache.pdfbox.pdfwriter.compress.COSWriterObjectStream.writeObject(COSWriterObjectStream.java:226)
          at org.apache.pdfbox.pdfwriter.compress.COSWriterObjectStream.writeCOSDictionary(COSWriterObjectStream.java:341)
          at org.apache.pdfbox.pdfwriter.compress.COSWriterObjectStream.writeObject(COSWriterObjectStream.java:230)
          at org.apache.pdfbox.pdfwriter.compress.COSWriterObjectStream.writeCOSDictionary(COSWriterObjectStream.java:341) 

      As mentioned, this affects some PDFs, not all.

      I legally cannot share the original source PDFs, but the stack trace looks like a recursive loop between writeObject, writeCOSDictionary and writeCOSArray in COSWriterObjectStream.
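
To illustrate the suspected failure mode without the original files, here is a minimal, self-contained sketch. It assumes the affected PDFs contain object graphs with back-references (for example a child dictionary pointing back to its parent). The types `Node`, `Name`, `Dict` and `Arr` and the functions `write` and `cyclicGraph` are hypothetical stand-ins, not PDFBox classes, and the guard shown is only one possible way to break such a loop, not necessarily the fix applied in PDFBox.

```kotlin
import java.util.Collections
import java.util.IdentityHashMap

// Hypothetical stand-ins for COS objects; these are NOT PDFBox classes.
sealed interface Node
class Name(val value: String) : Node
class Dict(val entries: MutableMap<String, Node> = LinkedHashMap()) : Node
class Arr(val items: MutableList<Node> = ArrayList()) : Node

// Recursive writer. With guard = false it descends into every child,
// so a cyclic graph recurses until the stack overflows.
fun write(
    node: Node,
    out: StringBuilder,
    guard: Boolean,
    onPath: MutableSet<Node> = Collections.newSetFromMap(IdentityHashMap<Node, Boolean>())
) {
    if (node is Name) {
        out.append('/').append(node.value)
        return
    }
    // With the guard on, a container already on the current path
    // is not descended into again; a placeholder is written instead.
    if (guard && !onPath.add(node)) {
        out.append("<cycle>")
        return
    }
    when (node) {
        is Dict -> {
            out.append("<<")
            for ((key, value) in node.entries) {
                out.append('/').append(key).append(' ')
                write(value, out, guard, onPath)
            }
            out.append(">>")
        }
        is Arr -> {
            out.append('[')
            for (item in node.items) write(item, out, guard, onPath)
            out.append(']')
        }
        is Name -> Unit // already handled above
    }
    if (guard) onPath.remove(node)
}

// Builds parent -> /Kids array -> child -> /Parent -> parent, i.e. a cycle.
fun cyclicGraph(): Dict {
    val parent = Dict()
    val child = Dict()
    child.entries["Type"] = Name("Annot")
    child.entries["Parent"] = parent
    parent.entries["Kids"] = Arr(mutableListOf<Node>(child))
    return parent
}

fun main() {
    val guarded = StringBuilder()
    write(cyclicGraph(), guarded, guard = true)
    println(guarded)

    // The unguarded call alternates dictionary and array frames,
    // much like the trace above, until the stack is exhausted.
    val overflowed = try {
        write(cyclicGraph(), StringBuilder(), guard = false)
        false
    } catch (e: StackOverflowError) {
        true
    }
    println("unguarded write overflowed: $overflowed")
}
```

A real PDF writer would normally emit an indirect object reference at the point where this sketch writes the `<cycle>` placeholder; the sketch only shows why a writer that inlines every child cannot terminate on such a graph.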

          People

            Assignee: Andreas Lehmkühler (lehmi)
            Reporter: Owen McGovern (omcgovern)
            Votes: 0
            Watchers: 4
