PDFBox / PDFBOX-28

Splitting a PDF creates unnecessarily large chunks

    Details

    • Type: Bug
    • Status: Closed
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.4.0
    • Component/s: None
    • Labels:
      None

      Description

      [imported from SourceForge]
      http://sourceforge.net/tracker/index.php?group_id=78314&atid=552832&aid=1052458
      Originally submitted by bryang1 on 2004-10-22 13:23.

      With PDFBox 0.6.7a, some PDFs contain objects that
      every child document inherits when the PDF is split
      with the Splitter class, even if the child documents
      are compressed.

      The PDF linked below splits into chunks that are each
      approximately the same size as the original. The first
      several pages produce smaller chunks because I recreated
      them for debugging; the rest of the document shows the
      problem. To reproduce, try splitting after page 5 or at
      every page.

      PDF (13MB):
      http://esis.infofoundry.com:8080/audi/pdf/audi.ns.ssp.951903.pdf

      Opening and using the 'Save As' feature in Acrobat
      removes the unnecessary objects, but I can find no way
      to do this programmatically using PDFBox.
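Dropping the objects a chunk no longer needs can be pictured as a reachability walk: start from the objects the chunk's pages reference, follow every reference, and discard whatever is never visited. The sketch below is plain Java over a toy graph of integer object ids. The class name `UnusedObjectSweep`, the map-based graph, and the ids are hypothetical illustrations and are not PDFBox API.

```java
import java.util.ArrayDeque;
import java.util.Collection;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy model only: integer ids stand in for PDF indirect objects (hypothetical, not PDFBox API). */
final class UnusedObjectSweep {

    /** Returns every object id reachable from the given roots by following references. */
    static Set<Integer> reachable(Map<Integer, List<Integer>> references, Collection<Integer> roots) {
        Set<Integer> visited = new TreeSet<>();
        Deque<Integer> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Integer id = pending.pop();
            if (!visited.add(id)) {
                continue; // already visited; this also guards against reference cycles
            }
            for (Integer target : references.getOrDefault(id, List.of())) {
                pending.push(target);
            }
        }
        return visited;
    }

    /** Returns the ids a chunk could drop: everything in the graph not reachable from its roots. */
    static Set<Integer> unused(Map<Integer, List<Integer>> references, Collection<Integer> roots) {
        Set<Integer> all = new TreeSet<>(references.keySet());
        references.values().forEach(all::addAll);
        all.removeAll(reachable(references, roots));
        return all;
    }
}
```

Calling `UnusedObjectSweep.unused(graph, rootsOfChunk)` on such a toy graph lists the ids that only other pages use, which is the kind of content that appears to inflate each chunk here.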

      Here are the messages from Acrobat when using 'Save As':

      "Consolidating duplicate images"
      "Consolidating duplicate page backgrounds"
      "Removing unused objects and saving"
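The "Consolidating duplicate images" message suggests that identical streams are merged into a single copy. A minimal sketch of that idea follows, assuming duplicates are byte-identical and can be detected with a SHA-256 digest. The class name `DuplicateStreamIndex` and the integer stream ids are hypothetical, and nothing here is known to match how Acrobat or PDFBox does it.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy model only: byte arrays stand in for image streams (hypothetical, not PDFBox API). */
final class DuplicateStreamIndex {

    /**
     * Maps each stream id to the id of the first stream, in the input map's
     * iteration order, that has identical bytes.
     */
    static Map<Integer, Integer> canonicalIds(Map<Integer, byte[]> streams) {
        Map<String, Integer> firstSeen = new HashMap<>();
        Map<Integer, Integer> canonical = new LinkedHashMap<>();
        for (Map.Entry<Integer, byte[]> entry : streams.entrySet()) {
            String key = digest(entry.getValue());
            // putIfAbsent returns null when this content has not been seen before
            Integer keeper = firstSeen.putIfAbsent(key, entry.getKey());
            canonical.put(entry.getKey(), keeper == null ? entry.getKey() : keeper);
        }
        return canonical;
    }

    /** Hashes the content and encodes the digest as a string usable as a map key. */
    private static String digest(byte[] content) {
        try {
            byte[] hash = MessageDigest.getInstance("SHA-256").digest(content);
            return Base64.getEncoder().encodeToString(hash);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 should be available on every Java platform", e);
        }
    }
}
```

Passing a `LinkedHashMap` or `TreeMap` keeps the choice of surviving copy deterministic; every id that maps to a different id could then be rewritten to point at the survivor.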

      Here is some sample code:

      // splitting:
      splitter.setSplitAtPage( split );
      documents = splitter.split( document );
      for( int i=0; i<documents.size(); i++ )
      {
          PDDocument doc = (PDDocument)documents.get( i );
          // drop the last 4 characters (the ".pdf" extension), then append the chunk index
          String fileName = pdfFile.substring( 0, pdfFile.length() - 4 ) + "" + i + ".pdf";
          writeCompressedDocument( doc, fileName );
      }

      // saving w/ compression:
      fileOut = new FileOutputStream( fileName );
      COSStream stream = new COSStream( doc.getDocument().getScratchFile() );
      OutputStream output = stream.createUnfilteredStream();
      int length = (int)doc.getDocument().getScratchFile().length();
      byte[] bytes = new byte[length];
      doc.getDocument().getScratchFile().readFully( bytes, 0, length );
      output.write( bytes );
      stream.setFilters( COSName.FLATE_DECODE );

              People

              • Assignee: Andreas Lehmkühler (lehmi)
              • Reporter: Anonymous