PDFBox / PDFBOX-28

Splitting a PDF creates unnecessarily large chunks


Details

    • Bug
    • Status: Closed
    • Resolution: Fixed
    • None
    • 1.4.0
    • None
    • None

    Description

      [imported from SourceForge]
      http://sourceforge.net/tracker/index.php?group_id=78314&atid=552832&aid=1052458
      Originally submitted by bryang1 on 2004-10-22 13:23.

      Using PDFBox 0.6.7a, some PDFs contain objects that are
      inherited by each smaller document produced by the Splitter
      class, even though those documents do not need them (and even
      if the child documents are compressed).
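      One plausible mechanism, offered as an assumption rather than a
      diagnosis from this report: PDF pages can inherit attributes such
      as /Resources from their parent page-tree node, so if one shared
      resource dictionary lists every image in the file, copying a page
      together with everything it references pulls in all of those
      images. The sketch below models a document as a hypothetical map
      from object number to the object numbers it references and
      collects everything reachable from one page, which is roughly what
      a splitter has to copy. It is not PDFBox code.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy model of a PDF object graph: object number -> referenced object numbers. */
public class ReachableObjects {

    /** Returns every object reachable from start, using an iterative depth-first search. */
    static Set<Integer> reachable(Map<Integer, List<Integer>> refs, int start) {
        Set<Integer> seen = new TreeSet<>();
        Deque<Integer> stack = new ArrayDeque<>();
        stack.push(start);
        while (!stack.isEmpty()) {
            int obj = stack.pop();
            // Skip objects already visited; PDF graphs contain cycles (e.g. /Parent links).
            if (!seen.add(obj)) {
                continue;
            }
            for (int child : refs.getOrDefault(obj, List.of())) {
                stack.push(child);
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        Map<Integer, List<Integer>> refs = new HashMap<>();
        // Hypothetical layout: pages 1 and 2 share resource dictionary 10,
        // which lists images 20 and 21 even though each page draws only one.
        refs.put(1, List.of(10, 30));   // page 1 -> shared resources, content stream 30
        refs.put(2, List.of(10, 31));   // page 2 -> shared resources, content stream 31
        refs.put(10, List.of(20, 21));  // shared resources -> both images
        // Copying only page 1 still drags in both images.
        System.out.println(reachable(refs, 1)); // prints [1, 10, 20, 21, 30]
    }
}
```

      Acrobat's "Removing unused objects" step is the same idea applied
      from the document root: anything not reachable is dropped.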

      The linked PDF splits into chunks of approximately the same
      size as the original. Chunks from the first several pages are
      smaller because I recreated those pages for debugging; the rest
      of the document shows the problem. To reproduce, split after
      page 5, or at every page.
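      How the split parameter partitions pages can be illustrated with a
      small sketch, assuming setSplitAtPage(n) starts a new document
      every n pages, which is how the PDFBox Splitter javadoc describes
      it (the default of 1 gives one document per page). The chunks
      helper and the 12-page count are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitRanges {

    /**
     * Groups 1-based page numbers into consecutive chunks of splitAtPage pages;
     * the last chunk holds whatever pages are left over.
     */
    static List<int[]> chunks(int pageCount, int splitAtPage) {
        if (splitAtPage < 1) {
            throw new IllegalArgumentException("splitAtPage must be >= 1");
        }
        List<int[]> result = new ArrayList<>();
        for (int first = 1; first <= pageCount; first += splitAtPage) {
            int last = Math.min(first + splitAtPage - 1, pageCount);
            result.add(new int[] {first, last});
        }
        return result;
    }

    public static void main(String[] args) {
        // 12 pages split every 5 pages: pages 1-5, 6-10, 11-12
        for (int[] r : chunks(12, 5)) {
            System.out.println("pages " + r[0] + "-" + r[1]);
        }
    }
}
```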

      PDF (13MB):
      http://esis.infofoundry.com:8080/audi/pdf/audi.ns.ssp.951903.pdf

      Opening the file in Acrobat and using 'Save As' removes the
      unnecessary objects, but I have found no way to do this
      programmatically with PDFBox.

      Here are the messages from Acrobat when using 'Save As':

      "Consolidating duplicate images"
      "Consolidating duplicate page backgrounds"
      "Removing unused objects and saving"
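      Acrobat's "Consolidating duplicate images" step suggests merging
      image streams whose bytes are identical, so that every page points
      at a single copy. The sketch below shows that idea on a
      hypothetical map from object number to stream bytes; it is not
      Acrobat's or PDFBox's implementation, and a real tool would also
      have to rewrite the references to the merged objects.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class DedupStreams {

    /**
     * Maps every object number to the lowest-numbered object with identical bytes.
     * Objects mapped to a different number are duplicates that could be dropped
     * once references to them are redirected.
     */
    static Map<Integer, Integer> canonicalize(Map<Integer, byte[]> streams) {
        MessageDigest sha;
        try {
            sha = MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // every Java platform must support SHA-256
        }
        Map<String, Integer> firstByDigest = new HashMap<>();
        Map<Integer, Integer> canonical = new TreeMap<>();
        // Visit objects in ascending number so the lowest number wins.
        for (Map.Entry<Integer, byte[]> e : new TreeMap<>(streams).entrySet()) {
            String digest = Base64.getEncoder().encodeToString(sha.digest(e.getValue()));
            Integer first = firstByDigest.putIfAbsent(digest, e.getKey());
            // Confirm byte equality so a hash collision can never merge different images.
            boolean duplicate = first != null && Arrays.equals(streams.get(first), e.getValue());
            canonical.put(e.getKey(), duplicate ? first : e.getKey());
        }
        return canonical;
    }

    public static void main(String[] args) {
        Map<Integer, byte[]> streams = new HashMap<>();
        streams.put(20, "logo-bytes".getBytes(StandardCharsets.US_ASCII));
        streams.put(21, "photo-bytes".getBytes(StandardCharsets.US_ASCII));
        streams.put(22, "logo-bytes".getBytes(StandardCharsets.US_ASCII)); // duplicate of 20
        System.out.println(canonicalize(streams)); // prints {20=20, 21=21, 22=20}
    }
}
```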

      Here is some sample code:

      // splitting:
      splitter.setSplitAtPage( split );
      documents = splitter.split( document );
      for( int i=0; i<documents.size(); i++ )
      {
          PDDocument doc = (PDDocument)documents.get( i );
          // strip the ".pdf" extension and append "-<i>.pdf"
          String fileName = pdfFile.substring( 0, pdfFile.length() - 4 )
                  + "-" + i + ".pdf";
          writeCompressedDocument( doc, fileName );
      }

      // saving w/ compression (excerpt from writeCompressedDocument):
      fileOut = new FileOutputStream( fileName );
      // copy the document's scratch file into a new COSStream
      COSStream stream = new COSStream(
          doc.getDocument().getScratchFile() );
      OutputStream output = stream.createUnfilteredStream();
      int length = (int) doc.getDocument().getScratchFile().length();
      byte[] bytes = new byte[length];
      doc.getDocument().getScratchFile().readFully( bytes, 0, length );
      output.write( bytes );
      // request Flate (FlateDecode) compression for the stream
      stream.setFilters( COSName.FLATE_DECODE );
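      The saving code asks for Flate compression via
      stream.setFilters( COSName.FLATE_DECODE ). FlateDecode data is
      zlib/deflate, the same format the JDK's java.util.zip.Deflater
      writes by default, so a standalone round-trip (plain JDK, not the
      PDFBox 0.6.7a API) can be sketched as follows.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class FlateSketch {

    /** Compresses data in zlib format, the encoding used by FlateDecode streams. */
    static byte[] deflate(byte[] data) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(data);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    /** Reverses deflate(); rejects truncated or malformed input. */
    static byte[] inflate(byte[] data) {
        Inflater inflater = new Inflater();
        inflater.setInput(data);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                // No progress and no end marker yet: the input ran out early.
                if (n == 0 && !inflater.finished()
                        && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("incomplete zlib data");
                }
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("not valid zlib data", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] content = "BT /F1 12 Tf (Hello) Tj ET\n".repeat(200)
                .getBytes(StandardCharsets.US_ASCII);
        byte[] packed = deflate(content);
        System.out.println(content.length + " -> " + packed.length + " bytes");
        System.out.println(Arrays.equals(content, inflate(packed))); // prints true
    }
}
```

      Compression shrinks the bytes of each stream, but it cannot drop
      objects a chunk never uses, which fits the observation above that
      compressed chunks remain large.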


    People

      Assignee: Andreas Lehmkühler (lehmi)
      Reporter: Anonymous

      Votes: 1
      Watchers: 0
