Details
- Type: Bug
- Status: Closed
- Resolution: Fixed
Description
[imported from SourceForge]
http://sourceforge.net/tracker/index.php?group_id=78314&atid=552832&aid=1052458
Originally submitted by bryang1 on 2004-10-22 13:23.
With PDFBox 0.6.7a, some PDFs contain objects that are
carried over into every child document when the PDF is
split with the Splitter class (even if the child
documents are compressed).
The linked PDF splits into chunks that are each
approximately the same size as the original. The first
several pages will be smaller because I recreated them
for debugging; the rest of the document shows the
problem. To reproduce, try splitting after page 5, or
at every page.
PDF (13MB):
http://esis.infofoundry.com:8080/audi/pdf/audi.ns.ssp.951903.pdf
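One rough way to quantify the symptom is to compare the combined size of the split files with the size of the original. The following is only a sketch in plain Java: it does not call PDFBox, and the class name SplitSizeCheck and the method sizeRatio are hypothetical helpers, not part of any library. It assumes the original file is not empty.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper, not part of PDFBox.
final class SplitSizeCheck {

    /**
     * Returns (sum of chunk sizes) / (original size).
     * A healthy split should stay close to 1.0; a value near the
     * number of chunks suggests each chunk carries most of the original.
     */
    static double sizeRatio(Path original, List<Path> chunks) throws IOException {
        long total = 0;
        for (Path chunk : chunks) {
            total += Files.size(chunk);
        }
        return (double) total / Files.size(original);
    }
}
```

With the behaviour described here, splitting the document into N chunks would be expected to give a ratio approaching N rather than 1.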
Opening and using the 'Save As' feature in Acrobat
removes the unnecessary objects, but I can find no way
to do this programmatically using PDFBox.
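To make the request concrete: conceptually, removing unused objects amounts to a reachability pass over the object graph, starting from the pages that are kept. The sketch below is plain Java and does not use any PDFBox API. The class name UnusedObjectFinder, the integer object ids, and the reference map are hypothetical stand-ins for a real PDF object graph, and nothing here is claimed to be how Acrobat or PDFBox does it.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration, not part of PDFBox.
final class UnusedObjectFinder {

    /** Collects every object id reachable from the given roots. */
    static Set<Integer> reachable(Map<Integer, List<Integer>> refs, Set<Integer> roots) {
        Set<Integer> seen = new HashSet<>();
        Deque<Integer> stack = new ArrayDeque<>(roots);
        while (!stack.isEmpty()) {
            Integer id = stack.pop();
            if (!seen.add(id)) {
                continue; // already visited, also protects against cycles
            }
            for (Integer target : refs.getOrDefault(id, List.of())) {
                if (!seen.contains(target)) {
                    stack.push(target);
                }
            }
        }
        return seen;
    }

    /** Returns the ids that appear in the graph but are not reachable from the roots. */
    static Set<Integer> unused(Map<Integer, List<Integer>> refs, Set<Integer> roots) {
        Set<Integer> all = new TreeSet<>(refs.keySet());
        refs.values().forEach(all::addAll);
        all.removeAll(reachable(refs, roots));
        return all;
    }
}
```

In this picture the roots would be the pages kept in one child document, and anything returned by unused would be a candidate for dropping before the child is written.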
Here are the messages from Acrobat when using 'Save As':
"Consolidating duplicate images"
"Consolidating duplicate page backgrounds"
"Removing unused objects and saving"
Here is some sample code:
// splitting:
splitter.setSplitAtPage( split );
documents = splitter.split( document );
for( int i=0; i<documents.size(); i++ )
{
    PDDocument doc = (PDDocument)documents.get( i );
    // drop the 4-character ".pdf" extension, then append the chunk index
    String fileName = pdfFile.substring( 0, pdfFile.length() - 4 ) + "" + i + ".pdf";
    writeCompressedDocument( doc, fileName );
}
// saving w/ compression:
fileOut = new FileOutputStream( fileName );
COSStream stream = new COSStream( doc.getDocument().getScratchFile() );
OutputStream output = stream.createUnfilteredStream();
int length = new Long( doc.getDocument().getScratchFile().length() ).intValue();
byte[] bytes = new byte[length];
doc.getDocument().getScratchFile().readFully( bytes, 0, length );
output.write( bytes );
stream.setFilters( COSName.FLATE_DECODE );
Issue Links
- is related to: PDFBOX-2742 PDFSplit ignores global resources (Closed)
- relates to: PDFBOX-785 Spliting a PDF creates unnecessarily large files (Closed)