[PDFBOX-28] Splitting a PDF creates unnecessarily large chunks - ASF JIRA

Details

Type: Bug
Status: Closed
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 1.4.0
Component/s: None
Labels: None

Description

[imported from SourceForge]
http://sourceforge.net/tracker/index.php?group_id=78314&atid=552832&aid=1052458
Originally submitted by bryang1 on 2004-10-22 13:23.

Using PDFBox 0.6.7a, some PDFs contain objects that are
inherited by every child document when the PDF is split
into smaller documents using the Splitter class (even if
the child documents are compressed).

The linked PDF splits into chunks approximately the
same size as the original. The first several pages
are smaller because I recreated them for debugging,
but the rest of the document shows the problem.
To reproduce, try splitting after page 5, or at every page.

PDF (13MB):
http://esis.infofoundry.com:8080/audi/pdf/audi.ns.ssp.951903.pdf

Opening and using the 'Save As' feature in Acrobat
removes the unnecessary objects, but I can find no way
to do this programmatically using PDFBox.
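
As a rough illustration of what "removing unnecessary objects" means, the sketch below treats a document as a toy graph of object ids and keeps only the objects reachable from a chunk's pages. The class `UnusedObjectSketch`, the method `reachable`, and the integer ids are all hypothetical; this is not the PDFBox API and not a description of how Acrobat works. In a real PDF, an inherited resource dictionary can keep objects reachable even when no page in the chunk draws them, so a real fix would probably also need to trim resources per page.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical toy model: object id -> ids of the objects it references.
final class UnusedObjectSketch {

    /** Returns the ids reachable from the given roots by following references. */
    static Set<Integer> reachable(Map<Integer, List<Integer>> refs, List<Integer> roots) {
        Set<Integer> seen = new TreeSet<>();
        Deque<Integer> todo = new ArrayDeque<>(roots);
        while (!todo.isEmpty()) {
            int id = todo.pop();
            if (seen.add(id)) {
                // First visit: queue everything this object points at.
                todo.addAll(refs.getOrDefault(id, List.<Integer>of()));
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        // Pages 1 and 2 each reference one image (10 and 11).
        Map<Integer, List<Integer>> refs = Map.of(
                1, List.of(10),
                2, List.of(11),
                10, List.<Integer>of(),
                11, List.<Integer>of());
        // A chunk holding only page 1 needs objects 1 and 10; 2 and 11 are unused.
        System.out.println(reachable(refs, List.of(1))); // prints [1, 10]
    }
}
```

Anything outside the returned set would not need to be written to that chunk.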

Here are the messages from Acrobat when using 'Save As':

"Consolidating duplicate images"
"Consolidating duplicate page backgrounds"
"Removing unused objects and saving"

Here is some sample code:

// splitting:
splitter.setSplitAtPage( split );
documents = splitter.split( document );
for( int i=0; i<documents.size(); i++ )
{
    PDDocument doc = (PDDocument)documents.get( i );
    String fileName = pdfFile.substring(0, pdfFile.length()-4 ) + "-" + i + ".pdf";
    writeCompressedDocument( doc, fileName );
}
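
The per-chunk file name built in the loop above can be isolated into a small helper, sketched here with plain Java strings only. The names `ChunkNames` and `chunkFileName` are hypothetical and not part of PDFBox; the extension check is an added safeguard that the original snippet does not have.

```java
import java.util.Locale;

// Hypothetical helper; mirrors the naming scheme used in the splitting loop.
final class ChunkNames {

    /** Drops a trailing ".pdf" (if present) and appends "-<index>.pdf". */
    static String chunkFileName(String pdfFile, int index) {
        boolean hasExtension = pdfFile.toLowerCase(Locale.ROOT).endsWith(".pdf");
        String base = hasExtension
                ? pdfFile.substring(0, pdfFile.length() - 4)
                : pdfFile;
        return base + "-" + index + ".pdf";
    }

    public static void main(String[] args) {
        System.out.println(chunkFileName("audi.pdf", 0)); // prints audi-0.pdf
    }
}
```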

// saving w/ compression:
fileOut = new FileOutputStream( fileName );
COSStream stream = new COSStream( doc.getDocument().getScratchFile() );
OutputStream output = stream.createUnfilteredStream();
int length = new Long(doc.getDocument().getScratchFile().length()).intValue();
byte[] bytes = new byte[length];
doc.getDocument().getScratchFile().readFully(bytes, 0, length);
output.write(bytes);
stream.setFilters( COSName.FLATE_DECODE );
Issue Links

is related to: PDFBOX-2742 PDFSplit ignores global resources (Closed)

relates to: PDFBOX-785 Spliting a PDF creates unnecessarily large files (Closed)

People

Assignee: Andreas Lehmkühler

Reporter: Anonymous

Votes: 1

Watchers: 0

Dates

Created: 22/Oct/04 20:23

Updated: 02/Apr/15 10:51

Resolved: 24/Nov/10 20:22