Bug 51265 - Enhance Handling of Picture Parts
Summary: Enhance Handling of Picture Parts
Status: RESOLVED FIXED
Alias: None
Product: POI
Classification: Unclassified
Component: XWPF (show other bugs)
Version: unspecified
Hardware: PC All
: P2 enhancement (vote)
Target Milestone: ---
Assignee: POI Developers List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-05-25 12:53 UTC by Stefan Stern
Modified: 2011-06-06 08:46 UTC (History)
0 users



Attachments
Patch to all POI-classes affected (191.49 KB, patch)
2011-05-25 13:05 UTC, Stefan Stern
Details | Diff
Patch to all POI-Testcases, affected by the changes. (51.78 KB, patch)
2011-05-25 13:06 UTC, Stefan Stern
Details | Diff
Add Image-Handling Unit-Tests (9.76 KB, patch)
2011-06-06 06:53 UTC, Stefan Stern
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Stefan Stern 2011-05-25 12:53:33 UTC
In OOXML, multiple references of the same picture refer to the same picture part inside the package. This applies, even if multiple parts refer to the same image, for example a header, footer and the main document. 

Up to now, there is no functionality in POI to support this behavior for any new image. Instead, adding the same picture n times, will create n new parts with identical content. This potentially increases the package size daramatically, given a scenario where a larger picture is referrenced several times.

In order to support the image part reuse, all available images must be managed in a central spot. All OPCPackages have one main part, which in Word is the XWPFDocument. This results in XWPFDocument being a subclass of POIXMLDocument, whereas all other XWPFClasses, which represent a package, subclass POIXMLDocumentPart only. 

The XWPFDocument herewith not only represents the main document, but also represents the package and all package-wide settings & data. Therefore, managing all package-known pictures should take place in the XWPFDocument. Besides the management of all package-pictureparts, the document also manages all those picture-parts, which are related to the main document part. This is the same task, that applies to headers and footers as well. They must take care for all picture-parts related by their corresponding package-part.

Enabling reuse of picture parts complicates the removal of package parts. As long as every part is referenced once, the removal of the reference goes along with removing the part. Removing an image part, in case a reference to this part was removed, may corrupt the document, as other references may still require that part. A solution to this issue is a simple reference counter, which is increased whenever a new relationship is created, and decreased whenever a relationship is removed. If the last relationship counter reaches zero, when a relation is removed, the target part can be deleted. 

While changing POIXMLDocumentPart, the relation-handling was reworked. Instead of keeping a simple list of all related parts, a Map was used, which uses the relation-ID as key and the target part as value. This eases the handling of all parts, but required some changes to the API (changes to XSSFDrawing have been applied). Additionally, the original list was kept in order to represent the relation-sequence, as these appear in the _rels XML document and several Unit-tests rely on this sequence. I do not recommend to rely on it, as a relation is defined by id, type and target only, whereas the position in the according _rels is not part of the information a PackageRelationship is defined of, in terms of the OPC.

Among the changes done to POIXMLDocumentPart concerning relation-handling, the behavior of the method "createRelationship" was slightly altered. So far, it always added the new part to the "relations" list of the affected part. But this is only valid, if a new PackageRelationship was created between the new part and the current part. If this step was skipped, adding the new part to the relations is wrong. And closing the package and reload it will not load the new part into the relations list again, if no PackageRelationship exists between these parts.

Finally, in order to write test all new methods, I moved TestXWPFDocument into the org.apache.poi.xwpf.usermodel package. This allows to test methods with default-visibility, which seemed quite reasonable for a testcase.
Comment 1 Stefan Stern 2011-05-25 13:05:13 UTC
Created attachment 27062 [details]
Patch to all POI-classes affected

This patch modifies all POI-OOXML-classes affected by the changes explained in the description. Some formatting issues have been fixed, as discussed in the developers mailing list. Not all classes were re-formatted.
Comment 2 Stefan Stern 2011-05-25 13:06:18 UTC
Created attachment 27063 [details]
Patch to all POI-Testcases, affected by the changes.

This patches all testcases (and adds a new one), that needed an update due to changes to the POI-OPC and OOXML classes.
Comment 3 Stefan Stern 2011-05-25 13:14:33 UTC
I created some more test-files and zipped them up, but the ZIP is too big for an attachement. Please find the test-files here:
http://www.mind8.com/apache/bug51265.7z
Comment 4 Yegor Kozlov 2011-05-29 09:05:21 UTC
Thanks for the patch. 

I would like to wait until 3.8 beta3 is released and then apply it to trunk (after June 6). Your patch touches some rather fundamental things ( POIXMLDocument and its friends) and I would rather not include it in release without thorough testing.  

Yegor
Comment 5 Yegor Kozlov 2011-06-01 12:11:37 UTC
Applied in r1130120.

This is a really good change. I didn't research hadling of pictures in XSSF and XSLF, but suspect that they should use the same logic as XWPFDocument and reuse image parts. If so, we should consider moving this logic to POIXMLDocument. 

> While changing POIXMLDocumentPart, the relation-handling was reworked. Instead
> of keeping a simple list of all related parts, a Map was used, which uses the
> relation-ID as key and the target part as value. This eases the handling of all
> parts, but required some changes to the API (changes to XSSFDrawing have been
> applied). Additionally, the original list was kept in order to represent the
> relation-sequence, as these appear in the _rels XML document and several
> Unit-tests rely on this sequence. 

I changed this part a little bit. There is no need to have both a Map and a List to maintain the collection of POIXMLDocumentParts. I turned relations into a ordered map :
    private Map<String,POIXMLDocumentPart> relations = new LinkedHashMap<String,POIXMLDocumentPart>(); 

The relationsOrder list is not needed and I removed it. 

> I do not recommend to rely on it, as a
> relation is defined by id, type and target only, whereas the position in the
> according _rels is not part of the information a PackageRelationship is defined
> of, in terms of the OPC.
> 

We should not rely on the order of relations when constructing OPC packages, but we *should* preserve the order of relations when reading / saving a a package. This is why we need to store document parts in a ordered collection.

P.S.  please use a standard archive tool (.zip or .gz) instead of 7z.

Regards,
Yegor
Comment 6 Stefan Stern 2011-06-06 06:53:57 UTC
Created attachment 27116 [details]
Add Image-Handling Unit-Tests

It seems that by moving the TestXWPFDocument class into another package, the new content didn't make it into the moved file. Please see the attached patch for the new tests.
The tests take care that each image is added only once, and when adding it a second time, the already related part is used / an already available image part with the same content is related.
Comment 7 Stefan Stern 2011-06-06 06:54:33 UTC
Sorry for the 7Z. We use it for a while here, so I didn't hesitate. Next time, I will create a standard ZIP.
Comment 8 Yegor Kozlov 2011-06-06 07:43:18 UTC
> 
> It seems that by moving the TestXWPFDocument class into another package, the
> new content didn't make it into the moved file. Please see the attached patch
> for the new tests.
> The tests take care that each image is added only once, and when adding it a
> second time, the already related part is used / an already available image part
> with the same content is related.

Are you keeping you trunk up-to-date? The new tests are already in the code. I see them both locally and online at 
https://svn.apache.org/repos/asf/poi/trunk/src/ooxml/testcases/org/apache/poi/xwpf/usermodel/TestXWPFDocument.java

Yegor
Comment 9 Stefan Stern 2011-06-06 08:46:50 UTC
Sorry, my fault. Somehow my local Subversion sandbox stumbled. Merging showed
me the old TestXWPFDocument file as incoming. I moved my local out of the
sandbox and got an incoming file, which had the new content. All fine now.