Bug 56468 - [PATCH] Writing a workbook more than once corrupts the file
Summary: [PATCH] Writing a workbook more than once corrupts the file
Status: RESOLVED FIXED
Alias: None
Product: POI
Classification: Unclassified
Component: XSSF (show other bugs)
Version: unspecified
Hardware: PC All
: P2 major (vote)
Target Milestone: ---
Assignee: POI Developers List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-04-28 19:33 UTC by scott.paffrath
Modified: 2014-07-16 21:15 UTC (History)
2 users (show)



Attachments
The corrupt file. (2.87 KB, application/vnd.openxmlformats-officedocument.spreadsheetml.sheet)
2014-04-28 19:33 UTC, scott.paffrath
Details
This patch clears the data of the PackagePart for docProps/app.xml before saving it. (12.79 KB, patch)
2014-04-30 22:47 UTC, ralph_g
Details | Diff
This patch clears the data of the PackagePart for docProps/app.xml before saving it (484 bytes, patch)
2014-04-30 23:02 UTC, ralph_g
Details | Diff
This patch clears the data of the PackagePart for docProps/app.xml before saving it (857 bytes, patch)
2014-05-01 07:31 UTC, ralph_g
Details | Diff
31577: This patch clears the data of the PackagePart for docProps/app.xml before saving it (532 bytes, patch)
2014-05-01 07:34 UTC, ralph_g
Details | Diff
Patch for core.xml and app.xml (2.73 KB, patch)
2014-05-01 16:15 UTC, ralph_g
Details | Diff
Patch for XmlValueDisconnectedException (1.02 KB, patch)
2014-05-05 22:44 UTC, scott.paffrath
Details | Diff
Patch for XmlValueDisconnectedException (4.34 KB, patch)
2014-05-06 23:11 UTC, scott.paffrath
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description scott.paffrath 2014-04-28 19:33:33 UTC
Created attachment 31567 [details]
The corrupt file.

The second time a file is written the "docProps/core.xml" file is not included in the output and the result is a corrupt Excel file that cannot be opened.

This bug may be related :
https://issues.apache.org/bugzilla/show_bug.cgi?id=55033

I debugged this in 3.9 and confirmed it is still present in 3.10. Windows 7. Being new to the code base I could only track the bug as far as the following code in "saveImpl" of "ZipPackage.java". It executes on the first run but not the second.

...
			// If the core properties part does not exist in the part list,
			// we save it as well
			if (this.getPartsByRelationshipType(PackageRelationshipTypes.CORE_PROPERTIES).size() == 0 &&
                this.getPartsByRelationshipType(PackageRelationshipTypes.CORE_PROPERTIES_ECMA376).size() == 0    ) {
				logger.log(POILogger.DEBUG,"Save core properties part");

				// We have to save the core properties part ...
				new ZipPackagePropertiesMarshaller().marshall(
                        this.packageProperties, zos);
				// ... and to add its relationship ...
				this.relationships.addRelationship(this.packageProperties
						.getPartName().getURI(), TargetMode.INTERNAL,
						PackageRelationshipTypes.CORE_PROPERTIES, null);
				// ... and the content if it has not been added yet.
				if (!this.contentTypeManager
						.isContentTypeRegister(ContentTypes.CORE_PROPERTIES_PART)) {
					this.contentTypeManager.addContentType(
							this.packageProperties.getPartName(),
							ContentTypes.CORE_PROPERTIES_PART);
				}
			}
...

Test code that replicates the error ("test2.xlsx" is corrupt) :

    public static void main(String[] args) throws FileNotFoundException, IOException
    {
        XSSFWorkbook wb = new XSSFWorkbook();
        XSSFSheet sheet = wb.createSheet();
        XSSFRow row = sheet.createRow(0);
        XSSFCell cell = row.createCell(0);
        cell.setCellValue("Hi");
        
        File f = new File("C:\\temp\\test.xlsx");
        try(FileOutputStream fout = new FileOutputStream(f))
        {
            wb.write(fout);
        }
        
        f = new File("C:\\temp\\test2.xlsx");
        try(FileOutputStream fout = new FileOutputStream(f))
        {
            wb.write(fout);
        }
    }
Comment 1 ralph_g 2014-04-30 19:59:51 UTC
The "docProps/app.xml" is also corrupt.

The correct content of the app.xml is written multiple times into the file. If the same workbook is written twice, the the same content is written twice into app.xml like in the attachment. If the workbook is written three times, there is thrice the same content in the app.xml and so on. This results in making app.xml an invalid xml file.

I only tried it with 3.10-Final in Windows.
Comment 2 ralph_g 2014-04-30 22:47:55 UTC
Created attachment 31576 [details]
This patch clears the data of the PackagePart for docProps/app.xml before saving it.

I added a little patch which fixes the duplicate content in the "docProps/app.xml" file. The "docProps/core.xml" file is still missing but at least LibreOffice can read the xlsx files now.

This patch clears the data of the PackagePart for docProps/app.xml before saving it. I am new to POI so I don't know if this patch has unintended effects, but the test for POIXMLProperties has no failures.
Comment 3 ralph_g 2014-04-30 23:02:45 UTC
Created attachment 31577 [details]
This patch clears the data of the PackagePart for docProps/app.xml before saving it

Sorry, wrong file. Now the patch.
Comment 4 ralph_g 2014-05-01 07:31:12 UTC
Created attachment 31578 [details]
This patch clears the data of the PackagePart for docProps/app.xml before saving it
Comment 5 ralph_g 2014-05-01 07:34:15 UTC
Created attachment 31579 [details]
31577: This patch clears the data of the PackagePart for docProps/app.xml before saving it

I'm really sorry for the many comments, but now it seems to work. Added a cleaned up patch.txt.
Comment 6 ralph_g 2014-05-01 16:15:53 UTC
Created attachment 31581 [details]
Patch for core.xml and app.xml

I think I have found two problem which caused the missing core.xml. in the getPartsByRelationshipType(String) in OPCPackage.

The first problem is that the result of getPart(rel) in getPartsByRelationshipType(String) is always added to the resulting ArrayList. Even if getPart(rel) returns null.
So the size is never 0 if the relationshipType was added once, but the core is still not on the part list and will not be written into the zip file a second or third time. 

The second problem is that in the following if block in saveImpl(zipOutputStream) in ZipPackage
if (this.getPartsByRelationshipType(PackageRelationshipTypes.CORE_PROPERTIES).size() == 0 && 
this.getPartsByRelationshipType(PackageRelationshipTypes.CORE_PROPERTIES_ECMA376).size() == 0    ) {
...
}
the core is written to the zip output stream instead of added to the part list.

I have added a new diff.txt which includes the old one. Though I'm also new to the code base I don't know if the patch causes other problems.
Comment 7 scott.paffrath 2014-05-01 23:35:22 UTC
Ralph, thank you for taking the time to do this. I will try out the changes tommorrow or ASAP.
Comment 8 scott.paffrath 2014-05-05 18:16:51 UTC
My initial report indicated that bug 55033 may be related. Please ignore that remark. I cannot even replicate that bug while testing to see if Ralph's patch fixed it.

The patch looks to me like it fixes the problems with the posted code. Thanks again! However, I did discover that adding a line to the test breaks another part.

If you add something like 

...
cell.setCellValue("Hi");
sheet.setRepeatingRows(new CellRangeAddress(0, 0, 0, 0));
...

causes
org.apache.xmlbeans.impl.values.XmlValueDisconnectedException
	at org.apache.xmlbeans.impl.values.XmlObjectBase.check_orphaned(XmlObjectBase.java:1213)
	at org.apache.xmlbeans.impl.values.XmlObjectBase.newCursor(XmlObjectBase.java:243)
	at org.apache.xmlbeans.impl.values.XmlComplexContentImpl.arraySetterHelper(XmlComplexContentImpl.java:1073)
	at org.openxmlformats.schemas.spreadsheetml.x2006.main.impl.CTDefinedNamesImpl.setDefinedNameArray(Unknown Source)
	at org.apache.poi.xssf.usermodel.XSSFWorkbook.saveNamedRanges(XSSFWorkbook.java:1288)
	at org.apache.poi.xssf.usermodel.XSSFWorkbook.commit(XSSFWorkbook.java:1309)
	at org.apache.poi.POIXMLDocumentPart.onSave(POIXMLDocumentPart.java:322)
	at org.apache.poi.POIXMLDocument.write(POIXMLDocument.java:173)
	at com.myprovident.excel.XLWorkbook.write(XLWorkbook.java:114)


I believe saveNamedRanges recreates the CTDefinedName collection in the underlying CTWorkbook but does not recreate the enveloping XSSFName collection in the enveloping XSSFWorkbook. I found a fix by replacing the code 

names.setDefinedNameArray(nr);
workbook.setDefinedNames(names);

with the code (some of which is based on a snippet from function "onDocumentRead")

names.setDefinedNameArray(nr);
if(workbook.isSetDefinedNames()) {
    workbook.unsetDefinedNames();
}
workbook.setDefinedNames(names);
            
// Re-process the named ranges
namedRanges = new ArrayList<XSSFName>();
if(workbook.isSetDefinedNames()) {
    for(CTDefinedName ctName : workbook.getDefinedNames().getDefinedNameArray()) {
        namedRanges.add(new XSSFName(ctName, this));
    }
}


Does this look reasonable to you? My tests have the combination of your patch and this change working great. Do you think this should be part of the same patch or under a different bug report?
Comment 9 scott.paffrath 2014-05-05 22:44:16 UTC
Created attachment 31594 [details]
Patch for XmlValueDisconnectedException
Comment 10 scott.paffrath 2014-05-06 23:11:10 UTC
Created attachment 31599 [details]
Patch for XmlValueDisconnectedException

This includes a test case.
Comment 11 Andreas Beeker 2014-05-14 21:17:28 UTC
Thank you for the patch.
I've applied it with svn ver r1594721, with some modifications