Created attachment 31567 [details] The corrupt file. The second time a file is written the "docProps/core.xml" file is not included in the output and the result is a corrupt Excel file that cannot be opened. This bug may be related : https://issues.apache.org/bugzilla/show_bug.cgi?id=55033 I debugged this in 3.9 and confirmed it is still present in 3.10. Windows 7. Being new to the code base I could only track the bug as far as the following code in "saveImpl" of "ZipPackage.java". It executes on the first run but not the second. ... // If the core properties part does not exist in the part list, // we save it as well if (this.getPartsByRelationshipType(PackageRelationshipTypes.CORE_PROPERTIES).size() == 0 && this.getPartsByRelationshipType(PackageRelationshipTypes.CORE_PROPERTIES_ECMA376).size() == 0 ) { logger.log(POILogger.DEBUG,"Save core properties part"); // We have to save the core properties part ... new ZipPackagePropertiesMarshaller().marshall( this.packageProperties, zos); // ... and to add its relationship ... this.relationships.addRelationship(this.packageProperties .getPartName().getURI(), TargetMode.INTERNAL, PackageRelationshipTypes.CORE_PROPERTIES, null); // ... and the content if it has not been added yet. if (!this.contentTypeManager .isContentTypeRegister(ContentTypes.CORE_PROPERTIES_PART)) { this.contentTypeManager.addContentType( this.packageProperties.getPartName(), ContentTypes.CORE_PROPERTIES_PART); } } ... Test code that replicates the error ("test2.xlsx" is corrupt) : public static void main(String[] args) throws FileNotFoundException, IOException { XSSFWorkbook wb = new XSSFWorkbook(); XSSFSheet sheet = wb.createSheet(); XSSFRow row = sheet.createRow(0); XSSFCell cell = row.createCell(0); cell.setCellValue("Hi"); File f = new File("C:\\temp\\test.xlsx"); try(FileOutputStream fout = new FileOutputStream(f)) { wb.write(fout); } f = new File("C:\\temp\\test2.xlsx"); try(FileOutputStream fout = new FileOutputStream(f)) { wb.write(fout); } }
The "docProps/app.xml" is also corrupt. The correct content of the app.xml is written multiple times into the file. If the same workbook is written twice, the the same content is written twice into app.xml like in the attachment. If the workbook is written three times, there is thrice the same content in the app.xml and so on. This results in making app.xml an invalid xml file. I only tried it with 3.10-Final in Windows.
Created attachment 31576 [details] This patch clears the data of the PackagePart for docProps/app.xml before saving it. I added a little patch which fixes the duplicate content in the "docProps/app.xml" file. The "docProps/core.xml" file is still missing but at least LibreOffice can read the xlsx files now. This patch clears the data of the PackagePart for docProps/app.xml before saving it. I am new to POI so I don't know if this patch has unintended effects, but the test for POIXMLProperties has no failures.
Created attachment 31577 [details] This patch clears the data of the PackagePart for docProps/app.xml before saving it Sorry, wrong file. Now the patch.
Created attachment 31578 [details] This patch clears the data of the PackagePart for docProps/app.xml before saving it
Created attachment 31579 [details] 31577: This patch clears the data of the PackagePart for docProps/app.xml before saving it I'm really sorry for the many comments, but now it seems to work. Added a cleaned up patch.txt.
Created attachment 31581 [details] Patch for core.xml and app.xml I think I have found two problem which caused the missing core.xml. in the getPartsByRelationshipType(String) in OPCPackage. The first problem is that the result of getPart(rel) in getPartsByRelationshipType(String) is always added to the resulting ArrayList. Even if getPart(rel) returns null. So the size is never 0 if the relationshipType was added once, but the core is still not on the part list and will not be written into the zip file a second or third time. The second problem is that in the following if block in saveImpl(zipOutputStream) in ZipPackage if (this.getPartsByRelationshipType(PackageRelationshipTypes.CORE_PROPERTIES).size() == 0 && this.getPartsByRelationshipType(PackageRelationshipTypes.CORE_PROPERTIES_ECMA376).size() == 0 ) { ... } the core is written to the zip output stream instead of added to the part list. I have added a new diff.txt which includes the old one. Though I'm also new to the code base I don't know if the patch causes other problems.
Ralph, thank you for taking the time to do this. I will try out the changes tommorrow or ASAP.
My initial report indicated that bug 55033 may be related. Please ignore that remark. I cannot even replicate that bug while testing to see if Ralph's patch fixed it. The patch looks to me like it fixes the problems with the posted code. Thanks again! However, I did discover that adding a line to the test breaks another part. If you add something like ... cell.setCellValue("Hi"); sheet.setRepeatingRows(new CellRangeAddress(0, 0, 0, 0)); ... causes org.apache.xmlbeans.impl.values.XmlValueDisconnectedException at org.apache.xmlbeans.impl.values.XmlObjectBase.check_orphaned(XmlObjectBase.java:1213) at org.apache.xmlbeans.impl.values.XmlObjectBase.newCursor(XmlObjectBase.java:243) at org.apache.xmlbeans.impl.values.XmlComplexContentImpl.arraySetterHelper(XmlComplexContentImpl.java:1073) at org.openxmlformats.schemas.spreadsheetml.x2006.main.impl.CTDefinedNamesImpl.setDefinedNameArray(Unknown Source) at org.apache.poi.xssf.usermodel.XSSFWorkbook.saveNamedRanges(XSSFWorkbook.java:1288) at org.apache.poi.xssf.usermodel.XSSFWorkbook.commit(XSSFWorkbook.java:1309) at org.apache.poi.POIXMLDocumentPart.onSave(POIXMLDocumentPart.java:322) at org.apache.poi.POIXMLDocument.write(POIXMLDocument.java:173) at com.myprovident.excel.XLWorkbook.write(XLWorkbook.java:114) I believe saveNamedRanges recreates the CTDefinedName collection in the underlying CTWorkbook but does not recreate the enveloping XSSFName collection in the enveloping XSSFWorkbook. I found a fix by replacing the code names.setDefinedNameArray(nr); workbook.setDefinedNames(names); with the code (some of which is based on a snippet from function "onDocumentRead") names.setDefinedNameArray(nr); if(workbook.isSetDefinedNames()) { workbook.unsetDefinedNames(); } workbook.setDefinedNames(names); // Re-process the named ranges namedRanges = new ArrayList<XSSFName>(); if(workbook.isSetDefinedNames()) { for(CTDefinedName ctName : workbook.getDefinedNames().getDefinedNameArray()) { namedRanges.add(new XSSFName(ctName, this)); } } Does this look reasonable to you? My tests have the combination of your patch and this change working great. Do you think this should be part of the same patch or under a different bug report?
Created attachment 31594 [details] Patch for XmlValueDisconnectedException
Created attachment 31599 [details] Patch for XmlValueDisconnectedException This includes a test case.
Thank you for the patch. I've applied it with svn ver r1594721, with some modifications