I am using POI 3.8 beta 5 (from my own build on 10/06) on a mainframe to read Excel files. Reading and writing xls files works, but I get the following stack trace when reading xlsx files:

Exception in thread "main" org.apache.poi.openxml4j.exceptions.OpenXML4JRuntimeException: Package.init() : this exception should never happen, if you read this message please send a mail to the developers team. : The specified content type 'application/vnd.openxmlformats-package.core-properties+xml' is not compliant with RFC 2616: malformed content type.
    at org.apache.poi.openxml4j.opc.OPCPackage.init(OPCPackage.java:166)
    at org.apache.poi.openxml4j.opc.OPCPackage.<init>(OPCPackage.java:141)
    at org.apache.poi.openxml4j.opc.ZipPackage.<init>(ZipPackage.java:82)
    at org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:228)
    at org.apache.poi.ss.usermodel.WorkbookFactory.create(WorkbookFactory.java:67)
    at TestWorkbookFactoryCreate.main(TestWorkbookFactoryCreate.java:16)

Here is the output of "java -version":

java version "1.5.0"
Java(TM) 2 Runtime Environment, Standard Edition (build pmz31dev-20090707 (SR10 ))
IBM J9 VM (build 2.3, J2RE 1.5.0 IBM J9 2.3 z/OS s390-31 j9vmmz3123-20090707 (JIT enabled)
J9VM - 20090706_38445_bHdSMr
JIT - 20090623_1334_r8
GC - 200906_09)
JCL - 20090705

Output of "uname -a":

OS/390 ABIZOS08 21.00 03 2818

Test code:

import org.apache.poi.ss.usermodel.*;
import org.apache.poi.xssf.usermodel.*;
import java.io.FileInputStream;
import java.io.IOException;

public class TestWorkbookFactoryCreate {
    public static void main(String[] args) throws IOException, Exception {
        FileInputStream fileIn = null;
        try {
            fileIn = new FileInputStream("utf8.xlsx");
            XSSFWorkbook wb = (XSSFWorkbook) WorkbookFactory.create(fileIn);
            System.out.println("Workbook created");
        } finally {
            if (fileIn != null)
                fileIn.close();
        }
    }
}
Could you please attach the problematic file too? Also, do you know how the file was generated?
Created attachment 27970: Test xlsx file
Any xlsx file created by Excel 2007 has this problem. I have attached a sample file.
I did more testing on the mainframe and found that I have to pass the -Dfile.encoding=UTF-8 option:

$ java -Dfile.encoding=UTF-8 TestWorkbookFactoryCreate
Workbook created

$ java TestWorkbookFactoryCreate
Exception in thread "main" org.apache.poi.openxml4j.exceptions.OpenXML4JRuntimeException: Package.init() : this exception should never happen, if you read this message please send a mail to the developers team. : The specified content type 'application/vnd.openxmlformats-package.core-properties+xml' is not compliant with RFC 2616: malformed content type.
    at org.apache.poi.openxml4j.opc.OPCPackage.init(OPCPackage.java:166)
    at org.apache.poi.openxml4j.opc.OPCPackage.<init>(OPCPackage.java:141)
    at org.apache.poi.openxml4j.opc.ZipPackage.<init>(ZipPackage.java:82)
    at org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:228)
    at org.apache.poi.ss.usermodel.WorkbookFactory.create(WorkbookFactory.java:67)
    at TestWorkbookFactoryCreate.main(TestWorkbookFactoryCreate.java:16)

So -Dfile.encoding=UTF-8 solves my problem. The default encoding on the mainframe is EBCDIC, and I have to override it with UTF-8. I originally filed this as a POI bug because the error message asked me to.
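To confirm which default encoding a given JVM actually picks up, a small check like the one below can be run with and without -Dfile.encoding=UTF-8. This is only a sketch; the class name ShowDefaultCharset is made up for illustration and is not part of POI.

```java
import java.nio.charset.Charset;

// Hypothetical helper class, not part of POI.
class ShowDefaultCharset {
    /** Name of the charset the JVM uses when no charset is passed explicitly. */
    static String defaultCharsetName() {
        return Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        // The system property and the effective default are printed separately,
        // since the property may be unset or overridden on the command line.
        System.out.println("file.encoding   = " + System.getProperty("file.encoding"));
        System.out.println("default charset = " + defaultCharsetName());
    }
}
```

String.getBytes() without arguments encodes with the charset reported on the second line.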
Hmm, we must have an encoding assumption in the OPC code somewhere then. The odd thing is that the error message is coming from the ContentType class, which does hard-code the encoding to US-ASCII, so I'm not sure where the issue is.
I hope to get this working without passing the -Dfile.encoding=UTF-8 option when calling java.
If you're able to, fire up your JVM with remote debugging enabled and attach a remote debugger (e.g. Eclipse) to it. Then step through the problem code and see if you can work out what is incorrectly encoded and causing the failure. (Nothing springs to mind as wrong from looking at the source code, so it's likely something subtle.)
Hello,

We are using the POI API (stable 3.8) on a system whose default encoding is ibm500, and we got the same error when trying to create a Workbook using WorkbookFactory.create(ByteArrayInputStream bais).

We found that the problem lies in the constructor org.apache.poi.openxml4j.opc.internal.ContentType.ContentType(String contentType).

In line 139, the following code is called:

    contentTypeASCII = new String(contentType.getBytes(), "US-ASCII");

The String.getBytes() call returns the bytes in the default system encoding (here ibm500). Those bytes are then decoded as US-ASCII. This cannot work.

So we wonder why this conversion is done at all.

We deleted the line and replaced it with the following code:

    contentTypeASCII = contentType;

Afterwards it worked fine.

Regards
Constantin
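The mismatch described above can be reproduced on any platform by passing the "default" charset explicitly. The following is a minimal sketch, not POI code: the class and method names are invented, it uses StandardCharsets (Java 7+), and it falls back to UTF-16BE as a stand-in if the running JRE does not ship the IBM500 charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of POI.
class DefaultCharsetRoundTrip {
    /** Mimics new String(s.getBytes(), "US-ASCII") with the platform default made explicit. */
    static String encodeThenDecodeAsAscii(String s, Charset platformDefault) {
        byte[] bytes = s.getBytes(platformDefault);           // what getBytes() does implicitly
        return new String(bytes, StandardCharsets.US_ASCII);  // bytes reinterpreted as ASCII
    }

    public static void main(String[] args) {
        String contentType = "application/vnd.openxmlformats-package.core-properties+xml";

        // IBM500 is an EBCDIC charset; UTF-16BE is only a fallback for JREs without it.
        Charset nonAsciiCompatible = Charset.isSupported("IBM500")
                ? Charset.forName("IBM500")
                : StandardCharsets.UTF_16BE;

        String viaUtf8 = encodeThenDecodeAsAscii(contentType, StandardCharsets.UTF_8);
        String viaOther = encodeThenDecodeAsAscii(contentType, nonAsciiCompatible);

        System.out.println("UTF-8 default keeps the string intact: " + contentType.equals(viaUtf8));
        System.out.println(nonAsciiCompatible + " default keeps the string intact: "
                + contentType.equals(viaOther));
    }
}
```

With an ASCII-compatible default such as UTF-8 the string survives unchanged, which matches the observation that -Dfile.encoding=UTF-8 hides the problem. With an EBCDIC default it does not survive.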
It is very likely that your hypothesis is correct and that this line of code can cause problems.

The problematic piece of code has existed since POI-3.5, when OpenXml4j was contributed to Apache POI. I guess the intention was to ensure that the string being parsed and validated is in the ASCII encoding. This "worked" for years, but the conversion does not make sense: if the input argument contains characters above ASCII, they are converted to U+FFFD (the Unicode replacement character) and the subsequent validation against the patternMediaType regex fails.

Consider the following examples:

(a) new ContentType("text/\u007E")
(b) new ContentType("text/\u0080")

The first case (a) works because all characters in the input string are in ASCII and the conversion does not change the input string.

The second case (b) fails whether or not the input argument is re-converted to US-ASCII. If you apply your fix (contentTypeASCII=contentType), then the regex check at line 146 fails. The current code first converts the input string to "text/\uFFFD" and then the regex fails.

So I agree that this conversion is redundant and can be removed. The fix is coming soon.

Regards,
Yegor

(In reply to comment #8)
> Hello,
>
> We are using the POI API (stable 3.8) on a system running ibm500 encoding as
> default encoding.
> So we got the same error, when trying to create a Workbook using
> WorkbookFactory.create(ByteArrayInputStream bais).
>
> We found that the problem lies in the method
> org.apache.poi.openxml4j.opc.internal.ContentType.ContentType(String
> contentType)
>
> In line 139, the follwoing code is called:
> contentTypeASCII = new String(contentType.getBytes(), "US-ASCII");
>
> The String.getBytes() causes the system to return the bytes in default
> system encoding (for instance ibm500). Afterwards this should be converted
> using encoding US-ASCII. This cannot work.
>
> So, we wonder, why this conversion will be done?
>
> We deleted the line and just put following code:
> contentTypeASCII = contentType;
>
> Afterwards it worked fine.
>
> Regards
> Constantin
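Cases (a) and (b) from the reply can be checked in isolation. The sketch below makes two assumptions that are not taken from the POI source: ISO-8859-1 stands in for a single-byte default charset, and SIMPLE_MEDIA_TYPE is a deliberately simplified stand-in for patternMediaType (printable ASCII on both sides of one slash). All names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of POI.
class ContentTypeCases {
    // Simplified stand-in for POI's patternMediaType (assumption, not the real regex):
    // one or more printable ASCII characters except '/', a slash, then the same again.
    static final Pattern SIMPLE_MEDIA_TYPE =
            Pattern.compile("[\\x21-\\x2E\\x30-\\x7E]+/[\\x21-\\x2E\\x30-\\x7E]+");

    /** Re-encodes through a single-byte charset and decodes the bytes as US-ASCII. */
    static String toAscii(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.ISO_8859_1);
        // Bytes above 0x7F are malformed in US-ASCII and decode to U+FFFD.
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    static boolean isValid(String s) {
        return SIMPLE_MEDIA_TYPE.matcher(s).matches();
    }

    public static void main(String[] args) {
        String a = "text/\u007E"; // case (a): pure ASCII
        String b = "text/\u0080"; // case (b): one character above ASCII

        System.out.println("(a) with conversion:    " + isValid(toAscii(a)));
        System.out.println("(a) without conversion: " + isValid(a));
        System.out.println("(b) with conversion:    " + isValid(toAscii(b)));
        System.out.println("(b) without conversion: " + isValid(b));
    }
}
```

Under these assumptions the conversion changes nothing for (a), and (b) is rejected either way, so dropping the conversion loses no validation.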
Should be fixed in r1394001. Yegor