Created attachment 24803 [details] sheet that exhibits issue >> The EBEE tries to format decimal values before returning them using >> the method formatNumberDateCell. I have some xls (unfortunately I >> can't share) that throw an exception from the DecimalFormat >> constructor: >> java.lang.IllegalArgumentException: Unquoted special character ';' in >> pattern "_(*#,##0_);_(*(#,##0);_(* "-"_);_(@_)" UserModel copes, but because no formatting is applied. Relevant code in ExcelExtractor: case HSSFCell.CELL_TYPE_NUMERIC: // Note - we don't apply any formatting! text.append(cell.getNumericCellValue()); break; However, a date formatted cell (string cell type) with "hh:mm;hh:mm;hh:mm" fails in both. I believe this means conditional cell formatting never works (either not being applied or throwing exception), only static cell formatting. I'm not sure what the above format of the 3- or 4-valued conditional is, though. I'll file a bug for this. Attached is a doc that exhibits the issue and a java class that reproduces.
Created attachment 24804 [details] runs EBEE and EE on xls
Patch (the revision commit numbers may need removing/changing, this was to a private svn repo): Index: src/java/org/apache/poi/hssf/extractor/EventBasedExcelExtractor.java =================================================================== --- src/java/org/apache/poi/hssf/extractor/EventBasedExcelExtractor.java (revision 99199) +++ src/java/org/apache/poi/hssf/extractor/EventBasedExcelExtractor.java (revision 99174) @@ -275,15 +275,8 @@ } // Format as a number - // some format strings contain unescaped ;s which cause - // problems. Since this is only for indexing, we - // can ignore these and just return the unformatted string (hacky?) - try { - DecimalFormat df = new DecimalFormat(formatString); - return df.format(value); - } catch (java.lang.IllegalArgumentException e){ - return String.valueOf(value); - } + DecimalFormat df = new DecimalFormat(formatString); + return df.format(value); } } }
Fixed in r950616. We already had a reference to the usermodel style DataFormatter, we just weren't using it, doh! Now changed to call the normal DataFormatter, so the event based extractor should behave more like the usermodel one does
Also, your time formats like HH:MM;HH:MM;HH:MM are also detected as times now too.