Bug 48494 - EventBasedExcelExtractor and ExcelExtractor don't process conditional cell formatting correctly
Summary: EventBasedExcelExtractor and ExcelExtractor don't process conditional cell fo...
Status: RESOLVED FIXED
Alias: None
Product: POI
Classification: Unclassified
Component: HSSF (show other bugs)
Version: 3.6-FINAL
Hardware: PC Mac OS X 10.4
: P2 normal (vote)
Target Milestone: ---
Assignee: POI Developers List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-01-06 09:49 UTC by Phil Varner
Modified: 2010-06-02 13:18 UTC (History)
0 users



Attachments
sheet that exhibits issue (7.00 KB, application/octet-stream)
2010-01-06 09:49 UTC, Phil Varner
Details
runs EBEE and EE on xls (1.76 KB, text/plain)
2010-01-06 09:50 UTC, Phil Varner
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Phil Varner 2010-01-06 09:49:49 UTC
Created attachment 24803 [details]
sheet that exhibits issue

>> The EBEE tries to format decimal values before returning them using
>> the method formatNumberDateCell.  I have some xls (unfortunately I
>> can't share) that throw an exception from the DecimalFormat
>> constructor:
>> java.lang.IllegalArgumentException: Unquoted special character ';' in
>> pattern "_(*#,##0_);_(*(#,##0);_(* "-"_);_(@_)"


UserModel copes, but because no formatting is applied. Relevant code
in ExcelExtractor:

case HSSFCell.CELL_TYPE_NUMERIC:
 // Note - we don't apply any formatting!
 text.append(cell.getNumericCellValue());
 break;

However, a date formatted cell (string cell type) with
"hh:mm;hh:mm;hh:mm" fails in both.

I believe this means conditional cell formatting never works (either
not being applied or throwing exception), only static cell formatting.
 I'm not sure what the above format of the 3- or 4-valued conditional
is, though.

I'll file a bug for this. Attached is a doc that exhibits the issue
and a java class that reproduces.
Comment 1 Phil Varner 2010-01-06 09:50:55 UTC
Created attachment 24804 [details]
runs EBEE and EE on xls
Comment 2 Phil Varner 2010-03-05 06:06:13 UTC
Patch (the revision commit numbers may need removing/changing, this was to a private svn repo):

Index: src/java/org/apache/poi/hssf/extractor/EventBasedExcelExtractor.java
===================================================================
--- src/java/org/apache/poi/hssf/extractor/EventBasedExcelExtractor.java        (revision 99199)
+++ src/java/org/apache/poi/hssf/extractor/EventBasedExcelExtractor.java        (revision 99174)
@@ -275,15 +275,8 @@
                        }
 
                        // Format as a number
-                       // some format strings contain unescaped ;s which cause
-                       // problems.  Since this is only for indexing, we
-                       // can ignore these and just return the unformatted string (hacky?)
-                       try {
-                               DecimalFormat df = new DecimalFormat(formatString);
-                               return df.format(value);
-                       } catch (java.lang.IllegalArgumentException e){
-                               return String.valueOf(value);
-                       }
+                       DecimalFormat df = new DecimalFormat(formatString);
+                       return df.format(value);
                }
        }
 }
Comment 3 Nick Burch 2010-06-02 12:13:53 UTC
Fixed in r950616.

We already had a reference to the usermodel style DataFormatter, we just weren't using it, doh! Now changed to call the normal DataFormatter, so the event based extractor should behave more like the usermodel one does
Comment 4 Nick Burch 2010-06-02 13:18:16 UTC
Also, your time formats like HH:MM;HH:MM;HH:MM are also detected as times now too.