Bug 50841 - DataFormatter, DateUtil, and ExcelStyleDateFormatter issues
Summary: DataFormatter, DateUtil, and ExcelStyleDateFormatter issues
Status: RESOLVED FIXED
Alias: None
Product: POI
Classification: Unclassified
Component: POI Overall (show other bugs)
Version: 3.7-FINAL
Hardware: PC Windows XP
: P2 normal (vote)
Target Milestone: ---
Assignee: POI Developers List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-02-28 16:06 UTC by Robert Kish
Modified: 2011-03-22 10:26 UTC (History)
1 user (show)



Attachments
Modified versions of files with my changes in them and description of changes (15.75 KB, application/zip)
2011-02-28 16:06 UTC, Robert Kish
Details
Modified versions of files with my changes in them and description of changes (15.79 KB, application/zip)
2011-02-28 17:27 UTC, Robert Kish
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Robert Kish 2011-02-28 16:06:56 UTC
Created attachment 26697 [details]
Modified versions of files with my changes in them and description of changes

I sent in item 48872 last year and did not get a chance to look into it until recently. Nick Burch worked on my previous comments and did a great job integrating changes into the libraries. I have a new round of bug fixes and improvements I'd like to "share". 

1. Added an option to support text extraction similar to Excel's CSV with regards to spacing, invalid dates, scientific notation + sign.
2. Corrected bug with getting format for values = 0 and format expression in 4 parts.
3. Better elapsed time processing, now handling [h], [m], and [s] and more correct rounding.
4. Milliseconds support: s.0, s.00, and s.000.

I have attached 4 files,
	a: modified version of org.apache.poi.ss.usermodel.DataFormatter called NcoDataFormatter.java
	b: modified version of org.apache.poi.ss.usermodel.ExcelStyleDateFormatter called NcoExcelStyleDateFormatter.java
	c: modified version of org.apache.poi.ss.usermodel.TestDataFormatter.java called NcoTestDataFormatter.java
	d: Details of what I changed in Differences.txt
This was done in this way as it's an easy way for me to have access to the original POI code and access to my "fixed" version of the code, so I can use both at the same time, if needed. Please use a diff tool to compare the code between the 3.7 version and mine to see the exact details of code changes.
Comment 1 Robert Kish 2011-02-28 17:27:13 UTC
Created attachment 26698 [details]
Modified versions of files with my changes in them and description of changes
Comment 2 Yegor Kozlov 2011-03-14 09:04:18 UTC
I'm reviewing your patch. My plan is to create a subclass of DataFormatter and encapsulate CSV-specific logic in it. NcoDataFormatter should share common stuff, not duplicate it. 

Fixes in date detection and formatting (items 8 - 11 from differences.txt) seem to be generic and I'm going to apply them directly to DateUtil and ExcelStyleDateFormatter. 

To support elapsed time you proposed to change 

	Pattern date_ptrn2 = Pattern.compile("^\\[[a-zA-Z]+\\]");
	to
	Pattern date_ptrn2 = Pattern.compile("^\\[[A-Z]+\\]");

(item 8 from differences.txt).

This "loosened" regex will skip formats with a prefixed color, e.g. [yellow]yyyy-mm-dd or [red][hh] 

I'm going to define a fourth regexp for the elapsed time patterns:

    //  elapsed time patterns: [h],[m] and [s]
    private static final Pattern date_ptrn4 = Pattern.compile("^\\[([hH]+|[mM]+|[sS]+)\\]");

and use it in combination with date_ptrn2. 

Other than that, very cool! Thank you for the patch.

Yegor
Comment 3 Yegor Kozlov 2011-03-19 08:44:11 UTC
Applied in r1083173

I changed my mind about subclassing DataFormatter, this class was not designed for extension and it is much easier to add a new constructor to pass the emulateCsv parameter. You should be able to run your code by replacing  NcoDataFormatter with DataFormatter.

 Changes that apply to DataFormatter by default:

 - Support for elapsed time and [h], [m], and [s].
 - Rounding style for elapsed time is DOWN and uses float for calculations instead of double.
 - support for scientific notation


 Changes that apply only when emulateCsv=true:

 - returned values are not trimmed and format spacers and respected
 - Invalid dates are formatted as 255 pound signs ("#")
 - simulate Excel's handling of a format string of all # when the value is 0.

Note that I changed the way how you propagate invalid formats using the NDFNoNumbers exception and return a special subclass of Format instead.
Exceptions should be used only for exceptional conditions and not for normal control flow.

Regards,
Yegor
Comment 4 Robert Kish 2011-03-22 10:26:21 UTC
Yegor,
I just spent some time performing a diff of my submission to your work. I am quite impressed with your implementation of my rough points. You implemented them much cleaner than I did, especially with regards to the exception as flow control and the elapsed time formatters.

Thanks again.