Bug 47282 - The data format string is incorrect for Accounting style
The data format string is incorrect for Accounting style
Status: NEW
Product: POI
Classification: Unclassified
Component: HSSF
3.0
PC Windows XP
: P3 normal (vote)
: ---
Assigned To: POI Developers List
:
Depends on:
Blocks:
  Show dependency tree
 
Reported: 2009-05-28 21:23 UTC by jimmy422
Modified: 2009-06-11 04:45 UTC (History)
0 users



Attachments
This is a sample java program. Create a test.xls having a cell value with Accounting format (1.03 KB, application/octet-stream)
2009-05-28 21:23 UTC, jimmy422
Details

Note You need to log in before you can comment on or make changes to this bug.
Description jimmy422 2009-05-28 21:23:52 UTC
Created attachment 23726 [details]
This is a sample java program. Create a test.xls having a cell value with Accounting format

Hi,

The format string for the $ sign in Accounting format is incorrect.

Please compare the string format given in built-in formats at
http://poi.apache.org/apidocs/index.html?org/apache/poi/hssf/usermodel/HSSFDataFormat.html with index 44 (0x2c)
_($*#,##0.00_);_($*(#,##0.00);_($*\"-\"??_);_(@_)


In Excel, you can check the format string as follows.
> Format -> Cells
> Select Accounting as Category with 2 decimal places and symbol as $ (which is default in Excel 2003)
> Then select Custom and check the first format string given in Type.
_($* #,##0.00_);_($* (#,##0.00);_($* "-"??_);_(@_)

We can notice that there is a space after the "*" literal. Due to this the parsing logic fails at my end which is well defined.

Please correct the format string in the API.

I guess this is in the latest version as well since the documentation has not changed.

Attaching a sample program for reading the format string.


Thanks,
Vijayendra
Comment 1 Josh Micich 2009-06-04 17:44:33 UTC
There clearly is a difference as you have pointed out.

Excel displays:
_($* #,##0.00_);_($* (#,##0.00);_($* "-"??_);_(@_)

new HSSFWorkbook().createDataFormat().getFormat((short)0x2c);
returns:
_($*#,##0.00_);_($*(#,##0.00);_($*"-"??_);_(@_)


However, it is not clear how this causes a problem.  My understanding was that the 'built-in' formats are referenced by their index (in this case 0x2c) and not by their textual representation.

I took a look at Excel's internal representation of built-in format 0x2c and it is different again:
_("$"* #,##0.00_);_("$"* \(#,##0.00\);_("$"* "-"??_);_(@_)

It has the spaces after the asterisks, but it also has quotes around the currency and backslashes before some round brackets.

I've compared the formats produced in a new file created by Excel, and one created by POI.  They are the same. You can see POI's code for creating format 0x2c on line 1269 of the following:
http://svn.apache.org/viewvc/poi/trunk/src/java/org/apache/poi/hssf/model/Workbook.java?annotate=764203

As far as I can tell, this discrepancy won't cause POI to read or write xls files incorrectly.

Perhaps you have a different use case which relies on the value returned by HSSFDataFormat.getFormat(short).  If so, can you please describe *that* in more detail? This will help write a junit that verifies the need for this code change.
Comment 2 Michael Zalewski 2009-06-05 09:47:59 UTC
There is a difference, and I think POI is wrong.

The format string returned by POI is
_($*#,##0.00_);_($*(#,##0.00);_($*\"-\"??_);_(@_)
which is wrong. Excel will not accept that string if you try to set that value as a custom format in Excel. This format string is syntactically incorrect.

Excel displays
_($* #,##0.00_);_($* (#,##0.00);_($* "-"??_);_(@_)
which is the correct format. Note the space character after the asterisk. The asterisk means use the following character to pad the formatted value, and repeat the pad character enough times to fill the cell. That's why the format string from POI is wrong. The "*#" in the format string would mean use "#" to pad the formatted value of the cell, and format the number with ",##0.00" (which is not defined).

Josh reports seeing the value
_("$"* #,##0.00_);_("$"* \(#,##0.00\);_("$"* "-"??_);_(@_)
This is again different. The quotes around the $ sign mean that the cell value will always use "$" as the currency symbol. Without the quotes, Excel will use the currency symbol from the current Locale.

I'm not sure how Josh saw this third value of the format string. I see the value of the format string by 
1) Format | Cells, choose Accounting, and leave the other choices at their default values. Click OK
2) Format | Cells, change the selection from Accounting to Custom, and look at what shows in the field labelled Type.
Comment 3 David Fisher 2009-06-05 10:11:27 UTC
I think Micheal is correct. I googled "excel record 0x2c" and got a link to this page from Microsoft:

http://support.microsoft.com/kb/147942

It includes their definition of the formats:

0x00      General
0x01      0
0x02      0.00
0x03      #,##0
0x04      #,##0.00
0x05      ($#,##0_);($#,##0)
0x06      ($#,##0_);[Red]($#,##0)
0x07      ($#,##0.00_);($#,##0.00)
0x08      ($#,##0.00_);[Red]($#,##0.00)
0x09      0%
0x0a      0.00%
0x0b      0.00E+00
0x0c      # ?/?
0x0d      # ??/??
0x0e      m/d/yy
0x0f      d-mmm-yy
0x10      d-mmm
0x11      mmm-yy
0x12      h:mm AM/PM
0x13      h:mm:ss AM/PM
0x14      h:mm
0x15      h:mm:ss
0x16      m/d/yy h:mm
0x25      (#,##0_);(#,##0)
0x26      (#,##0_);[Red](#,##0)
0x27      (#,##0.00_);(#,##0.00)
0x28      (#,##0.00_);[Red](#,##0.00)
0x29      _(* #,##0_);_(* (#,##0);_(* "-"_);_(@_)
0x2a      _($* #,##0_);_($* (#,##0);_($* "-"_);_(@_)
0x2b      _(* #,##0.00_);_(* (#,##0.00);_(* "-"??_);_(@_)
0x2c      _($* #,##0.00_);_($* (#,##0.00);_($* "-"??_);_(@_)
0x2d      mm:ss
0x2e      [h]:mm:ss
0x2f      mm:ss.0
0x30      ##0.0E+0
0x31      @

Seems authoritative to me. Josh?
Comment 4 Josh Micich 2009-06-08 12:35:07 UTC
Sorry for the ambiguity in my earlier comment - I wasn't suggesting that POI was correct.  POI's external text representation of format 0x2c is missing some spaces, and this should be fixed.  However, I always like to accompany each fix with something (junit/comment) that says "this is what will go wrong without the fix".


I have tried to write some code that will exploit this incorrect value returned by HSSFDataFormat.getFormat(short).  As far as I can tell, the only place in POI that uses this string value is HSSFDataFormat.getFormat(String).  That method just translates the offending format text back to 0x2c, which should be OK.

I guess that Vijayendra's problem involves processing of these format string values *outside* of POI.
Vijayendra, do you have specific details of how POI's text value for data format 0x2c causes problems for you?
Comment 5 Josh Micich 2009-06-08 12:48:40 UTC
(In reply to comment #2)
> 
> The format string returned by POI is
> _($*#,##0.00_);_($*(#,##0.00);_($*\"-\"??_);_(@_)
> which is wrong. Excel will not accept that string if you try to set that value
> as a custom format in Excel. This format string is syntactically incorrect.

It's probably not critical to the resolution of this bug, but I have different observations.

My Excel(2007) accepts that string without complaint, and seems to interpret it OK too.  It's not likely that anyone would want this exact format, but there is no problem with the syntax.  The '*#' means 'right pad with #'.  The comma is then left without a preceding '0' or '#', so it is interpreted as a literal comma instead of a thousands separator.
Comment 6 Josh Micich 2009-06-08 12:52:13 UTC
I have noticed another (perhaps related problem) that POI doesn't have logic for *parsing* the text representations of data formats.
For example, there is currently nothing that relates line 1265 of Workbook (internal representation) to line 118 of BuiltinFormats (Excel GUI representation):
http://svn.apache.org/viewvc/poi/trunk/src/java/org/apache/poi/hssf/model/Workbook.java?annotate=782398
http://svn.apache.org/viewvc/poi/trunk/src/java/org/apache/poi/ss/usermodel/BuiltinFormats.java?annotate=782398
...though these two strings logically refer to the same thing.


If someone tried to create their own custom format string (using the Excel GUI syntax), HSSFDataFormat.getFormat(String) won't convert it to the internal format that Excel seems to need.

The fix for this should be not too difficult.  This page provides a useful description:
http://www.ozgrid.com/Excel/CustomFormats.htm



BTW - the internal representation of data formats is also visible in the output of BiffViewer. For example:
Offset=0x000004E7(1255) recno=51 sid=0x041E size=0x003F(63)
[FORMAT]
    .indexcode       = 0x002C
    .isUnicode       = false
    .formatstring    = _("$"* #,##0.00_);_("$"* \(#,##0.00\);_("$"* "-"??_);_(@_)
[/FORMAT]
Comment 7 jimmy422 2009-06-10 04:08:25 UTC
Sorry for replying late ...

The parsing logic is written as per the Excel formats which has the space.

As Michael Zalewski (https://issues.apache.org/bugzilla/show_bug.cgi?id=47282#c2)
pointed correctly, after parsing as per current format string in POI,
it gives the value as #0.00 at my end.

If the space is removed while debugging, it parses the currencies as perfectly.

The standard formats does have space after * and I think POI should follow the same.
Comment 8 Josh Micich 2009-06-10 11:04:26 UTC
(In reply to comment #7)
> The parsing logic is written as per the Excel formats which has the space.

Which parsing logic?  Excel's? POI's? Have you written your own parsing logic?  If so, is it easy enough to share a small part of that code which exposes the bug?



> As Michael Zalewski
> (https://issues.apache.org/bugzilla/show_bug.cgi?id=47282#c2)
> pointed correctly, after parsing as per current format string in POI,
> it gives the value as #0.00 at my end.

'it gives' - which app are you talking about?  Probably not Excel, because Excel would actually give a result of #,0.00.


 
> If the space is removed while debugging, it parses the currencies as perfectly.

I guess you mean 'If the space is *added* while debugging...'



> The standard formats does have space after * and I think POI should follow the
> same.

POI *will* be changed to match Excel.  That was never in question.  Instead, please help us understand the reasoning behind the change.
Comment 9 jimmy422 2009-06-11 04:45:02 UTC
(In reply to comment #8)
> (In reply to comment #7)
> > The parsing logic is written as per the Excel formats which has the space.
> 
> Which parsing logic?  Excel's? POI's? Have you written your own parsing logic? 
> If so, is it easy enough to share a small part of that code which exposes the
> bug?
Vijayendra - The parsing logic is written externally. No Excel's or POI's parsing logic is used.
I cannot disclose the code, since it is complex and huge.

> 
> 
> 
> > As Michael Zalewski
> > (https://issues.apache.org/bugzilla/show_bug.cgi?id=47282#c2)
> > pointed correctly, after parsing as per current format string in POI,
> > it gives the value as #0.00 at my end.
> 
> 'it gives' - which app are you talking about?  Probably not Excel, because
> Excel would actually give a result of #,0.00.
Vijayendra - The custom application
> 
> 
> 
> > If the space is removed while debugging, it parses the currencies as perfectly.
> 
> I guess you mean 'If the space is *added* while debugging...'
Vijayendra - Yeah. Sorry. If space *added*.
> 
> 
> 
> > The standard formats does have space after * and I think POI should follow the
> > same.
> 
> POI *will* be changed to match Excel.  That was never in question.  Instead,
> please help us understand the reasoning behind the change.

To summarize the logic, it parse the format string and tries to collect various formats (represented by custom java classes) in sequence, which later helps in formatting the value.
The logic is written as per the standard formats. So cannot change it for a single value. So while parsing the string, if the space is not found, it makes the formatting messy.