Bug 46643 - Formula parser should encode explicit range operator with tMemFunc
Summary: Formula parser should encode explicit range operator with tMemFunc
Status: RESOLVED FIXED
Alias: None
Product: POI
Classification: Unclassified
Component: HSSF (show other bugs)
Version: 3.2-FINAL
Hardware: PC Windows XP
: P2 blocker (vote)
Target Milestone: ---
Assignee: POI Developers List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2009-01-30 16:10 UTC by Robert Pate
Modified: 2009-02-02 16:27 UTC (History)
1 user (show)



Attachments
Template file loaded into POI (24.00 KB, application/vnd.ms-excel)
2009-01-31 08:38 UTC, Robert Pate
Details
Output file generated by POI (26.50 KB, application/vnd.ms-excel)
2009-01-31 08:39 UTC, Robert Pate
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Robert Pate 2009-01-30 16:10:36 UTC
When generating an Excel document from scratch, I am placing the number 1 in cell A1, the number 2 in cell A2, and the formula SUM(A1:A2) in cell A3.  I then shift rows 2 and 3 down and insert another row.  I then put the number 3 in newly emptied cell A2.  After writin gthis file to disk, I open the file and see the numbers and the formula correctly entered but when I change the value in cell A2 (the newly inserted cell) the sum in A4 does not update.  ANd it will not update until I enter cell A4 and hit enter.  Help please.
Comment 1 Robert Pate 2009-01-31 08:38:25 UTC
Created attachment 23210 [details]
Template file loaded into POI
Comment 2 Robert Pate 2009-01-31 08:39:11 UTC
Created attachment 23211 [details]
Output file generated by POI
Comment 3 Robert Pate 2009-01-31 08:53:42 UTC
After looking at this issue, I have discovered that it is the area range reference in the sum formula that seems to be missing the change value events on cells in between the range.  The first and last cells of the range affect the formula as they should but anything in between does not affect the sum.  Attached are example documents and below are better steps to reproduce.

The following are better reproduction steps for this issue:
1. Load the attached template into POI.
2. Find the cell with the string 'sumofprice' and replace it with the formula SUM(A2:A3)
3. Begin inserting rows in between the rows with the 'firstrow' and 'lastrow' tags by shifting the 'lastrow' row and the 'sumofprice' row down by 1.
4. For each row inserted, place numeric values in the 'price' column.
5. After two or three rows, output the file to another file name. 
Comment 4 Nick Burch 2009-02-02 02:12:06 UTC
Are you aware the excel caches formula values, and you need to re-calculate those cached values after altering things in poi?

http://poi.apache.org/spreadsheet/eval.html
Comment 5 Robert Pate 2009-02-02 10:04:47 UTC
I have attempted the re-calculation solution but with no avail.  If you open the output file generated, you will see the issue.  If you change any of the valus from D3 to D23 the sukm formula in D25 will not update but if you change the values in D2 or D24, the sum will update correctly.  The file was generated while using the reevaluate solution.

If I generate the same output but replace the sum formula with a daisy chain of pluses (D2+D3+D4+D5 ..... +D23+D24), the formula works correctly every time, but if I have more then 308 lines to sum, the formula grows too big.

Please help.
Comment 6 Josh Micich 2009-02-02 14:54:42 UTC
There is definitely something funky going on in attachment (id=23211) .  When you open it in Excel it has the behaviour you describe (updating cells within the range does not cause the formula to be automatically updated).  Another observation is that re-entering the formula (F2, <enter>) in Excel fixes the problem.  This is almost always a sign that POI has incorrectly encoded the formula tokens.

BiffViewer shows the formula from the attachment (id=23211) as:
    Ptg[0]=org.apache.poi.hssf.record.formula.Ref3DPtg [sheetIx=0 ! D2]R
    Ptg[1]=org.apache.poi.hssf.record.formula.Ref3DPtg [sheetIx=0 ! D24]R
    Ptg[2]=class org.apache.poi.hssf.record.formula.RangePtg.
    Ptg[3]=org.apache.poi.hssf.record.formula.FuncVarPtg [SUM nArgs=1]V

After re-enterring the formula and saving with Excel this becomes:
    Ptg[0]=org.apache.poi.hssf.record.formula.MemFuncPtg [len=15]R
    Ptg[1]=org.apache.poi.hssf.record.formula.Ref3DPtg [sheetIx=0 ! D2]R
    Ptg[2]=org.apache.poi.hssf.record.formula.Ref3DPtg [sheetIx=0 ! D6]R
    Ptg[3]=class org.apache.poi.hssf.record.formula.RangePtg.
    Ptg[4]=org.apache.poi.hssf.record.formula.AttrPtg [sum ].

I have done a quick test and found that the critical difference is the presence of the token tMemFunc.  If POI is modified to encode this token, Excel opens the file OK and the symptoms are gone.  I have changed the summary to reflect this.






Comment 7 Josh Micich 2009-02-02 14:56:59 UTC
Initially had I thought that the different encoding of SUM (as tfuncVar(SUM) instead of tAttrSum) was causing the problem, but this did not prove to be the case.  Nonetheless, it turns out to be a small change that we should probably do in order to make POI's formula encoding more closely match that of Excel.
Comment 8 Josh Micich 2009-02-02 15:07:01 UTC
Another important detail about these tokens is that the range is encoded as two 3D cell refs with an explicit range operator.  From the instructions you provided, Step 2 is expected to produce token tArea(D2:D3).  The obvious way to get the encoding (tRef3D(D2),tRef3D(D3),tRange) is to instead call:
cell.setCellFormula("SUM(InventoryDetails!D2:InventoryDetails!D3)");

I am assuming that this is much closer to the code you actually executed.

It's possible that there is some *other* bug in POI that caused the formula originally set as "SUM(D2:D3)" to transform into  "SUM(InventoryDetails!D2:InventoryDetails!D3", but this doesn't seem likely to me.  If you *did* call setCellFormula("SUM(D2:D3)") while creating the attachment, that would represent another bug. If so please can you open another bugzilla with exact details of the code that will produce that error. For example:

cell.setCellFormula("SUM(D2:D3)");
// do specific things to cause bug ... shift rows ?
// Check that the formula hasn't changed:
System.out.println(cell.getCellFormula()); 
// anything but "SUM(D2:D3)" would indicate a bug
Comment 9 Josh Micich 2009-02-02 15:16:25 UTC
Fixed in svn r740146

Junit added.
Comment 10 Robert Pate 2009-02-02 15:23:14 UTC
I used the formula reference:
cell.setCellFormula("SUM(InventoryDetails!D2:InventoryDetails!D3)");

I am new to POI and need to fix this in a production environment on the 3.2.FINAL release.  Could you show me how to do this or ppoint me in the right direct (classes to look at, etc)?

Thank you so much for you help.
Comment 11 Josh Micich 2009-02-02 16:26:55 UTC
(In reply to comment #10)
> I used the formula reference:
> cell.setCellFormula("SUM(InventoryDetails!D2:InventoryDetails!D3)");

Cool.  So the other, hypothetical bug doesn't exist :)

> I am new to POI and need to fix this in a production environment on the
> 3.2.FINAL release.  Could you show me how to do this or ppoint me in the right
> direct (classes to look at, etc)?

You have three main choices:

(1) Work Around
If it's easy enough for you to change that line of code to
cell.setCellFormula("SUM(D2:D3)");
then you can use POI 3.2-FINAL as-is, since the problem only occurs with the explicit range operator (which is more or less ':' with a complex RHS)

(2) Wait for the next POI release
Next: 'nightly build' tomorrow, 'betawithin a week or so, or 'final' in perhaps a month.

(3) Patch your local 3.2
The fix was quite small (details below), but I am not 100% sure if all of the other stuff needed to make this work are present in version 3.2.  For example I think bug 44675 is important (it *is* present in 3.1-beta2).  There could be other important fixes things that I am not remembering.


The critical change was on line 347 of this file:
http://svn.apache.org/viewvc/poi/trunk/src/java/org/apache/poi/ss/formula/FormulaParser.java?annotate=729028&pathrev=740146
Delete one line and add three:
-                return new ParseNode(RangePtg.instance, children);
+                ParseNode result = new ParseNode(RangePtg.instance, children);
+                MemFuncPtg memFuncPtg = new MemFuncPtg(result.getEncodedSize());
+                return new ParseNode(memFuncPtg, result);


Of course in 3.2 the line number is different (326): 
http://svn.apache.org/viewvc/poi/tags/REL_3_2_FINAL/src/java/org/apache/poi/ss/formula/FormulaParser.java?annotate=703641







http://svn.apache.org/viewvc/poi/trunk/src/java/org/apache/poi/ss/formula/FormulaParser.java?r1=740146&r2=740145&pathrev=740146

Comment 12 Josh Micich 2009-02-02 16:27:34 UTC
(In reply to comment #7)
Also made POI follow Excel's special case for the encoding of SUM taking a single
argument.  This was done for consistency; it is not critical for fixing any
known bug.

Fixed in svn r740159