Bug 44410

Summary: Single letter data ranges (such as "C:C") do not 'expand' the same way as Excel expands them.
Product: POI Reporter: David Webster <dave.webster>
Component: HSSFAssignee: POI Developers List <dev>
Status: RESOLVED FIXED    
Severity: normal CC: cpuidle
Priority: P3    
Version: 3.0-dev   
Target Milestone: ---   
Hardware: Other   
OS: Windows XP   
Attachments: Excel file showing INDEX and SUM functions failing with single character data range

Description David Webster 2008-02-13 04:08:09 UTC
Hello All, 

Description
-----------
If a formula in Excel specifies a data range involving single letters only (i.e.
"C:C" meaning "all of column D"), when parsed in POI, this data range is not
expanded properly.  But, if the data range is specified as
LetterNumber:LetterNumber, all works fine with POI.

Steps to Reproduce
------------------
1/
Create a simple spreadsheet as follows
     A    B    C
   ------------------
1 |    |    | 1  |
2 |    |    | 2  |
3 |    |    | 3  |
4 | ** |    |    |
5 | %% |    |    |

** contains formula =INDEX(C:C,2,1)
%% contains formula =SUM(C:C)

2/
Notice in Excel (2003), cell A4 evaluates to 2 and Cell A5 evaluates to 6.

3/
Parse the Excel file using POI to evaulate cell formulas (i.e. using
HSSFFormulaEvaluator.CellValue cellValue = evaluator.evaluate(cell); for example)

4/
Output for cell A4 shows an exception, output for A5 shows 0.

Actual Results
--------------
As stated in 4/ above.
Exception is:

java.lang.ArrayIndexOutOfBoundsException: 1
        at
org.apache.poi.hssf.record.formula.functions.Index.evaluate(Index.java:83)
        at
org.apache.poi.hssf.record.formula.eval.FuncVarEval.evaluate(FuncVarEval.java:43)
        at
org.apache.poi.hssf.usermodel.HSSFFormulaEvaluator.internalEvaluate(HSSFFormulaEvaluator.java:395)
        at
org.apache.poi.hssf.usermodel.HSSFFormulaEvaluator.evaluate(HSSFFormulaEvaluator.java:203)
        at ExcelParser2.getCellValue(ExcelParser2.java:71)
        at ExcelParser2.printWorkSheetTable(ExcelParser2.java:261)
        at ExcelParser2.main(ExcelParser2.java:293)


Expected Results
----------------
Same as Excel as stated in 2/ above.

Build Date & Platform
---------------------
2008-02-13 on Windows XP

Additional Information
----------------------

In HSSFFormulaEvaluator.java, method internalEvaluate(), it seems like when
evaluating "C:C", it is translated into "$C$1:$C$0".

For instance, 
Line 422 (for area and 2d area eval) and Line 439 (3d area eval) both show local
variable "row1" equalling -1 when evaulating, so the subsequent looping through
the rows and columns does not execute, hence the 'values' array is never filled.

Obviously different formulas (INDEX, SUM) handle this unexpected empty values
array differently ... 

The same exception occurs for the INDEX formula with ranges such as
"=INDEX(B:C)" (which works okay in Excel 2003).

At lines HSSFFormulaEvaluator.java:422 and HSSFFormulaEvaluator.java:439 I tried
adding the code: 

// 422:
// fix for evaluating range "D:D" which returns lastRow() of -1 instead of max row
if ( row1 == -1 )
{
  ap.setLastRow((short) (sheet.getLastRowNum()+1));
  row1 = ap.getLastRow();
}

// 439:
// fix for evaluating range "D:D" which returns lastRow() of -1 instead of max row
if ( row1 == -1 )
{
  a3dp.setLastRow((short) (sheet.getLastRowNum()+1));
  row1 = a3dp.getLastRow();
}

but that only made the INDEX() function work and the SUM() one fail with an
Exception - I clearly don't know conceptually where the change should go. As
mentioned above, using a fully specified data range such as "C1:C3" works fine
with both INDEX() and SUM() functions.

Anyways, any help or insights appreciated.

Much thanks as always,

Dave
Comment 1 David Webster 2008-02-13 04:12:22 UTC
Created attachment 21516 [details]
Excel file showing INDEX and SUM functions failing with single character data range
Comment 2 Nick Burch 2008-02-15 05:50:43 UTC
Fixed in SVN. Now not only does the formula evaluator handle these properly,
calling cell.getFormulaString() will also contain the correct formula