Bug 34828 - [PATCH] FormulaEvaluator Partial Implementation
Summary: [PATCH] FormulaEvaluator Partial Implementation
Status: RESOLVED FIXED
Alias: None
Product: POI
Classification: Unclassified
Component: HSSF (show other bugs)
Version: 2.5-FINAL
Hardware: Other other
: P2 normal (vote)
Target Milestone: ---
Assignee: POI Developers List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2005-05-10 00:17 UTC by Amol Deshmukh
Modified: 2005-05-19 06:16 UTC (History)
1 user (show)



Attachments
PATCH on HEAD for FormulaEvaluator functionality (143.92 KB, patch)
2005-05-10 00:18 UTC, Amol Deshmukh
Details | Diff
[PATCH] Overrides previous patch, default impl of all function classes (484.94 KB, patch)
2005-05-10 05:36 UTC, Amol Deshmukh
Details | Diff
PATCH file zipped, several bugs in core fixed (30.35 KB, patch)
2005-05-11 06:02 UTC, Amol Deshmukh
Details | Diff
HSSF User's Guide to the FormulaEvaluator (47.00 KB, application/octet-stream)
2005-05-12 18:29 UTC, Amol Deshmukh
Details
HSSF User's Guide to the FormulaEvaluator (4.57 KB, application/octet-stream)
2005-05-12 18:31 UTC, Amol Deshmukh
Details
HSSF Contributors Guide to the FormulaEvaluator (9.62 KB, application/octet-stream)
2005-05-12 18:33 UTC, Amol Deshmukh
Details
Sheet for test data (15.70 KB, application/octet-stream)
2005-05-12 18:36 UTC, Amol Deshmukh
Details
Stable patch with updated xdocs included in patch (40.11 KB, patch)
2005-05-16 09:10 UTC, Amol Deshmukh
Details | Diff
Test data for automated tests (24.44 KB, application/octet-stream)
2005-05-16 09:12 UTC, Amol Deshmukh
Details
Patch file (not zipped) (497.32 KB, patch)
2005-05-17 14:46 UTC, Amol Deshmukh
Details | Diff
Patch that works from command line CVS (482.33 KB, patch)
2005-05-17 23:53 UTC, Amol Deshmukh
Details | Diff
Zipped FormulaEvaluator xdocs (eval and eval-devguide) (6.51 KB, application/octet-stream)
2005-05-18 18:32 UTC, Amol Deshmukh
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Amol Deshmukh 2005-05-10 00:17:29 UTC
This is the preview of the HSSFFormulaEvaluator. The design is modular, so
anybody should be able to contribute the Excel function library implementation.
Currently about 20-30 functions are implemented (admittedly buggy).
Comments invited. 

README.txt follows:
*** WARNING: This is a work in progress  :) ***



*********************** TESTING *********************
Quick Test:
No JUnit tests are provided as of now. For a quick test (and guaranteed results)
use eclipse!
After applying patch to HEAD, go to 
                   xxxxxx.hssf.usermodel.HSSFFormulaEvaluator
This class has a main method so it can be run from eclipse.
Be sure you adjust the file path that is hard coded in the main method to point
to the included
test excel.

The HSSFFormulaEvaluator interface is not complete so as of now there is only
one method that
you can use in case you want to do custom testing: 
                   evaluateToString(formula, sheet, wb)



****************** BLAH *******************

Basic description:
1. Take the cell formula string, parse it into RPN tokens using FormulaParser.
2. For each RPN token:
   If it is operation token, pop the numberOfOperands reqd by this operation 
     Create approproate OperationEval instance and perform the operation by
calling "evaluate"
   If it is an AreaPtg token, evaluate every cell in the area, create 1D Array
of values, put values there
     For area tokens, use AreaEval to store the reference to AreaPtg and the
array of values.
     Push AreaEval.
   If it is ReferencePtg token, evaluate it, put the ReferencePtg and the value
in RefEval. 
     Push RefEval.
   Else, it is one of IntPtg, NumberPtg, StringPtg, BoolPtg -> these are pushed on 
     Push a ValueEval
3. Eventually there will be just one token on stack (if all goes well).
   This can either be NumericEval, StringEval or RefEval.
   for RefEval, one more step is required to extract the inner ValueEval
4. Done.


****************** BLAH BLAH *******************

File/Class organization:
0. Formulatokens are mapped to Eval impls
1. Base interface for all Evals is.... "Eval" !!!
2. This is extended as follows:
   
   ValueEval extends Eval: For value elements (ie. not operations)
   OperationEval extends Eval: For operation elements
   
   NumericValueEval extends ValueEval: For value elements that can be directly
evaluated as numbers
   StringValueEval extends ValueEval: For value elements that can be directly
evaluated as strings
   ErrorEval extends ValueEval: Error values
   
   AddEval, SubtractEval etc. implement OperationEval: Individual operations
   FunctionEval implements OperationEval: For special operations that are functions
   
   FuncVarEval extends FunctionEval: Operations that are functions.
     Note: this is because I dont know what FuncPtg does. In case I need it, I
will extend FuncPtg from FunctionEval. 
     Hence I have FunctionEval
   
   *** functions ***
   functions are in package: xxxxx.hssf.record.formula.functions
   Base interface is Function
   DefaultFunctionImpl extends Function: For functions that are not yet implemented.
     default behavious is to return ErrorEval (resutls in the entire value being
an error of type FUNCTION_NOT_IMPLEMENTED.
   Individual function classes extend from DefaultFunctionImpl.
Comment 1 Amol Deshmukh 2005-05-10 00:18:25 UTC
Created attachment 14977 [details]
PATCH on HEAD for FormulaEvaluator functionality
Comment 2 Amol Deshmukh 2005-05-10 05:36:31 UTC
Created attachment 14979 [details]
[PATCH] Overrides previous patch, default impl of all function classes

This patch has all of previous patch and:
1. Impl of all core classes in xxxxx.function.eval is now complete
2. Default impl for *all* function classes is added
   (defalt impl is to return an error value)
Comment 3 Avik Sengupta 2005-05-10 13:37:33 UTC
This is good! I like it:) Thanks Amol.

A couple of q's

1. I realise you're going thru hoops to decouple this, thanks.. i'm sure this
will turn out to be a good decision in the long run. However, I was wondering if
making the Evals extend Ptgs may not be a better option than delegating. Will it
save some typing, and get rid of the ugly  MAP? Alternately, if  delegating,
does it makes sense to implement the delegate attribute and the common methods
(such as getNumberofOperands) in an abstract base class? I must admit I havent
thought these thru  completely. 

2. Should the function implementations be in one big class as individual
methods, or as separate classes (which in any case have only one method)? What
is easier to code/maintain?

3. This will need enormous testing before we expect this to be used in
production. We will essentially need to test complete excel math. So people, if
you are interested in this feature in POI, writing tests for this is the
simplest way to contribute. Also, we need unit tests first, rather than
functional tests.. ie, test the POI functionality in isolation, without reading
and writing excel files. Functional tests will necessarily be less in number. 

4. Amol, could you teach eclipse not to add default comments, and remove the
ones that are currently there. It just adds noise. the "non-javadoc @see ..."
stuff... 

Once again, amol, great piece of work.


Comment 4 Amol Deshmukh 2005-05-10 16:43:00 UTC
**** **** **** **** **** **** **** **** ****  **** 
**** Note: I goofed up a bit on the second    **** 
**** version of the patch #14979              **** 
**** As a result, you may have to apply the   **** 
**** patch to /src/scratchpad/src instead of  **** 
**** the root folder                          ****
**** **** **** **** **** **** **** **** ****  **** 

Avik, Thanks for the encouraging response

1. Ptg and Eval: Delegating or extending?
Initially I had an impl that added hooks into *Ptgs to get/store values.
But with input from you and Andy I realised that integrating tightly with Ptgs
would mean that the code could not go in until it was completely tested (read a
long long time :)

IMHO, having Evals separate from Ptg is not too bad since there are only so many
*Evals and  all the important Evals have been covered. Delegation has the
advantage (in this case atleast) of decoupling FormulaParser from
FormulaEvaluator. But I guess extending from Ptg could also work...

2. Functions in one class or one class per function?
Honestly, I did not give this much thought when I designed - I just assumed one
class per function was superior. Thinking about it now, it seems like multiple
functions in one class is an alternative - though wrt maintainability i think it
would be easier to test-develop-maintain individual classes than one big class.
Some of the function implementations themselves can be quite big themselves, so
one huge class would be /really/ huge. Also one function per class should make
it easier to organize unit tests.
Unless one class per function causes problems, I'd vote for one class per function.

3. Testing...
Unit testing would be a big effort. Almost as big as writing individual
functions, a distributed effort would be great!

4. My eclipse...
Yeah, sorry about that. I have two different versions of eclipse, I still havent
taught one of them not to do the default comments :) 

5. Under contruction:
The work on core eval classes is not yet "complete". I think there is need for a
BlankEval class to handle empty cells - which are currently being handled as
StringEval("").
Comment 5 Amol Deshmukh 2005-05-11 06:02:35 UTC
Created attachment 14992 [details]
PATCH file zipped, several bugs in core fixed

BlankEval introduced, Removed eclipse generated comments, Core classes bug
fixes done, RelationalOperators implemented as per excel (not Oo).
Comment 6 Amol Deshmukh 2005-05-12 18:29:05 UTC
Created attachment 15012 [details]
HSSF User's Guide to the FormulaEvaluator

user api guide. comments invited. current patch #14992 does NOT reflect this
user api - next patch will.
Comment 7 Amol Deshmukh 2005-05-12 18:31:19 UTC
Created attachment 15013 [details]
HSSF User's Guide to the FormulaEvaluator

sorry, wrong file uploaded earlier :(
Comment 8 Amol Deshmukh 2005-05-12 18:33:46 UTC
Created attachment 15014 [details]
HSSF Contributors Guide to the FormulaEvaluator

dev guide. Overview of design and info on how to contribute function
implementation. May change. Comments welcome.
Comment 9 Amol Deshmukh 2005-05-12 18:36:49 UTC
Created attachment 15015 [details]
Sheet for test data

Sheet for entering test data. Test code has already been written. See dev guide
for more info on how to use the sheet. Comments invited.
Comment 10 Avik Sengupta 2005-05-12 20:38:58 UTC
This is great, i'll convert it into xdocs tom. In the meanwhile, you can add the
tests in batches if you want. 

Another thought, on the tests... Why do you need separate test methods/classes
for each eval if you are directly reading it from the excel sheet... your public
class TestAddEval/public void testAdd() does not contain any code specific to
the AddEval. All you need is some logic/convention to read multiple rows from
the sheet.. probably an odd/even row num with blank checks should do. That way,
one method (with asserts in a loop) will test all functions/operators. 
Comment 11 Avik Sengupta 2005-05-13 18:03:45 UTC
Documents converted to xdocs (phew!) and comitted. 
Comment 12 xudong 2005-05-14 09:48:29 UTC
Thanks for making such a patch.

I don't know why, but I cannot find any constructor in HSFFormulaEvaluator. 

I am using the patch created in 2005-05-11 06:02. 

What's wrong?
Comment 13 xudong 2005-05-14 09:51:39 UTC
Thanks for making such a patch.

I don't know why, but I cannot find any constructor in HSFFormulaEvaluator. 

I am using the patch created in 2005-05-11 06:02. 

I think I am using a wrong patch. But where can I find the correct one?
Comment 14 Amol Deshmukh 2005-05-16 09:10:18 UTC
Created attachment 15039 [details]
Stable patch with updated xdocs included in patch

This FormulaEvaluator patch contains:
1. All core classes complete, stable and tested.
2. Several Function classes completely implemented, stable and tested.
3. IMPORTANT: Very minor change in FormulaParser in HEAD (+ 4lines).
FormulaParser failed for UnaryPlus operations and UnaryPlusPtg was not being
created during parse(). This has been fixed looking at how UnaryMinus was being
handled. Please verify the fix is correct!!!
4. updated xdocs included in patch.
5. Automated test class included but needs a source file change to configure
for particular environment. This mechanism needs to be changed once the patch
is certified stable :) The test xls is added separately, not included in patch.
Comment 15 Amol Deshmukh 2005-05-16 09:12:15 UTC
Created attachment 15040 [details]
Test data for automated tests

This is the workbook containing the sheet for automated tests. Adding tests for

implemented function is easy - see dev guide for additional info.
Comment 16 xudong 2005-05-17 05:26:57 UTC
Thank you, Amol. It works!!

BTW, I found there are some minor errors in XDocs (eval.xml):
*line 56
 "evaluator.evaluate(formulaString);"
 should be
 evaluator.evaluate(cell);

*line 60
 "System.out.println(cellValue.getBooleanCellValue());"
 should be
 System.out.println(cellValue.getBooleanValue());

*line 63
 "System.out.println(cellValue.getNumberCellValue());"
 should be
 System.out.println(cellValue.getNumberValue());

*line 66
 "System.out.println(cellValue.getStringCellValue());"
 should be
 System.out.println(cellValue.getStringValue());

Hi, Amoy. Thanks again for your excellent job.
Comment 17 Amol Deshmukh 2005-05-17 14:46:49 UTC
Created attachment 15056 [details]
Patch file (not zipped)

Sorry, I was not aware that patch files could not be zipped and uploaded. The
previous file seemed to have been uploaded as "text/plain" (since I added it as
a patch?). This patch file is not zipped, so it should work.
Comment 18 Avik Sengupta 2005-05-17 19:37:42 UTC
OK, patch help please. Why does the following does not work? My working dir is
the top level poi dir, ie, the dir containing src/

bash-3.00$ pwd
/home/aviks/projects/jakarta-poi-HEAD
bash-3.00$ patch -p1 <FormulaEvaluator_v0.5.patch
(Stripping trailing CRs from patch.)
can't find file to patch at input line 8
Perhaps you used the wrong -p or --strip option?
The text leading up to this was:
--------------------------
|Index: src/documentation/content/xdocs/hssf/eval-devguide.xml
|===================================================================
|RCS file:
/home/cvspublic/jakarta-poi/src/documentation/content/xdocs/hssf/eval-devguide.xml,v
|retrieving revision 1.1
|diff -u -r1.1 eval-devguide.xml
|--- src/documentation/content/xdocs/hssf/eval-devguide.xml     13 May 2005
14:52:42 -0000      1.1
|+++ src/documentation/content/xdocs/hssf/eval-devguide.xml     17 May 2005
12:43:06 -0000
--------------------------
File to patch:                    
Comment 19 Amol Deshmukh 2005-05-17 23:53:08 UTC
Created attachment 15063 [details]
Patch that works from command line CVS

This is getting too long: This latest patch works on command line - confirmed.
However it does not include the proposed fix to FormulaParser and the updated
xdocs (which seemed to be causing the problem). Will try submitting changes to
xdocs+FormulaParser as a separate patch.
Comment 20 Avik Sengupta 2005-05-18 07:11:21 UTC
The formula parser patch I can apply manually, since I know the source well. For
the xdocs, just attach the changed files in their entirety for now. 
Comment 21 Amol Deshmukh 2005-05-18 18:32:07 UTC
Created attachment 15073 [details]
Zipped FormulaEvaluator xdocs (eval and eval-devguide)

These are updated & zipped xdocs. Sending them into the patch failed earlier.
Comment 22 Avik Sengupta 2005-05-19 14:16:51 UTC
Comitted. Thanks once again, Amol!