This is the preview of the HSSFFormulaEvaluator. The design is modular, so anybody should be able to contribute the Excel function library implementation. Currently about 20-30 functions are implemented (admittedly buggy). Comments invited. README.txt follows: *** WARNING: This is a work in progress :) *** *********************** TESTING ********************* Quick Test: No JUnit tests are provided as of now. For a quick test (and guaranteed results) use eclipse! After applying patch to HEAD, go to xxxxxx.hssf.usermodel.HSSFFormulaEvaluator This class has a main method so it can be run from eclipse. Be sure you adjust the file path that is hard coded in the main method to point to the included test excel. The HSSFFormulaEvaluator interface is not complete so as of now there is only one method that you can use in case you want to do custom testing: evaluateToString(formula, sheet, wb) ****************** BLAH ******************* Basic description: 1. Take the cell formula string, parse it into RPN tokens using FormulaParser. 2. For each RPN token: If it is operation token, pop the numberOfOperands reqd by this operation Create approproate OperationEval instance and perform the operation by calling "evaluate" If it is an AreaPtg token, evaluate every cell in the area, create 1D Array of values, put values there For area tokens, use AreaEval to store the reference to AreaPtg and the array of values. Push AreaEval. If it is ReferencePtg token, evaluate it, put the ReferencePtg and the value in RefEval. Push RefEval. Else, it is one of IntPtg, NumberPtg, StringPtg, BoolPtg -> these are pushed on Push a ValueEval 3. Eventually there will be just one token on stack (if all goes well). This can either be NumericEval, StringEval or RefEval. for RefEval, one more step is required to extract the inner ValueEval 4. Done. ****************** BLAH BLAH ******************* File/Class organization: 0. Formulatokens are mapped to Eval impls 1. Base interface for all Evals is.... "Eval" !!! 2. This is extended as follows: ValueEval extends Eval: For value elements (ie. not operations) OperationEval extends Eval: For operation elements NumericValueEval extends ValueEval: For value elements that can be directly evaluated as numbers StringValueEval extends ValueEval: For value elements that can be directly evaluated as strings ErrorEval extends ValueEval: Error values AddEval, SubtractEval etc. implement OperationEval: Individual operations FunctionEval implements OperationEval: For special operations that are functions FuncVarEval extends FunctionEval: Operations that are functions. Note: this is because I dont know what FuncPtg does. In case I need it, I will extend FuncPtg from FunctionEval. Hence I have FunctionEval *** functions *** functions are in package: xxxxx.hssf.record.formula.functions Base interface is Function DefaultFunctionImpl extends Function: For functions that are not yet implemented. default behavious is to return ErrorEval (resutls in the entire value being an error of type FUNCTION_NOT_IMPLEMENTED. Individual function classes extend from DefaultFunctionImpl.
Created attachment 14977 [details] PATCH on HEAD for FormulaEvaluator functionality
Created attachment 14979 [details] [PATCH] Overrides previous patch, default impl of all function classes This patch has all of previous patch and: 1. Impl of all core classes in xxxxx.function.eval is now complete 2. Default impl for *all* function classes is added (defalt impl is to return an error value)
This is good! I like it:) Thanks Amol. A couple of q's 1. I realise you're going thru hoops to decouple this, thanks.. i'm sure this will turn out to be a good decision in the long run. However, I was wondering if making the Evals extend Ptgs may not be a better option than delegating. Will it save some typing, and get rid of the ugly MAP? Alternately, if delegating, does it makes sense to implement the delegate attribute and the common methods (such as getNumberofOperands) in an abstract base class? I must admit I havent thought these thru completely. 2. Should the function implementations be in one big class as individual methods, or as separate classes (which in any case have only one method)? What is easier to code/maintain? 3. This will need enormous testing before we expect this to be used in production. We will essentially need to test complete excel math. So people, if you are interested in this feature in POI, writing tests for this is the simplest way to contribute. Also, we need unit tests first, rather than functional tests.. ie, test the POI functionality in isolation, without reading and writing excel files. Functional tests will necessarily be less in number. 4. Amol, could you teach eclipse not to add default comments, and remove the ones that are currently there. It just adds noise. the "non-javadoc @see ..." stuff... Once again, amol, great piece of work.
**** **** **** **** **** **** **** **** **** **** **** Note: I goofed up a bit on the second **** **** version of the patch #14979 **** **** As a result, you may have to apply the **** **** patch to /src/scratchpad/src instead of **** **** the root folder **** **** **** **** **** **** **** **** **** **** **** Avik, Thanks for the encouraging response 1. Ptg and Eval: Delegating or extending? Initially I had an impl that added hooks into *Ptgs to get/store values. But with input from you and Andy I realised that integrating tightly with Ptgs would mean that the code could not go in until it was completely tested (read a long long time :) IMHO, having Evals separate from Ptg is not too bad since there are only so many *Evals and all the important Evals have been covered. Delegation has the advantage (in this case atleast) of decoupling FormulaParser from FormulaEvaluator. But I guess extending from Ptg could also work... 2. Functions in one class or one class per function? Honestly, I did not give this much thought when I designed - I just assumed one class per function was superior. Thinking about it now, it seems like multiple functions in one class is an alternative - though wrt maintainability i think it would be easier to test-develop-maintain individual classes than one big class. Some of the function implementations themselves can be quite big themselves, so one huge class would be /really/ huge. Also one function per class should make it easier to organize unit tests. Unless one class per function causes problems, I'd vote for one class per function. 3. Testing... Unit testing would be a big effort. Almost as big as writing individual functions, a distributed effort would be great! 4. My eclipse... Yeah, sorry about that. I have two different versions of eclipse, I still havent taught one of them not to do the default comments :) 5. Under contruction: The work on core eval classes is not yet "complete". I think there is need for a BlankEval class to handle empty cells - which are currently being handled as StringEval("").
Created attachment 14992 [details] PATCH file zipped, several bugs in core fixed BlankEval introduced, Removed eclipse generated comments, Core classes bug fixes done, RelationalOperators implemented as per excel (not Oo).
Created attachment 15012 [details] HSSF User's Guide to the FormulaEvaluator user api guide. comments invited. current patch #14992 does NOT reflect this user api - next patch will.
Created attachment 15013 [details] HSSF User's Guide to the FormulaEvaluator sorry, wrong file uploaded earlier :(
Created attachment 15014 [details] HSSF Contributors Guide to the FormulaEvaluator dev guide. Overview of design and info on how to contribute function implementation. May change. Comments welcome.
Created attachment 15015 [details] Sheet for test data Sheet for entering test data. Test code has already been written. See dev guide for more info on how to use the sheet. Comments invited.
This is great, i'll convert it into xdocs tom. In the meanwhile, you can add the tests in batches if you want. Another thought, on the tests... Why do you need separate test methods/classes for each eval if you are directly reading it from the excel sheet... your public class TestAddEval/public void testAdd() does not contain any code specific to the AddEval. All you need is some logic/convention to read multiple rows from the sheet.. probably an odd/even row num with blank checks should do. That way, one method (with asserts in a loop) will test all functions/operators.
Documents converted to xdocs (phew!) and comitted.
Thanks for making such a patch. I don't know why, but I cannot find any constructor in HSFFormulaEvaluator. I am using the patch created in 2005-05-11 06:02. What's wrong?
Thanks for making such a patch. I don't know why, but I cannot find any constructor in HSFFormulaEvaluator. I am using the patch created in 2005-05-11 06:02. I think I am using a wrong patch. But where can I find the correct one?
Created attachment 15039 [details] Stable patch with updated xdocs included in patch This FormulaEvaluator patch contains: 1. All core classes complete, stable and tested. 2. Several Function classes completely implemented, stable and tested. 3. IMPORTANT: Very minor change in FormulaParser in HEAD (+ 4lines). FormulaParser failed for UnaryPlus operations and UnaryPlusPtg was not being created during parse(). This has been fixed looking at how UnaryMinus was being handled. Please verify the fix is correct!!! 4. updated xdocs included in patch. 5. Automated test class included but needs a source file change to configure for particular environment. This mechanism needs to be changed once the patch is certified stable :) The test xls is added separately, not included in patch.
Created attachment 15040 [details] Test data for automated tests This is the workbook containing the sheet for automated tests. Adding tests for implemented function is easy - see dev guide for additional info.
Thank you, Amol. It works!! BTW, I found there are some minor errors in XDocs (eval.xml): *line 56 "evaluator.evaluate(formulaString);" should be evaluator.evaluate(cell); *line 60 "System.out.println(cellValue.getBooleanCellValue());" should be System.out.println(cellValue.getBooleanValue()); *line 63 "System.out.println(cellValue.getNumberCellValue());" should be System.out.println(cellValue.getNumberValue()); *line 66 "System.out.println(cellValue.getStringCellValue());" should be System.out.println(cellValue.getStringValue()); Hi, Amoy. Thanks again for your excellent job.
Created attachment 15056 [details] Patch file (not zipped) Sorry, I was not aware that patch files could not be zipped and uploaded. The previous file seemed to have been uploaded as "text/plain" (since I added it as a patch?). This patch file is not zipped, so it should work.
OK, patch help please. Why does the following does not work? My working dir is the top level poi dir, ie, the dir containing src/ bash-3.00$ pwd /home/aviks/projects/jakarta-poi-HEAD bash-3.00$ patch -p1 <FormulaEvaluator_v0.5.patch (Stripping trailing CRs from patch.) can't find file to patch at input line 8 Perhaps you used the wrong -p or --strip option? The text leading up to this was: -------------------------- |Index: src/documentation/content/xdocs/hssf/eval-devguide.xml |=================================================================== |RCS file: /home/cvspublic/jakarta-poi/src/documentation/content/xdocs/hssf/eval-devguide.xml,v |retrieving revision 1.1 |diff -u -r1.1 eval-devguide.xml |--- src/documentation/content/xdocs/hssf/eval-devguide.xml 13 May 2005 14:52:42 -0000 1.1 |+++ src/documentation/content/xdocs/hssf/eval-devguide.xml 17 May 2005 12:43:06 -0000 -------------------------- File to patch:
Created attachment 15063 [details] Patch that works from command line CVS This is getting too long: This latest patch works on command line - confirmed. However it does not include the proposed fix to FormulaParser and the updated xdocs (which seemed to be causing the problem). Will try submitting changes to xdocs+FormulaParser as a separate patch.
The formula parser patch I can apply manually, since I know the source well. For the xdocs, just attach the changed files in their entirety for now.
Created attachment 15073 [details] Zipped FormulaEvaluator xdocs (eval and eval-devguide) These are updated & zipped xdocs. Sending them into the patch failed earlier.
Comitted. Thanks once again, Amol!