Details

Type: New Feature

Status: Closed

Priority: Major

Resolution: Fixed

Affects Version/s: None

Fix Version/s: 6.5

Component/s: None

Security Level: Public (Default Security Level. Issues are Public)

Labels: None
Description
One of the things that will be needed as the SQL implementation matures is the ability to do arithmetic operations. For example:
select (a+b) from x;
select sum(a)+sum(b) from x;
We will need to support arithmetic operations within the Streaming API to support these types of operations.
It looks like the SelectStream is the best place to add this functionality.

 SOLR9916.patch
 241 kB
 Dennis Gove

 SOLR9916.patch
 241 kB
 Dennis Gove

 SOLR9916.patch
 176 kB
 Dennis Gove

 SOLR9916precommit.patch
 1 kB
 Dennis Gove
One possible approach would look like this:
plus(a, b, outField)
plus(plus(a,b), c, outField)
plus(sum(a), sum(b), outField)
In the first example two field names are used to represent operands.
In the second example the first operand is a nested arithmetic operation.
In the third example the operands are aggregate function names.
The constructors of the arithmetic operations will need to do the work to distinguish the different types of operands.
As part of a select expression it would look like this:
select(expr, plus(a,b, outField), minus(sum(a), sum(b), outField))
For simplicity arithmetic functions can only return doubles.
Suggested initial arithmetic operations:
plus
minus
mult
div
abs
mod
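As a rough illustration of the proposal above, here is a minimal Java sketch of the six suggested operations, each returning a double as suggested for simplicity. The class name ArithmeticSketch, its methods, and the use of a Map as a stand-in for a tuple are assumptions made for this example; this is not the SelectStream implementation.

```java
import java.util.Map;

// Hypothetical sketch only: none of these names come from Solr.
class ArithmeticSketch {
  // Applies a named binary operation; the result is always a double.
  static double apply(String op, double left, double right) {
    switch (op) {
      case "plus":  return left + right;
      case "minus": return left - right;
      case "mult":  return left * right;
      case "div":   return left / right;
      case "mod":   return left % right;
      default: throw new IllegalArgumentException("Unknown operation: " + op);
    }
  }

  // abs takes a single operand.
  static double abs(double value) {
    return Math.abs(value);
  }

  // Reads a numeric field from a tuple, modeled here as a plain Map.
  static double field(Map<String, Object> tuple, String name) {
    return ((Number) tuple.get(name)).doubleValue();
  }

  public static void main(String[] args) {
    Map<String, Object> tuple = Map.of("a", 3L, "b", 4L);
    // plus(a, b)
    System.out.println(apply("plus", field(tuple, "a"), field(tuple, "b"))); // 7.0
  }
}
```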
Dennis Gove, I'm curious about your thoughts on this ticket. Do you think this is the right approach?
I think this is a good idea.
Select already supports an "as" concept, so something like this would already be possible:
select(plus(a,b) as outfield, <incoming stream>)
I'm going to start implementing these as Operations. I'll be sure to support the cases of operations within operations like
plus(div(a,replace(b,null,0)),c)
Looking at the current state of Operations, the following class structure exists
StreamOperation
  ConcatOperation
  BooleanOperation
    AndOperation
    LeafOperation
      EqualsOperation
      GreaterThanEqualToOperation
      GreaterThanOperation
      LessThanEqualToOperation
      LessThanOperation
    NotOperation
    OrOperation
  ReduceOperation
    DistinctOperation
    GroupOperation
  ReplaceOperation (and associated hidden ReplaceWithFieldOperation, ReplaceWithValueOperation)
I'd like to enhance this slightly to the following
StreamOperation
  BooleanOperation
    AndOperation
    LeafOperation
      EqualsOperation
      GreaterThanEqualToOperation
      GreaterThanOperation
      LessThanEqualToOperation
      LessThanOperation
    NotOperation
    OrOperation
  ComparisonOperation
    IfOperation
  ModificationOperation
    AbsoluteValueOperation
    AdditionOperation
    ConcatOperation
    DivisionOperation
    ModuloOperation
    MultiplicationOperation
    ReplaceOperation (and associated hidden ReplaceWithFieldOperation, ReplaceWithValueOperation)
    SubtractionOperation
  ReduceOperation
    DistinctOperation
    GroupOperation
This will allow us to support arbitrarily complex operations in the Select stream. It accomplishes this in 3 ways.
Comparison Operation
First, add an if/then/else concept with the ComparisonOperation. Embedded operations will be supported, either Modification or Comparison.
The full supported structure is
if(boolean, field | modification | comparison, field | modification | comparison)
For example,
if(boolean(...), fieldA, fieldB)
  ex: if(gt(a,b), a, b) // if a > b then a else b
if(boolean(...), modification(...), modification(...))
  ex: if(gt(a,b), sub(a,b), sub(b,a)) // if a > b then a - b else b - a
if(boolean(...), comparison(...), comparison(...))
  ex: if(gt(a,b), if(or(c,d), a, b), if(and(c,d), a, b)) // if a > b then (if c or d then a else b) else (if c and d then a else b)
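To make the nesting concrete, here is a small Java sketch of how an if/then/else over embedded operations could compose. It models a tuple as a Map and each operation as a function over it. The names ConditionalSketch, gt, sub and ifThenElse are assumptions for this example, not the proposed ComparisonOperation classes.

```java
import java.util.Map;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical model: a tuple is a Map and each operation is a function over it.
class ConditionalSketch {
  // Reads a numeric field as a double.
  static double num(Map<String, Object> tuple, String field) {
    return ((Number) tuple.get(field)).doubleValue();
  }

  // gt(left, right): true when the left field is greater than the right field.
  static Predicate<Map<String, Object>> gt(String left, String right) {
    return t -> num(t, left) > num(t, right);
  }

  // sub(left, right): the left field minus the right field.
  static Function<Map<String, Object>, Object> sub(String left, String right) {
    return t -> num(t, left) - num(t, right);
  }

  // if(condition, thenBranch, elseBranch): only the chosen branch is evaluated.
  static Function<Map<String, Object>, Object> ifThenElse(
      Predicate<Map<String, Object>> condition,
      Function<Map<String, Object>, Object> thenBranch,
      Function<Map<String, Object>, Object> elseBranch) {
    return t -> condition.test(t) ? thenBranch.apply(t) : elseBranch.apply(t);
  }

  public static void main(String[] args) {
    // if(gt(a,b), sub(a,b), sub(b,a))
    Function<Map<String, Object>, Object> expr =
        ifThenElse(gt("a", "b"), sub("a", "b"), sub("b", "a"));
    System.out.println(expr.apply(Map.of("a", 2, "b", 5))); // 3.0
  }
}
```

Because each branch is itself a function of the tuple, a branch can be another ifThenElse, which covers the third form listed above.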
ModificationOperations with Embedded Operations
Second, enhance ModificationOperations to support embedded operations, either Modification or Comparison.
The full supported structure is
modification(field | modification | comparison [, field | modification | comparison])
For example,
modification(fieldA [,fieldB])
  ex: add(a,b) // a + b
modification(fieldA [,modification(...)]) // order doesn't matter
  ex: add(a, div(b,c)) // a + (b/c)
      add(div(b,c), a) // (b/c) + a
modification(fieldA [,comparison(...)]) // order doesn't matter
  ex: add(a, if(gt(b,c),b,c)) // if b > c then a + b else a + c
      add(if(gt(b,c),b,c), a) // if b > c then a + b else a + c
BooleanOperations with Embedded Operations
Third, enhance BooleanOperations to support embedded operations, either Modification or Comparison.
The full supported structure is
boolean(field | modification | comparison [, field | modification | comparison])
For example,
boolean(fieldA [,fieldB])
  ex: gt(a,b)
boolean(fieldA [,modification(...)]) // order doesn't matter
  ex: gt(a, add(b,c)) // is a > (b + c)
      gt(add(b,c), a) // is (b + c) > a
boolean(fieldA [,comparison(...)]) // order doesn't matter
  ex: gt(a, if(gt(b,c),b,c)) // if b > c then is a > b else is a > c
      gt(if(gt(b,c),b,c), a) // if b > c then is b > a else is c > a
Joel Bernstein, I'm interested in your thoughts on this.
Looks really good to me. Having the ability to nest the different types of operations with conditional logic in the select stream is really powerful.
I'm just about to commit a small change so that LeafOperations can accept a metric identifier without single quotes. Currently you have to do the following or the parser will parse the metric and not know how to use it as a value operand.
having(expr, eq('sum(a_i)', 9))
After this small commit it will support:
having(expr, eq(sum(a_i), 9))
This will just be relevant for Solr 6.4 which is coming in a few days.
The work you're doing on this ticket will supersede this change but it's nice to have for 6.4.
Sounds good.
What is sum(a_i)? Is that calculating the sum over a multivalued field? (If so, I didn't know we were supporting multivalued fields. Really cool.)
Here is the full expression:
having(rollup(over=a_f, sum(a_i), search(collection1, q=*:*, fl="id,a_s,a_i,a_f", sort="a_f asc")), eq(sum(a_i), 9))
So the "sum(a_i)" is the field in the tuples produced by the rollup.
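The point that "sum(a_i)" is just a field name in the rolled-up tuples can be sketched in Java as follows. The class HavingSketch, its methods, and the Map-based tuples are assumptions for this example and do not reflect how Solr's HavingStream or RollupStream are written.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of rollup output and a having-style filter; not Solr's classes.
class HavingSketch {
  // Builds a tuple shaped like rollup output: the metric name is the field key.
  static Map<String, Object> rollupTuple(double bucket, long sum) {
    Map<String, Object> tuple = new HashMap<>();
    tuple.put("a_f", bucket);
    tuple.put("sum(a_i)", sum);
    return tuple;
  }

  // Keeps the tuples whose field `fieldName` is numerically equal to `expected`.
  static List<Map<String, Object>> having(
      List<Map<String, Object>> tuples, String fieldName, double expected) {
    List<Map<String, Object>> kept = new ArrayList<>();
    for (Map<String, Object> tuple : tuples) {
      Object value = tuple.get(fieldName);
      if (value instanceof Number && ((Number) value).doubleValue() == expected) {
        kept.add(tuple);
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    List<Map<String, Object>> rolledUp = List.of(rollupTuple(1.0, 9L), rollupTuple(2.0, 4L));
    // eq(sum(a_i), 9): "sum(a_i)" is looked up as an ordinary field name.
    System.out.println(having(rolledUp, "sum(a_i)", 9).size()); // 1
  }
}
```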
Of course. sum is a metric.
I've decided to go in a slightly different direction with this. After refamiliarizing myself with StreamOperations, it became clear that operations are meant to modify tuples. For example, the ReplaceOperation replaces some field value with some other value via the void operate(Tuple) function. Newer operations like the BooleanOperation extend that with an evaluate() function, but I find it's not quite as clear as it could be.
Bringing this back to the problem we want to solve: we want to evaluate some value based on a tuple. This isn't meant to modify a tuple but instead to calculate new values from other values that exist within the tuple. This is true whether we are adding, multiplying, determining equality, greater than, or choosing with an if/else. We are evaluating, but not modifying, the tuple.
To solve this problem, I'm introducing a new set of classes called StreamEvaluators. StreamEvaluators follow the same functional expression structure as everything else within the streaming sphere and define the function public Object evaluate(Tuple). The object returned from this function is the result of the evaluation against the tuple. For example, the result returned for the expression add(a,b) is the result of field a added to field b. The datatype of the returned result is determined by the evaluator and the source field types. For example, add(a,b) could reasonably return a Number, either Long or Double, while eq(a,b) could reasonably return a Boolean, and if(eq(a,b),c,d) could reasonably return any type.
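A minimal Java sketch of this idea follows. It uses a Map in place of Solr's Tuple class, and the names EvaluatorSketch, Evaluator, field, add, eq and ifThenElse are assumptions for illustration rather than the classes in the patch. The rule that add returns a Long only when both operands are integral is also an assumption.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch; Map<String, Object> stands in for Solr's Tuple class.
class EvaluatorSketch {
  // An evaluator computes a value from a tuple without modifying the tuple.
  interface Evaluator {
    Object evaluate(Map<String, Object> tuple);
  }

  // field(name): the value stored in the tuple under that name.
  static Evaluator field(String name) {
    return tuple -> tuple.get(name);
  }

  static boolean isIntegral(Number n) {
    return n instanceof Long || n instanceof Integer || n instanceof Short || n instanceof Byte;
  }

  // add(a,b): a Long when both operands are integral, otherwise a Double.
  static Evaluator add(Evaluator left, Evaluator right) {
    return tuple -> {
      Number l = (Number) left.evaluate(tuple);
      Number r = (Number) right.evaluate(tuple);
      if (isIntegral(l) && isIntegral(r)) {
        return l.longValue() + r.longValue();
      }
      return l.doubleValue() + r.doubleValue();
    };
  }

  // eq(a,b): always a Boolean.
  static Evaluator eq(Evaluator left, Evaluator right) {
    return tuple -> Objects.equals(left.evaluate(tuple), right.evaluate(tuple));
  }

  // if(cond,a,b): returns whatever type the chosen branch returns.
  static Evaluator ifThenElse(Evaluator condition, Evaluator thenBranch, Evaluator elseBranch) {
    return tuple -> Boolean.TRUE.equals(condition.evaluate(tuple))
        ? thenBranch.evaluate(tuple)
        : elseBranch.evaluate(tuple);
  }

  public static void main(String[] args) {
    Map<String, Object> tuple = new HashMap<>();
    tuple.put("a", 1L);
    tuple.put("b", 2L);
    tuple.put("c", "left");
    tuple.put("d", "right");
    System.out.println(add(field("a"), field("b")).evaluate(tuple)); // 3
    System.out.println(eq(field("a"), field("b")).evaluate(tuple)); // false
    System.out.println(
        ifThenElse(eq(field("a"), field("b")), field("c"), field("d")).evaluate(tuple)); // right
  }
}
```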
StreamEvaluators come in two basic flavors: those that can contain other evaluators and those that can't.
add(a,b) // field a + field b
sub(add(a,b),c) // (a + b) - c
mult(if(gt("a",b),a,b),c) // if field a > field b then a * c else b * c
if(eq(a,b),val(34),c) // if a == b then value 34 else c
if(eq(a,b),val(foo),c) // if a == b then value "foo" else c
if(eq(a,null),b,c) // if a is null then b else c
There are a couple of things to note here.
- null is a special case and will be treated as a standard null value.
- A ValueEvaluator, written val(<string>), val(<number>), or val(<boolean>), will evaluate to the raw value contained within.
  - This allows us to easily distinguish between field names and raw string values.
- Within any other evaluator, a string, quoted or not, will be considered a field name.
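These rules can be illustrated with a small string-level Java sketch. The resolve helper and the class OperandSketch are hypothetical and work on raw operand text for brevity; the patch itself works on parsed expressions. The order in which a val(...) body is tried as Boolean, Long, then Double is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the operand rules; not the patch's parsing code.
class OperandSketch {
  // Resolves a leaf operand against a tuple (modeled as a Map).
  static Object resolve(String operand, Map<String, Object> tuple) {
    String text = operand.trim();
    if (text.equals("null")) {
      return null; // null is a special case: a standard null value
    }
    if (text.startsWith("val(") && text.endsWith(")")) {
      return rawValue(text.substring(4, text.length() - 1).trim()); // raw value, not a field
    }
    return tuple.get(stripQuotes(text)); // quoted or not, a string names a field
  }

  // Interprets the body of val(...) as a Boolean, Long, Double, or String.
  static Object rawValue(String text) {
    if (text.equals("true") || text.equals("false")) {
      return Boolean.parseBoolean(text);
    }
    try {
      return Long.parseLong(text);
    } catch (NumberFormatException ignored) {
      // not a long; fall through
    }
    try {
      return Double.parseDouble(text);
    } catch (NumberFormatException ignored) {
      // not a double; fall through
    }
    return stripQuotes(text);
  }

  // Removes one pair of matching single or double quotes, if present.
  static String stripQuotes(String text) {
    if (text.length() >= 2) {
      char first = text.charAt(0);
      char last = text.charAt(text.length() - 1);
      if (first == last && (first == '"' || first == '\'')) {
        return text.substring(1, text.length() - 1);
      }
    }
    return text;
  }

  public static void main(String[] args) {
    Map<String, Object> tuple = new HashMap<>();
    tuple.put("foo", "bar");
    System.out.println(resolve("val(foo)", tuple)); // foo
    System.out.println(resolve("foo", tuple)); // bar
    System.out.println(resolve("\"foo\"", tuple)); // bar
  }
}
```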
As a result of this, the class structure is turning into this.
StreamEvaluator
  ComplexEvaluator // evaluators allowing other StreamEvaluators as parameters (looking for better class name)
    NumberEvaluator // evaluators resulting in a Number return value
      AbsoluteValueEvaluator // abs(a)
      AddEvaluator // add(a,b,...,z)
      DivideEvaluator // div(a,b)
      MultiplyEvaluator // mult(a,b,...,z)
      SubtractEvaluator // sub(a,b)
    BooleanEvaluator // evaluators resulting in a Boolean return value
      AndEvaluator // and(a,b,...,z) == true iff all are true
      EqualsEvaluator // eq(a,b,...,z) == true iff all are equal
      GreaterThanEqualToEvaluator
      GreaterThanEvaluator
      LessThanEqualToEvaluator
      LessThanEvaluator
      OrEvaluator
    ConditionalEvaluator // evaluators performing a conditional and returning an Object based on the result
      IfThenElseEvaluator
  SimpleEvaluator // evaluators not allowing other StreamEvaluators as parameters (looking for a better class name)
    ValueEvaluator // return the raw value as-is
    FieldEvaluator // return the value of the field; not something that needs to be expressed in the expression
StreamEvaluators will become a type of parameter supported by the SelectStream and executed after the execution of operations in that select clause. The result of the evaluation will be put into the tuple under the 'as' field name.
select(...,add(a,b) as aPlusb)
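The "as" behaviour can be sketched in Java like this. The class SelectAsSketch, the selectAs and add helpers, and the Map-based tuple are assumptions for this example. Unlike SelectStream, which emits only the selected fields, the sketch simply copies the incoming tuple and adds the evaluated value.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of "evaluate, then store under the alias"; not SelectStream itself.
class SelectAsSketch {
  // add(left, right) over two integral fields.
  static Function<Map<String, Object>, Object> add(String left, String right) {
    return t -> ((Number) t.get(left)).longValue() + ((Number) t.get(right)).longValue();
  }

  // Evaluates `evaluator` against the tuple and stores the result under `alias`.
  static Map<String, Object> selectAs(
      Map<String, Object> tuple, Function<Map<String, Object>, Object> evaluator, String alias) {
    Map<String, Object> out = new HashMap<>(tuple); // copy so the input stays untouched
    out.put(alias, evaluator.apply(tuple));
    return out;
  }

  public static void main(String[] args) {
    Map<String, Object> tuple = new HashMap<>();
    tuple.put("a", 2L);
    tuple.put("b", 3L);
    // select(..., add(a,b) as aPlusb)
    System.out.println(selectAs(tuple, add("a", "b"), "aPlusb").get("aPlusb")); // 5
  }
}
```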
Includes the following evaluators
StreamEvaluator
  ComplexEvaluator // evaluators allowing other StreamEvaluators as parameters (looking for better class name)
    NumberEvaluator // evaluators resulting in a Number return value
      AbsoluteValueEvaluator // abs(a)
      AddEvaluator // add(a,b,...,z)
      DivideEvaluator // div(a,b)
      MultiplyEvaluator // mult(a,b,...,z)
      SubtractEvaluator // sub(a,b)
    BooleanEvaluator // evaluators resulting in a Boolean return value
      AndEvaluator // and(a,b,...,z) == true iff all are true
      EqualsEvaluator // eq(a,b,...,z) == true iff all are equal
      GreaterThanEqualToEvaluator
      GreaterThanEvaluator
      LessThanEqualToEvaluator
      LessThanEvaluator
      OrEvaluator
    ConditionalEvaluator // evaluators performing a conditional and returning an Object based on the result
      IfThenElseEvaluator
  SimpleEvaluator // evaluators not allowing other StreamEvaluators as parameters (looking for a better class name)
    FieldEvaluator // return the value of the field; not something that needs to be expressed in the expression
Still needed:
 ValueEvaluator
 Additional testing
 Handling of null raw value
 Additional evaluators
Future work for another ticket will be to replace the use of BooleanOperation with evaluators.
This is complete. All tests pass.
I have deleted and replaced all existing BooleanOperations with their requisite BooleanEvaluators and added additional evaluators.
The registration of default evaluators looks like this
// Stream Evaluators
.withFunctionName("val", RawValueEvaluator.class)

// Boolean Stream Evaluators
.withFunctionName("and", AndEvaluator.class)
.withFunctionName("eor", ExclusiveOrEvaluator.class)
.withFunctionName("eq", EqualsEvaluator.class)
.withFunctionName("gt", GreaterThanEvaluator.class)
.withFunctionName("gteq", GreaterThanEqualToEvaluator.class)
.withFunctionName("lt", LessThanEvaluator.class)
.withFunctionName("lteq", LessThanEqualToEvaluator.class)
.withFunctionName("not", NotEvaluator.class)
.withFunctionName("or", OrEvaluator.class)

// Number Stream Evaluators
.withFunctionName("abs", AbsoluteValueEvaluator.class)
.withFunctionName("add", AddEvaluator.class)
.withFunctionName("div", DivideEvaluator.class)
.withFunctionName("mult", MultiplyEvaluator.class)
.withFunctionName("sub", SubtractEvaluator.class)

// Conditional Stream Evaluators
.withFunctionName("if", IfThenElseEvaluator.class)
All evaluators accept the following parameter formats
add(abc,def) // field abc + field def
add(sub(abc,def),ghi) // (field abc - field def) + field ghi
add(abc,9) // field abc + 9
add(sum(abc), def) // field sum(abc) + field def
Basically, when an evaluator is parsing its parameters it will first determine if the parameter is another evaluator. If not, then it will determine if the parameter is a Double, Long, or Boolean raw value. If not, it will treat the parameter as a field name. This allows us to use field names like "sum(abc)" that result from rollups, as well as raw values and embedded evaluators.
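The decision order described here can be sketched as a small classifier in Java. The class ParameterSketch, the Kind enum, and the classify method are hypothetical. The set of registered evaluator names is passed in explicitly, and trying Long before Double is an assumption, since "9" would otherwise also parse as a Double.

```java
import java.util.Set;

// Hypothetical sketch of the parameter decision order; not the patch's parser.
class ParameterSketch {
  enum Kind { EVALUATOR, LONG, DOUBLE, BOOLEAN, FIELD }

  static Kind classify(String parameter, Set<String> evaluatorNames) {
    String text = parameter.trim();
    // 1. Another evaluator: name(...) where name is a registered evaluator function.
    int paren = text.indexOf('(');
    if (paren > 0 && text.endsWith(")") && evaluatorNames.contains(text.substring(0, paren))) {
      return Kind.EVALUATOR;
    }
    // 2. A raw Long, Double, or Boolean value.
    if (isLong(text)) {
      return Kind.LONG;
    }
    if (isDouble(text)) {
      return Kind.DOUBLE;
    }
    if (text.equalsIgnoreCase("true") || text.equalsIgnoreCase("false")) {
      return Kind.BOOLEAN;
    }
    // 3. Anything else is a field name, including metric names such as sum(abc).
    return Kind.FIELD;
  }

  static boolean isLong(String text) {
    try {
      Long.parseLong(text);
      return true;
    } catch (NumberFormatException e) {
      return false;
    }
  }

  static boolean isDouble(String text) {
    try {
      Double.parseDouble(text);
      return true;
    } catch (NumberFormatException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    Set<String> names = Set.of("add", "sub", "mult", "div", "abs");
    System.out.println(classify("sub(abc,def)", names)); // EVALUATOR
    System.out.println(classify("sum(abc)", names)); // FIELD
    System.out.println(classify("9", names)); // LONG
  }
}
```

Since sum is not in the set of evaluator names, sum(abc) falls through to the field-name case, which is what lets rollup output be referenced directly.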
Joel Bernstein, just wondering on your thoughts on this? I completely removed the Operations that were added last summer and replaced them with these Evaluators.
I haven't had a chance to review the patch but what you're describing sounds good to me. I've put the HavingStream to use in SOLR-8593 so I'll likely have to make some adjustments when that work gets merged into master. If you're feeling comfortable feel free to move forward and commit, these are big improvements.
Final patch. All tests pass. Applies to both branch_6x and master.
Commit 7372df9957b75c08283af6db47234df1787f1490 in lucene-solr's branch refs/heads/branch_6x from Dennis Gove
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=7372df9 ]
SOLR-9916: Adds Stream Evaluators to support evaluating values from tuples
Commit 62489678d074edb2ee962e1c4ee38026ff504b2a in lucene-solr's branch refs/heads/master from Dennis Gove
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=6248967 ]
SOLR-9916: Adds Stream Evaluators to support evaluating values from tuples
Mistake closing this. I intend to keep it open until I update the wiki docs.
I've added descriptions of each evaluator to Trunk Changes at https://cwiki.apache.org/confluence/display/solr/Internal+-+Trunk+Changes+to+Document. I'll move these to the true location after 6.4 has been cut.
I think this change broke ant precommit?
Build Log:
[...truncated 75472 lines...]
documentation-lint:
    [jtidy] Checking for broken html (such as invalid tags)...
   [delete] Deleting directory C:\Users\jenkins\workspace\Lucene-Solr-6.x-Windows\lucene\build\jtidy_tmp
     [echo] Checking for broken links...
     [exec]
     [exec] Crawl/parse...
     [exec]
     [exec] Verify...
     [echo] Checking for malformed docs...
     [exec]
     [exec] C:\Users\jenkins\workspace\Lucene-Solr-6.x-Windows\solr\build\docs\solr-solrj/overview-summary.html
     [exec]   missing: org.apache.solr.client.solrj.io.eval
     [exec]
     [exec] Missing javadocs were found!
Dennis Gove could you please have a look? Thanks.
Looks like I forgot to add the package-info.java file to the new package. Will add.
Commit 1700e860cebfc93a0f3ffc3cafcf77b674c6f79c in lucene-solr's branch refs/heads/master from Dennis Gove
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=1700e86 ]
SOLR-9916: Adds package-info.java to org.apache.solr.client.solrj.io.eval so it passes precommit
Commit 5c9dace3ca569f6e1be4f1bc39d75c52bd049e6b in lucene-solr's branch refs/heads/branch_6x from Dennis Gove
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=5c9dace ]
SOLR-9916: Adds package-info.java to org.apache.solr.client.solrj.io.eval so it passes precommit
This is the patch that fixes the ant precommit failure.
With these operations, should we be able to support "literal comparison literal" expressions? That should help with the SQL pushdown piece.
I think the implementation will not be difficult. The tricky part is going to be getting the syntax sorted out. I'll use the comments below to suggest some syntax options.