Details
Type: Sub-task
Status: Open
Priority: P3
Resolution: Unresolved
Description
conjunction_clause: function_call(function_parameter, ...) | field_access | column
function_parameter: function_call | field_access
In Beam, an equi-join is implemented with CoGroupByKey (CoGBK), which requires each join input (assuming a binary join) to be turned into a PCollection of KV<Row, Row> whose key is the join key.
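To make the keyed-join shape concrete, here is a minimal plain-Python sketch that emulates the idea with dictionaries. It does not use the Beam API; `key_by` and `co_group_join` are hypothetical helper names, rows are modeled as dicts, and the sample data is made up for illustration.

```python
from collections import defaultdict
from itertools import product


def key_by(rows, key_columns):
    """Turn each row (a dict) into a (key, row) pair, mimicking KV<Row, Row>."""
    return [(tuple(row[c] for c in key_columns), row) for row in rows]


def co_group_join(left_kvs, right_kvs):
    """Group both keyed inputs by key, then emit the per-key cross product (inner join)."""
    groups = defaultdict(lambda: ([], []))
    for key, row in left_kvs:
        groups[key][0].append(row)
    for key, row in right_kvs:
        groups[key][1].append(row)
    joined = []
    for lefts, rights in groups.values():
        for left_row, right_row in product(lefts, rights):
            joined.append({**left_row, **right_row})
    return joined


# Illustrative inputs: join on left.a = right.b.
left = [{"a": 1, "x": "l1"}, {"a": 2, "x": "l2"}]
right = [{"b": 1, "y": "r1"}, {"b": 3, "y": "r3"}]
result = co_group_join(key_by(left, ["a"]), key_by(right, ["b"]))
```

The point of the sketch is that each input must be able to compute its key from its own rows alone, which is what constrains the form of the join condition.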
For an equi-join, a conjunction clause is essentially an equation. To build KV<Row, Row>, each side of the equation must reference columns from only one join input, and the two sides must reference different inputs. For example, if a comes from the left input and b from the right input, a + b = 2 cannot be used to build a join key, but the equivalent a = 2 - b can. A clause that does not satisfy this property must therefore be rewritten first.
This also implies that not every clause can be rewritten. Take the clause f(a, b) = 3, where a comes from the left input and b from the right input. If f is not splittable, meaning that neither a nor b can be moved to the other side of the equation, then the clause cannot be supported in BeamSQL's join.
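The check and the rewrite described above can be sketched in plain Python. This is only an illustration under stated assumptions: expressions are encoded as nested tuples (a hypothetical encoding, not the planner's actual expression type), and the only rewrite rule implemented is moving one operand of `+` across the equals sign. Any other function, such as an opaque `f`, is treated as not splittable.

```python
# Hypothetical expression encoding (not the real planner representation):
#   ("col", side, name)   column from join input "left" or "right"
#   ("lit", value)        literal
#   ("call", op, [args])  function call


def sides(expr):
    """Return the set of join inputs whose columns appear in expr."""
    kind = expr[0]
    if kind == "col":
        return {expr[1]}
    if kind == "lit":
        return set()
    return set().union(*(sides(arg) for arg in expr[2]))


def usable_as_join_key(lhs, rhs):
    """True when each side of lhs = rhs uses exactly one input and the inputs differ."""
    lhs_sides, rhs_sides = sides(lhs), sides(rhs)
    return len(lhs_sides) == 1 and len(rhs_sides) == 1 and lhs_sides != rhs_sides


def rewrite(lhs, rhs):
    """Try to turn lhs = rhs into a usable form; return None if no rule applies."""
    if usable_as_join_key(lhs, rhs):
        return lhs, rhs
    # Single illustrative rule: x + y = rhs  becomes  x = rhs - y.
    if lhs[0] == "call" and lhs[1] == "+" and len(lhs[2]) == 2:
        x, y = lhs[2]
        candidate = (x, ("call", "-", [rhs, y]))
        if usable_as_join_key(*candidate):
            return candidate
    return None  # treated as not splittable in this sketch


a = ("col", "left", "a")
b = ("col", "right", "b")
sum_clause = (("call", "+", [a, b]), ("lit", 2))     # a + b = 2
opaque_clause = (("call", "f", [a, b]), ("lit", 3))  # f(a, b) = 3

rewritten_sum = rewrite(*sum_clause)        # (a, 2 - b)
rewritten_opaque = rewrite(*opaque_clause)  # None
```

A real implementation would need a table of invertible functions and would have to handle nested calls; the sketch only shows where the "rewritable or not" decision is made.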