Details
-
Improvement
-
Status: Closed
-
Critical
-
Resolution: Duplicate
-
1.20.0
Description
Abstract: Both RexImpTable and BlockBuilder have codegen issues, but it is interesting that they can work together well for most cases.
We can illustrate the problem with a simple test case in JdbcTest, in which the "commission" column is nullable.
@Test public void testNullSafeCheck() { CalciteAssert.hr() .query("select \"commission\" + 10 as s from \"hr\".\"emps\"") .returns("S=1010\n" + "S=510\n" + "S=null\n" + "S=260\n"); }
This test case can pass as the BlockBuilder is in default optimization mode. However, when we set it into un-optimization mode in EnumerableCalc, this test will fail due to NPE. The following picture demonstrates their differences.
1.RexImpTable generates unsafe code
Before translating the RexCall (Add), RexImpTable firstly translate its operands with (nullAs=IS_NULL) [1] to detect whether it is null (i.e., inp4_unboxed). Then RexImpTable sets this operand's nullable in RexToLixTranslator as false [2]. After that, the operand will be translated again with NOT_POSSIBLE [3] to get the value (i.e., inp4_0_unboxed). In the end, the RexCall is implemented by NotNullImplementor.However, it is not safe to conduct operations like unboxing in the second translation phase.
2.BlockBuiler optimization's semantic issue buries NPE
BlockBuilder.optimize() changes the code semantic in this case. For conditional-like clauses (if...else, ?:, etc), InlineVariableVisitor will wrongly make variables inlined.
In general, they can work together for most cases. However, when some special branch is triggered by query, the problem will be exposed. For example, the NewExpression (new java.math.BigDecimal) in CALCITE-3143 breaks the inline optimization phase.
How to fix?
I have digged into this problem a couple of days and tried many approaches to fix it. But in this progress, I found the limitation in current implementation. The whole recursive framework essentially conducts a sequential codegen beheavior, and may visit a RexNode again and again with different NullAs settings.
Due to the limitation, it is difficult to implement null-safe codegen semantics with branching logic. We can also find that there are many branches for special cases in current implementation. Even we can handle potential issues every time, the logic will becomes more and more complex and unfriendly for maintenance.
Therefore, I propose to re-consider this part, including several initial points.
(1) Visitor Pattern (RexVisitor<Result>). Theoretically, RexNode can be translated into Expression by visiting the node only once. We can implement RexVisitor rather than current recursive translation.
(2)The Result consists of three items (code: BuilderStatement, isNull: ParameterExpression, value: Expression).So it is easy to decide how to implement a RexNode according to its children.
Please correct me if I make something wrong. Look forward to suggestions!
Attachments
Attachments
Issue Links
- duplicates
-
CALCITE-3224 New RexNode-to-Expression CodeGen Implementation
- Closed
- is related to
-
CALCITE-3143 Dividing by NULLIF may cause "ArithmeticException: Division by zero"
- Closed
-
CALCITE-3224 New RexNode-to-Expression CodeGen Implementation
- Closed
-
CALCITE-3142 NullPointerException when rounding a nullable numeric
- Closed
-
CALCITE-3150 NullPointerException when UPPER is repeated and combined with LIKE
- Closed
- links to