A surprisingly simple repro of this NPE can be created as follows:
create table t1 (i int, vc varchar(10));
insert into t1 values (1, 'one'), (2, 'two'), (3, 'three'), (1, 'un');
select * from t1, (select * from t1) x
where t1.i = x.i and x.vc in ('un', 'trois');
The key here is that the IN list's left operand points to a column from the subselect, X.VC. As part of
DERBY-47, the IN list will be changed into a BinaryRelationalOperatorNode of the form "X.VC = ?", and that node, which we call a "probe predicate", will serve to represent the IN list operator throughout the various phases of optimization.
That said, as part of preprocessing Derby will look at the query and realize that the sub-select can be flattened. When flattening the subquery, any references to the subquery's RCL will be remapped to point to the underlying expression. That means the left operand of the probe predicate "X.VC = ?" will be changed to point directly to column "VC" of table T1. The code where this happens is in the "flatten" method of FromSubquery:
/* Remap all ColumnReferences from the outer query to this node.
- (We replace those ColumnReferences with clones of the matching
- expression in the SELECT's RCL.
For the example query above, outerPList holds the two predicates "T1.I = X.I" and "X.VC = ?", so we will attempt to remap the column references in those two predicates. That brings us to the remapColumnReferencesToExpressions() method of BinaryOperatorNode, where we have:
public ValueNode remapColumnReferencesToExpressions()
leftOperand = leftOperand.remapColumnReferencesToExpressions();
rightOperand = rightOperand.remapColumnReferencesToExpressions();
Notice how the leftOperand can change here--and in the above query, it will change to point directly to T1 instead of indirectly to the subquery. So now the probe predicate's left operand is different from the left operand of the original InListOperatorNode that the probe predicate replaced. That in it itself is fine, but it causes problems later.
Namely, when it comes time to generate the final tree for the query, we realize that the probe predicate is not "useful" for probing because it references "VC", which is the second column in table T1. Since probe predicates are only useful if they reference the first column in the table, per "orderUsefulPredicates(...)" of PredicateList.java:
else if (pred.isInListProbePredicate()
&& (indexPosition > 0))
/* If the predicate is an IN-list probe predicate
- then we only consider it to be useful if the
- referenced column is the first one in the
- index (i.e. if (indexPosition == 0)). Otherwise
- the predicate would be treated as a qualifier
- for store, which could lead to incorrect
the probe predicate is not useful. That in turn means that when it comes time to generate the IN list operator, we'll "revert" back to the original InListOperatorNode--i.e. we will generate the InListOperatorNode instead of generating the probe predicate. This is found in the generateExpression() method of BinaryOperatorNode:
if (ilon != null)
But there's a problem here: as mentioned above, ilon (the InListOperatorNode) still has a left operand that points to a column from the subquery. Since we flattened the subquery out, that left operand is no longer valid--and that ultimately causes an execution time NPE because we try to apply the IN list restriction to a column from a subquery that does not exist.
I tried a one-line fix to this code that seems to have resolved the issue:
if (ilon != null)
ilon.setLeftOperand(this.leftOperand); // Added this line
(with appropriate code comments, of course).
This has the effect of making sure that when we "revert" back to the original InListOperatorNode generation, we'll still generate the correct leftOperand--i.e. the left operand as it exists in the "probe predicate" upon completion of optimization.
I'm attaching this small fix as d3253_v1.patch. I have yet to run the regression tests (they are running now), but I thought I'd post my findings for early review in the interim...