Table `tblA` schema:
Table `tblB` schema:
For the following query:
TO_HEX is used as an example, but it can be any other expression; for instance `cast(query.id as Integer)`.
Out of bound exception is thrown in SqlToRelConverter.
Before joins are created left and right paths are parsed first. For the 1st query above they are as follows:
As they are processed - they are registered as leaves (added to the Array).
When Join node is being created it knows what the `condition expressions` is:
Since TO_HEX is not computed anywhere - it modifies the left input to be as follows (via RelOptUtil#pushDownJoinConditions) because RelBuilder#shouldMergeProject always return true.
where `VARCHAR $f3` is a result of TO_HEX. Note that the list of leaves is not updated.
Finally, when identifier "query.fA1_2" is being converted (via SqlToRelConverter#convertIdentifier) for the top-most node
Blackboard perform a lookup (via SqlToRelConverter#lookupExp), in process of which LookupContext is created.
In a constructor, LookupContext performs flatten, which recursively traverses tree of nodes (from above codeblock) and checks the leaves to see if they contain such expression. When it does get to the modified left input of a join it does not get a match on it and continues further down to a TableScan.
When it finally flattens the result, TableScan's RecordType knows nothing about a duplicated field `fA1_2`, causing an error above.
I think a viable solution would be to modify Join creation to register a resulting join inputs as leaves (when they are modified). Alternative approach would be to not merge Projects when join needs to modify an input.