Affects Version/s: None
Fix Version/s: 1.6.0
getMaxRowCount(RelSubset) falls back to getMaxRowCount(RelNode) and returns null. So if the outer child of the Join rel is a RelSubset, SortJoinTransposeRule fails the checkInputForCollationAndLimit check and does not go on to push the limit down through the join.
Before the fixes for CALCITE-995 and CALCITE-987, getMaxRowCount(RelSubset) returned positive infinity. That led to a similar situation with the opposite effect: the rule fired infinitely whenever the join's outer child was a RelSubset.
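To make the two failure modes concrete, here is a simplified, hypothetical Java model of only the limit half of checkInputForCollationAndLimit (the collation half is ignored). The class and method names are invented for illustration. The model assumes, as the description above implies, that an unknown (null) max row count is treated as "input already limited", so the rule bails out. It is not Calcite's actual code.

```java
/**
 * Hypothetical model of the limit part of checkInputForCollationAndLimit.
 * Not Calcite code: it assumes a null (unknown) max row count is treated
 * as "input already limited", which makes the rule bail out.
 */
final class LimitCheckModel {

  /** Returns true if the rule should bail out (no push-down). */
  static boolean inputAlreadyLimited(Double maxRowCount, long offset, long fetch) {
    if (maxRowCount == null) {
      // Unknown bound: assumed to be treated as "already small enough".
      return true;
    }
    // No extra limit is needed if the input can never produce more than
    // offset + fetch rows.
    return (double) offset + (double) fetch >= maxRowCount;
  }

  public static void main(String[] args) {
    // RelSubset returning null: the limit is never pushed.
    System.out.println(inputAlreadyLimited(null, 0, 10));
    // RelSubset returning +infinity: the limit is always pushed, even when
    // the subset already contains a pushed-down Sort.
    System.out.println(inputAlreadyLimited(Double.POSITIVE_INFINITY, 0, 10));
    // An accurate bound after a previous push-down of LIMIT 10.
    System.out.println(inputAlreadyLimited(10.0, 0, 10));
  }
}
```

With null the rule never pushes; with positive infinity it always pushes, because even after a limit has been pushed, the RelSubset that now contains it still reports an unbounded row count, so the rule can match again.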
Neither of the above situations was reflected in Calcite's test cases for SortJoinTransposeRule, because the RelSubset case is not covered by the RelOptRule tests, which use HepPlanner and run just one rule, or at most a couple of rules, at a time.
Basically, we need a more accurate way to get the max row count for a RelSubset (and similarly for HepRelVertex); otherwise checkInputForCollationAndLimit will either always fail or always succeed for a Limit over a Join over a RelSubset. But I assume that, to be really accurate, we would have to introduce a mechanism similar to the one that computes bestCost in RelSubset, which I doubt would be worth it.
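A cheaper approximation than a bestCost-style mechanism might be to take the tightest known bound among the rels in the subset: since every rel in a RelSubset is equivalent, any member's max row count is also a valid bound for the whole subset. The sketch below models this with plain Doubles standing in for member bounds (null meaning unknown). The class and method names are hypothetical, and the sketch ignores the fact that each member's own bound may in turn depend on child RelSubsets.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch: derive a max row count for an equivalence set
 * (a stand-in for RelSubset) from the bounds of its member rels.
 */
final class SubsetMaxRowCount {

  /**
   * All members produce the same result, so each known bound is valid for
   * the whole set and the smallest one is the tightest.
   * Returns null if no member has a known bound.
   */
  static Double maxRowCount(List<Double> memberBounds) {
    Double best = null;
    for (Double bound : memberBounds) {
      if (bound == null) {
        continue; // unknown for this member; others may still know
      }
      if (best == null || bound < best) {
        best = bound;
      }
    }
    return best;
  }

  public static void main(String[] args) {
    // e.g. a member with an unknown bound, an unbounded join, and an
    // equivalent Sort with LIMIT 10 registered in the same subset
    System.out.println(maxRowCount(Arrays.asList(null, Double.POSITIVE_INFINITY, 10.0)));
  }
}
```

In this example the subset reports 10.0 once a Sort with LIMIT 10 is registered in it, which is the information the check needs in order to stop re-pushing the limit.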
Another option, which might seem a little ugly, is to add something like isSortPushedThrough() to the Sort rel, similar to isSemiJoinDone() in Join, to keep the rule from firing infinitely.
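The flag-based alternative can be sketched as follows. Sort, isSortPushedThrough() and the rule methods here are simplified, hypothetical stand-ins rather than Calcite classes. The point is only that once the rule marks the Sort it has pushed through the join, the rule no longer matches it, so repeated application stops after a single firing.

```java
/**
 * Hypothetical sketch of the isSortPushedThrough() idea, analogous to
 * isSemiJoinDone() on Join. Simplified stand-ins, not Calcite classes.
 */
final class PushedThroughFlagDemo {

  /** Simplified stand-in for a Sort/limit node carrying the flag. */
  static final class Sort {
    final long fetch;
    final boolean sortPushedThrough;

    Sort(long fetch, boolean sortPushedThrough) {
      this.fetch = fetch;
      this.sortPushedThrough = sortPushedThrough;
    }

    boolean isSortPushedThrough() {
      return sortPushedThrough;
    }
  }

  /** The rule matches only Sorts that have not been pushed through yet. */
  static boolean matches(Sort sort) {
    return !sort.isSortPushedThrough();
  }

  /**
   * Models one firing: the top Sort is re-created with the flag set, so it
   * no longer matches. (The copy pushed into the join input is omitted.)
   */
  static Sort fire(Sort sort) {
    return new Sort(sort.fetch, true);
  }

  /** Applies the rule while it matches; returns the number of firings. */
  static int applyUntilNoMatch(Sort sort, int maxFirings) {
    int firings = 0;
    while (matches(sort) && firings < maxFirings) {
      sort = fire(sort);
      firings++;
    }
    return firings;
  }

  public static void main(String[] args) {
    // Fires exactly once instead of looping until maxFirings.
    System.out.println(applyUntilNoMatch(new Sort(10, false), 1000));
  }
}
```

The trade-off is the one noted above: the flag prevents the infinite loop without needing an accurate row-count bound, at the cost of adding planner-specific state to the Sort rel.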
In my Phoenix project, I applied a temporary fix, which has proved to work: