The problem can be reproduced with the following query, which currently lives in the subquery_in.q file:
The plans before and after HiveSubQueryRemoveRule are shown below:
The plan after applying the rule is invalid. The HiveFilter(condition=[=($1, $12)]) above the correlate references a column ($12) from the right input, but a correlate of type SEMI exposes only the columns of its left input, so that column does not exist in the filter's input. Running the test with the -Dcalcite.debug property enabled raises an AssertionError when the HiveFilter is built.
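To make the invalid reference concrete, here is a minimal sketch of the check in Python. It is a simplified model, not the Calcite or Hive API: the function names are hypothetical, and the column counts (12 on the left, 1 on the right) are assumptions chosen so that $12 is the first right-side column.

```python
# Simplified model of row-type widths; NOT the Calcite/Hive API.
# Function names and column counts below are hypothetical.

def correlate_output_width(left_width, right_width, join_type):
    """Number of columns visible to operators above a correlate."""
    if join_type in ("SEMI", "ANTI"):
        # SEMI/ANTI correlates expose only the left input's columns.
        return left_width
    # INNER/LEFT correlates expose left columns followed by right columns.
    return left_width + right_width


def invalid_refs(ref_indexes, input_width):
    """Return the input references that fall outside the input row type."""
    return [i for i in ref_indexes if i < 0 or i >= input_width]


# Assumed widths: left input has columns $0..$11, right input has one column.
left_width, right_width = 12, 1
filter_refs = [1, 12]  # taken from the condition =($1, $12)

semi_width = correlate_output_width(left_width, right_width, "SEMI")
inner_width = correlate_output_width(left_width, right_width, "INNER")

print(invalid_refs(filter_refs, semi_width))   # $12 is out of range above a SEMI correlate
print(invalid_refs(filter_refs, inner_width))  # both references resolve above an INNER correlate
```

Under these assumptions the same filter condition is well-formed above an INNER correlate but not above a SEMI one, which is the kind of mismatch the assertion reports.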
The problem is currently hidden because a specific hack in HiveRelDecorrelator turns this invalid plan into a valid one. This mechanism is very brittle and can break easily, as happened while fixing