Currently, SqlToRelConverter rewrites queries with an IN predicate into a semi-join-like plan, but without actually using a semi-join. The resulting plan includes two joins and several aggregates over the IN argument list, which compute some sort of indicators. This plan is quite cumbersome: it contains a lot of nodes and therefore enlarges the planner's search space.
As a workaround, this optimization was disabled by increasing inSubQueryThreshold to MAX_INT.
A safer solution would be to rewrite the IN predicate as a true semi-join or, better yet, an inner join. To achieve this, we need to convert the list of values into an inline table containing only distinct values and use it as the left input of the inner join, with the original table as the right input. This would let us take advantage of an Indexed Nested Loop join when there is an index on the column used in the IN predicate.
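To illustrate why the proposed rewrite preserves the result, here is a minimal plain-Java sketch of the idea. It is a toy model, not Calcite code: the table T(id, deptId), the class InListJoinSketch, and all method names are hypothetical, and a HashMap stands in for an index on the IN column. It only models a plain IN predicate in a WHERE clause; NOT IN and NULL handling are not covered.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model only: none of these names come from Calcite.
class InListJoinSketch {
    /** One row of a hypothetical table T(id, deptId). */
    static final class Row {
        final int id;
        final int deptId;
        Row(int id, int deptId) { this.id = id; this.deptId = deptId; }
        @Override public boolean equals(Object o) {
            return o instanceof Row && ((Row) o).id == id && ((Row) o).deptId == deptId;
        }
        @Override public int hashCode() { return 31 * id + deptId; }
        @Override public String toString() { return "Row(" + id + ", " + deptId + ")"; }
    }

    /** Stand-in for a secondary index on T.deptId: key -> matching rows. */
    static Map<Integer, List<Row>> buildIndex(List<Row> table) {
        Map<Integer, List<Row>> index = new HashMap<>();
        for (Row r : table) {
            index.computeIfAbsent(r.deptId, k -> new ArrayList<>()).add(r);
        }
        return index;
    }

    /** Baseline: WHERE deptId IN (...) evaluated as a filter over a full scan. */
    static List<Row> viaFilter(List<Row> table, List<Integer> inList) {
        List<Row> out = new ArrayList<>();
        for (Row r : table) {
            if (inList.contains(r.deptId)) {
                out.add(r);
            }
        }
        return out;
    }

    /**
     * Proposed shape: the distinct IN values form the left (outer) input,
     * and each value is looked up in the index of the right input.
     */
    static List<Row> viaIndexedNestedLoop(Map<Integer, List<Row>> index, List<Integer> inList) {
        // Deduplicate first: without this, a repeated IN value would
        // make the inner join emit the matching rows more than once.
        Set<Integer> distinctValues = new LinkedHashSet<>(inList);
        List<Row> out = new ArrayList<>();
        for (Integer key : distinctValues) {
            out.addAll(index.getOrDefault(key, Collections.<Row>emptyList()));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Row> table = Arrays.asList(
            new Row(1, 10), new Row(2, 20), new Row(3, 10), new Row(4, 30));
        List<Integer> inList = Arrays.asList(10, 30, 10, 99);
        System.out.println(viaFilter(table, inList));
        System.out.println(viaIndexedNestedLoop(buildIndex(table), inList));
    }
}
```

Both methods return the same set of rows, though not necessarily in the same order. The join variant touches only the index entries for the distinct IN values instead of scanning the whole table, which is the benefit expected from an Indexed Nested Loop.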
The starting point for this ticket is org.apache.calcite.sql2rel.SqlToRelConverter#substituteSubQuery.