We could come in at a slightly lower level for this, bypassing the parser and generating an InListExpression directly.
Sure, we are going to do that. It's just that re-compiling the other WHERE clauses (aside from this IN clause) doesn't seem very neat to me. But I'll do it this way for now, since the overhead is not really going to be a problem.
There is no doubt that the IN construct can handle the key mapping, but what I'm saying is that it is not sufficient in some cases. Suppose we have left table tuples (a, 1), (c, 2) and right table tuples (a, 3), (c, 4), and we perform a join on the first column but select only columns from the left table. In this case, we can simply use the IN construct and we don't need the hash cache. But imagine we have another right table tuple (a, 5). The result should now be (a, 1), (a, 1), (c, 2), because two tuples from the right table match "a". In this latter case, we still have to keep the hash cache.
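To make the example above concrete, here is a minimal sketch in plain Python (not Phoenix code). The helper names `in_filter` and `hash_join_left_columns` are hypothetical and only model the two strategies being discussed: filtering the left table by an IN list of right-side keys, versus probing a hash cache built from the right table.

```python
def in_filter(left, right):
    """Model of the IN construct: keep left rows whose key is in the right key set."""
    keys = {r[0] for r in right}
    return [l for l in left if l[0] in keys]


def hash_join_left_columns(left, right):
    """Model of the hash cache: inner join on column 0, projecting only left columns."""
    cache = {}
    for r in right:
        cache.setdefault(r[0], []).append(r)
    # Emit one copy of the left row per matching right row.
    return [l for l in left for _ in cache.get(l[0], ())]


left = [("a", 1), ("c", 2)]
right_unique = [("a", 3), ("c", 4)]
right_dup = right_unique + [("a", 5)]

# Unique right-side keys: both strategies agree.
same = in_filter(left, right_unique) == hash_join_left_columns(left, right_unique)

# Duplicate right-side key "a": only the hash join produces the extra (a, 1).
in_result = in_filter(left, right_dup)
join_result = hash_join_left_columns(left, right_dup)
```

With `right_dup`, the IN filter alone loses the multiplicity of "a", which is why the hash cache is still needed in that case.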
The nice thing about this approach is you'll be leveraging the way we optimize these IN expressions. The skip scan will just skip from row key to row key and be so much faster than a full table scan. It'll be a huge speedup for a relatively common case.
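As a rough illustration of why skipping from row key to row key helps, here is a simplified model in plain Python. It is not how Phoenix implements its skip scan; the functions `full_scan` and `skip_scan` and the key layout are assumptions made for the sketch. Row keys are modeled as a sorted list, and a seek is modeled with a binary search.

```python
from bisect import bisect_left


def full_scan(sorted_keys, wanted):
    """Read every row and test it against the IN list."""
    wanted = set(wanted)
    hits, rows_read = [], 0
    for k in sorted_keys:
        rows_read += 1
        if k in wanted:
            hits.append(k)
    return hits, rows_read


def skip_scan(sorted_keys, wanted):
    """Seek directly to each IN-list key in sorted order, reading only matching rows."""
    hits, rows_read, pos = [], 0, 0
    for w in sorted(set(wanted)):
        # Jump forward to the first row key >= w (models a seek; seek cost is not counted).
        pos = bisect_left(sorted_keys, w, pos)
        while pos < len(sorted_keys) and sorted_keys[pos] == w:
            hits.append(sorted_keys[pos])
            rows_read += 1
            pos += 1
    return hits, rows_read


row_keys = ["k%04d" % i for i in range(1000)]  # hypothetical sorted row keys
in_list = ["k0500", "k0003", "k0999"]

full_hits, full_rows = full_scan(row_keys, in_list)
skip_hits, skip_rows = skip_scan(row_keys, in_list)
```

Both functions return the same matching keys, but in this model the full scan reads all 1000 rows while the skip scan reads only the three that match.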
PHOENIX-889 is a very good example.