At compile time, we know how many guideposts (i.e. how many bytes) will be scanned for the RHS table. By default, we should base the choice between the hash join and the many-to-many join on this information.
Another criterion (as we've seen in PHOENIX-4508) is whether the tables being joined are already ordered by the join key. In that case, it's better to always use the sort merge join.
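The two criteria above could be combined roughly as in the sketch below. It is only an illustration of the decision order, not Phoenix code: the function name, the parameter names, and the idea of comparing the estimate against a configured size limit are all assumptions, and it treats the "many-to-many join" as the sort merge join.

```python
from enum import Enum


class JoinStrategy(Enum):
    HASH = "hash"
    SORT_MERGE = "sort-merge"


def choose_join_strategy(rhs_estimated_bytes, max_hash_cache_bytes,
                         inputs_ordered_by_join_key):
    """Pick a join strategy from compile-time information.

    All names here are hypothetical:
    - rhs_estimated_bytes: bytes expected to be scanned for the RHS table,
      as derived from the guideposts.
    - max_hash_cache_bytes: assumed upper bound on what a hash join may
      hold for the RHS.
    - inputs_ordered_by_join_key: True if both tables are already ordered
      by the join key.
    """
    # Inputs already ordered by the join key: always prefer sort merge.
    if inputs_ordered_by_join_key:
        return JoinStrategy.SORT_MERGE
    # Otherwise use the hash join only when the RHS estimate fits the limit.
    if rhs_estimated_bytes <= max_hash_cache_bytes:
        return JoinStrategy.HASH
    return JoinStrategy.SORT_MERGE
```

The ordering check comes first so that a small RHS does not override the case where the sort merge join needs no additional sorting.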