Basically, hash-join is found limited for 3 patterns in terms of functionality:
1) Full joins, where both sides are the "OUTER" side
2) "A RIGHT JOIN B LEFT JOIN C"
3) "A RIGHT JOIN (complicated subquery or sub-join)" or "(complicated subquery or sub-join) LEFT JOIN A" or "(complicated subquery or sub-join) INNER JOIN A"
complicated subquery means those subqueries that cannot be flattened into a simple scan plan and thus cannot be the LHS table (main table), such as a group-by, a limit, a sub-join, etc.
Currently hash-join has the priority. So if no hint is provided, the compiler see if the top-level join can be done using hash-join, and recursively go through its sub-joins, and whenever we hit the above three patterns, the sort-merge-join plans are used. Again take the example (pattern 2) I wrote in my first comment:
"A right join B left join C"
First, it does not qualify for a star join (which should not contain any right join), so it should be compiled in a sequence like (A right join B) left join C, and now we find that the LHS table is not a flat scan, so we have to compile the left join as sort-merge join but we can still do hash-join for (A right join B).
So far, we don't have a way to hint using certain join algorithms for a specific join in a query. But I think it might be worth adding such facilities.
The orderby is pushed down to join tables in a query rewrite way, so order-by optimizations can be automatically used and of course will have server-side sorting. For sub-queries used as join tables that has order-by itself, the original order-by will be overridden by the sort-merge order by. The outmost order-by will just remain intact.
would be interesting to get your feedback on what we need in terms of statistics to drive the optimizer decisions
Sure, definitely. Already had some discussions with ramkrishna.s.vasudevan.