The FE does not analyze the GROUP BY clause before invoking the expression rewrite rules, so the rules perform no rewrites on GROUP BY expressions.
For the SELECT list, the analyzer processes each expression and marks it as analyzed.
The rewrite rules, however, tend to skip unanalyzed nodes. (And, according to IMPALA-7754, expressions often are not re-analyzed after a rewrite.)
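To make the skip-unanalyzed-nodes behavior concrete, here is a toy Python model. It is not Impala's Java frontend code: `Expr`, `analyze()`, and `fold_constants()` are hypothetical stand-ins, and the guard at the top of `fold_constants()` is an assumed simplification of the analyzed-node check the rules use. Only the SELECT copy of `1 + 2` is analyzed, so the rule folds it, while the unanalyzed GROUP BY copy passes through unchanged.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Expr:
    """Hypothetical toy expression node; not Impala's Java Expr class."""
    op: str                                      # "lit" or "+"
    children: List["Expr"] = field(default_factory=list)
    value: Optional[int] = None                  # set only for "lit"
    analyzed: bool = False

    def __str__(self) -> str:
        if self.op == "lit":
            return str(self.value)
        return f"{self.children[0]} + {self.children[1]}"


def analyze(e: Expr) -> Expr:
    """Stand-in for real analysis: just marks the whole tree as analyzed."""
    for c in e.children:
        analyze(c)
    e.analyzed = True
    return e


def fold_constants(e: Expr) -> Expr:
    """Hypothetical constant-folding rule with an assumed 'is analyzed' guard."""
    if not e.analyzed:
        return e  # unanalyzed input: the rule silently does nothing
    e.children = [fold_constants(c) for c in e.children]
    if e.op == "+" and all(c.op == "lit" for c in e.children):
        total = e.children[0].value + e.children[1].value
        return Expr("lit", value=total, analyzed=True)
    return e


def one_plus_two() -> Expr:
    return Expr("+", [Expr("lit", value=1), Expr("lit", value=2)])


select_item = fold_constants(analyze(one_plus_two()))  # analyzed first
group_by_item = fold_constants(one_plus_two())          # never analyzed
print(select_item, "|", group_by_item)                  # 3 | 1 + 2
```

The guard is harmless only if every clause is analyzed before the rules run; otherwise the same rule quietly produces different results for identical expressions in different clauses.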
Consider this simple query:
This query works. Now use the new feature in IMPALA-7655 with a query that will be rewritten to the one above:
The query above is rewritten by the new conditional function rewrite rules. Result:
The reason is the check used in multiple rewrite rules:
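Step through the code. The coalesce() expression in the SELECT clause is analyzed; the one in the GROUP BY is not. So the SELECT expression is rewritten while the GROUP BY expression keeps its original form. This is a problem because SQL semantics require the identical expression in both clauses for them to match, and after the rewrite they are no longer identical. (It also means that no other rewrite rules, at least none containing this check, are invoked on the GROUP BY expressions, which leads to an unintended code path.)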
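The matching requirement can be sketched with another hypothetical Python model rather than Impala's implementation. Expressions are nested tuples (strings are column references, ints are literals), and the column names `id` and `year` are illustrative. `rewrite_coalesce()` assumes the IMPALA-7655 rules turn `coalesce(a, b, ...)` into nested `CASE WHEN a IS NOT NULL THEN a ELSE ... END`; the exact output form is an assumption. `covered()` applies the usual grouping rule: a SELECT expression is valid if it is identical to a GROUP BY expression, is an aggregate, is a literal, or is built only from valid parts.

```python
# Hypothetical toy model: tuples are (operator, *args), strings are column
# references, and ints are literals. Not Impala's frontend code.
AGGREGATES = {"count", "sum", "min", "max", "avg"}


def rewrite_coalesce(e):
    """Assumed rewrite: coalesce(a, b, ...) -> CASE WHEN a IS NOT NULL THEN a ELSE ... END."""
    if not isinstance(e, tuple):
        return e
    op, *args = e
    args = [rewrite_coalesce(a) for a in args]
    if op == "coalesce" and len(args) >= 2:
        result = args[-1]
        for a in reversed(args[:-1]):
            result = ("case", ("is_not_null", a), a, result)
        return result
    return (op, *args)


def covered(e, group_by):
    """True if SELECT expression e is legal in a query grouped by group_by."""
    if e in group_by:
        return True                       # identical to a GROUP BY expression
    if not isinstance(e, tuple):
        return not isinstance(e, str)     # literal: OK; bare column: not OK
    if e[0] in AGGREGATES:
        return True                       # aggregates need no GROUP BY match
    return all(covered(a, group_by) for a in e[1:])


select_expr = ("coalesce", "id", "year")
group_by = [("coalesce", "id", "year")]

# Rewrite applied to SELECT only (the bug): the expressions no longer match.
print(covered(rewrite_coalesce(select_expr), group_by))        # False
# Rewrite applied to both clauses: the match is restored.
print(covered(rewrite_coalesce(select_expr),
              [rewrite_coalesce(g) for g in group_by]))        # True
# A folded constant needs no match, so "3" vs "1 + 2" is harmless.
print(covered(3, [("+", 1, 2)]))                               # True
```

Under this rule a literal never needs a GROUP BY match, which would explain why a folded constant can differ between clauses while a rewritten function call cannot.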
This query makes it a bit clearer:
This works. But if we use test code to inspect the "rewritten" GROUP BY, we find that it is still "1 + 2", while the SELECT expression has been rewritten to "3".
It seems that we must be very careful when working with rewrites because, as the code is currently written, some clauses are rewritten and others are not. We then have to know when it is safe for the SELECT clause to differ from the GROUP BY clause. (It looks like it is OK for constants to differ, but not for function calls...)
This is VERY confusing; it would be better to just fix the darn thing.
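One possible shape for the fix is to analyze every clause, GROUP BY included, before any rule runs, and then apply the same rule list to every clause. The Python sketch below is hypothetical: `Expr`, `analyze()`, `fold_add()`, and `analyze_then_rewrite()` are illustrative names, not Impala APIs, and the single rule only folds a top-level `+` of two integer literals.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Expr:
    """Hypothetical toy node: an operator plus its arguments."""
    op: str              # e.g. "+" or "lit"
    args: Tuple = ()     # child Exprs or int literals
    analyzed: bool = False


Rule = Callable[[Expr], Expr]


def analyze(e: Expr) -> Expr:
    """Stand-in for analysis: returns a copy of the tree marked analyzed."""
    kids = tuple(analyze(a) if isinstance(a, Expr) else a for a in e.args)
    return replace(e, args=kids, analyzed=True)


def fold_add(e: Expr) -> Expr:
    """Hypothetical rule with a skip-unanalyzed guard: folds `int + int`."""
    if not e.analyzed:
        return e
    if e.op == "+" and len(e.args) == 2 and all(isinstance(a, int) for a in e.args):
        return Expr("lit", (e.args[0] + e.args[1],), analyzed=True)
    return e


def analyze_then_rewrite(clauses: Dict[str, List[Expr]],
                         rules: List[Rule]) -> Dict[str, List[Expr]]:
    # Step 1: analyze every clause, GROUP BY included.
    analyzed_clauses = {name: [analyze(e) for e in exprs]
                        for name, exprs in clauses.items()}
    # Step 2: apply the same rule list to every clause.
    result: Dict[str, List[Expr]] = {}
    for name, exprs in analyzed_clauses.items():
        rewritten = []
        for e in exprs:
            for rule in rules:
                e = rule(e)
            rewritten.append(e)
        result[name] = rewritten
    return result


query = {"select": [Expr("+", (1, 2))], "group_by": [Expr("+", (1, 2))]}
fixed = analyze_then_rewrite(query, [fold_add])
print(fixed["select"] == fixed["group_by"])  # True: both are now lit(3)
```

Because both clauses pass through the same rules, identical expressions stay identical, and no clause-by-clause reasoning is needed about when SELECT may differ from GROUP BY. A real fix would also have to re-analyze rewritten expressions (the IMPALA-7754 concern), which this sketch ignores.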