Details
Type: Bug
Status: Reopened
Priority: Minor
Resolution: Unresolved
Version: Impala 3.0
Description
The frontend (FE) does not analyze the GROUP BY clause before invoking the expression rewrite rules, so the rules perform no rewrites on it.
For the SELECT list, the analyzer processes each expression and marks it as analyzed.
Several rewrite rules, however, return unanalyzed expressions unchanged. (And, according to IMPALA-7754, rewritten expressions often are not re-analyzed after a rewrite.)
Consider this simple query:
SELECT case when string_col is not null then string_col else 'foo' end FROM functional.alltypestiny GROUP BY case when string_col is not null then string_col else 'foo' end
This query works. Now use the new feature from IMPALA-7655 with a query that is rewritten into the one above:
SELECT coalesce(string_col, 'foo') FROM functional.alltypes GROUP BY coalesce(string_col, 'foo')
The new conditional-function rewrite rules rewrite the SELECT expression into the CASE form shown above, and analysis then fails with:
org.apache.impala.common.AnalysisException: select list expression not produced by aggregation output (missing from GROUP BY clause?): CASE WHEN string_col IS NOT NULL THEN string_col ELSE 'foo' END
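The rewrite in question turns coalesce() into an equivalent CASE expression. Below is a minimal sketch of that transformation, assuming a string-level model: the class and method names are hypothetical, and the actual IMPALA-7655 rule works on Impala's expression tree and may handle cases this sketch ignores.

```java
import java.util.List;

// Hypothetical string-level sketch; not the actual IMPALA-7655 rewrite rule.
class CoalesceRewriteSketch {
    /**
     * Rewrites coalesce(a1, ..., an) into an equivalent CASE expression string.
     * Every argument except the last becomes a "WHEN ai IS NOT NULL THEN ai"
     * branch; the last argument becomes the ELSE branch.
     */
    static String coalesceToCase(List<String> args) {
        if (args.size() < 2) {
            throw new IllegalArgumentException("sketch expects at least two arguments");
        }
        StringBuilder sb = new StringBuilder("CASE");
        for (int i = 0; i < args.size() - 1; i++) {
            String a = args.get(i);
            sb.append(" WHEN ").append(a).append(" IS NOT NULL THEN ").append(a);
        }
        sb.append(" ELSE ").append(args.get(args.size() - 1)).append(" END");
        return sb.toString();
    }

    public static void main(String[] args) {
        // Prints: CASE WHEN string_col IS NOT NULL THEN string_col ELSE 'foo' END
        System.out.println(coalesceToCase(List.of("string_col", "'foo'")));
    }
}
```

For two arguments this reproduces exactly the CASE text that appears in the exception message.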
The cause is the following guard, which appears in several rewrite rules:
    public Expr apply(Expr expr, Analyzer analyzer) throws AnalysisException {
      if (!expr.isAnalyzed()) return expr;
      ...
    }
Stepping through the code shows that the coalesce() expression in the SELECT clause is analyzed, but the one in the GROUP BY clause is not. This is a problem because SQL semantics require the identical expression in both clauses for them to match. (It also means that no other rewrite rules, at least none with this check, are invoked on the GROUP BY expression, leading to an unintended code path.)
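A toy model of this asymmetry follows, assuming an expression can be reduced to its SQL text plus an "analyzed" flag. ToyExpr, coalesceRule and selectMatchesGroupBy are hypothetical names, and comparing SQL text is a simplification of how Impala actually matches SELECT expressions against GROUP BY expressions.

```java
// Hypothetical toy model; ToyExpr and the rule below are not Impala classes.
class GroupByGuardSketch {
    static final String COALESCE = "coalesce(string_col, 'foo')";
    static final String CASE_FORM =
        "CASE WHEN string_col IS NOT NULL THEN string_col ELSE 'foo' END";

    // Stand-in for an expression: its SQL text plus the analyzer's "analyzed" mark.
    record ToyExpr(String sql, boolean analyzed) {}

    // Mirrors the guard quoted above: an unanalyzed expression is returned unchanged.
    static ToyExpr coalesceRule(ToyExpr e) {
        if (!e.analyzed()) return e;
        return e.sql().equals(COALESCE) ? new ToyExpr(CASE_FORM, true) : e;
    }

    // Rewrites both clauses, then checks that the SELECT expression still matches
    // the GROUP BY expression (here simplified to a SQL-text comparison).
    static boolean selectMatchesGroupBy(boolean groupByAnalyzed) {
        ToyExpr select = coalesceRule(new ToyExpr(COALESCE, true));
        ToyExpr groupBy = coalesceRule(new ToyExpr(COALESCE, groupByAnalyzed));
        return select.sql().equals(groupBy.sql());
    }

    public static void main(String[] args) {
        // Current behavior: GROUP BY is not analyzed, so only SELECT is rewritten.
        System.out.println(selectMatchesGroupBy(false)); // prints "false"
        // If GROUP BY were analyzed first, both sides would be rewritten alike.
        System.out.println(selectMatchesGroupBy(true));  // prints "true"
    }
}
```

Only the analyzed flag differs between the two calls, which is why analyzing the GROUP BY clause before the rewrite rules run would keep the two clauses in step.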
This query makes it a bit clearer:
SELECT 1 + 2 FROM functional.alltypestiny GROUP BY 1 + 2
This works. But if we use test code to inspect the "rewritten" GROUP BY, we find that it is still "1 + 2", while the SELECT expression has been rewritten to "3".
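The same guard explains the "1 + 2" case. Below is a sketch of a constant-folding rule over a hypothetical three-type AST (Node, Lit, Add); these are not Impala's real expression classes, and the rule only illustrates the effect of the isAnalyzed() check.

```java
// Hypothetical toy AST; not Impala's expression classes.
class ConstantFoldSketch {
    interface Node { String toSql(); boolean analyzed(); }

    record Lit(long value, boolean analyzed) implements Node {
        public String toSql() { return Long.toString(value); }
    }

    record Add(Node left, Node right, boolean analyzed) implements Node {
        public String toSql() { return left.toSql() + " + " + right.toSql(); }
    }

    // Folds literal + literal into a single literal, but only for analyzed nodes,
    // mirroring the `if (!expr.isAnalyzed()) return expr;` guard.
    static Node fold(Node n) {
        if (!n.analyzed()) return n;
        if (n instanceof Add add && add.left() instanceof Lit l && add.right() instanceof Lit r) {
            return new Lit(l.value() + r.value(), true);
        }
        return n;
    }

    // Builds "1 + 2" with every node carrying the given analyzed flag.
    static Node onePlusTwo(boolean analyzed) {
        return new Add(new Lit(1, analyzed), new Lit(2, analyzed), analyzed);
    }

    public static void main(String[] args) {
        Node select = fold(onePlusTwo(true));   // analyzed, like the SELECT list
        Node groupBy = fold(onePlusTwo(false)); // unanalyzed, like the GROUP BY today
        System.out.println("SELECT: " + select.toSql());    // prints "SELECT: 3"
        System.out.println("GROUP BY: " + groupBy.toSql()); // prints "GROUP BY: 1 + 2"
    }
}
```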
It seems that, as the code is currently written, we must be very careful when working with rewrites: some clauses are rewritten and others are not, so we have to know when it is safe for the SELECT clause to differ from the GROUP BY clause. (It looks like it is OK for constants to differ, but not for functions...)
This is VERY confusing; it would be better to just fix the darn thing.
Issue Links
is part of
- IMPALA-7831 Revisit expression rewriting integration with planner (Open)
- IMPALA-7747 Clean up the Expression Rewriter (Open)
relates to
- IMPALA-7083 AnalysisException for GROUP BY and ORDER BY expressions that are folded to constants from 2.9 onwards (Open)