Description
Consider the following simple SQL query, which references an undefined function foo that is never registered in the function registry:
SELECT foo(a) FROM t
Assuming table t is a partitioned temporary view consisting of a large number of files stored on S3, the analyzer may take a long time to discover that foo is not registered.
The reason is that the existing analysis rule ResolveFunctions requires all child expressions to be resolved first. Therefore, ResolveRelations has to be executed first to resolve all columns referenced by the unresolved function invocation. This further leads to partition discovery for t, which may take a long time.
To address this case, we propose a new lightweight analysis rule LookupFunctions that
- Matches all unresolved function invocations
- Looks up the function names in the function registry
- Reports an analysis error for any unregistered function
Since this rule doesn't actually try to resolve the unresolved functions, it doesn't rely on ResolveRelations and therefore doesn't trigger partition discovery.
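The check described above can be sketched with a toy model. This is plain Python, not Catalyst code: the classes UnresolvedAttribute, UnresolvedFunction and AnalysisError, the function lookup_functions, and the registry contents are all hypothetical stand-ins chosen for illustration. The point is only that the rule inspects function names and never needs the column a (or the table t) to be resolved.

```python
from dataclasses import dataclass, field


@dataclass
class UnresolvedAttribute:
    # A column reference such as `a`; resolving it would require the relation.
    name: str


@dataclass
class UnresolvedFunction:
    # A function call whose name has not been looked up yet.
    name: str
    children: list = field(default_factory=list)


class AnalysisError(Exception):
    """Hypothetical stand-in for the analyzer's error type."""


def lookup_functions(expr, registry):
    """Fail fast on unregistered function names without resolving anything."""
    if isinstance(expr, UnresolvedFunction):
        if expr.name not in registry:
            raise AnalysisError(f"Undefined function: '{expr.name}'")
        # Nested calls such as f(g(a)) are checked as well.
        for child in expr.children:
            lookup_functions(child, registry)
    # Attributes are left untouched: no relation lookup, no partition discovery.
    return expr


registry = {"upper", "length"}  # assumed registry contents

# A registered function passes through unchanged, still unresolved.
ok = lookup_functions(UnresolvedFunction("upper", [UnresolvedAttribute("a")]), registry)

# SELECT foo(a) FROM t fails immediately.
try:
    lookup_functions(UnresolvedFunction("foo", [UnresolvedAttribute("a")]), registry)
    message = None
except AnalysisError as err:
    message = str(err)
```

In this sketch the expression is returned as-is when every name is known, which mirrors the statement that the rule does not try to resolve the functions itself.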
We may put this analysis rule in a separate Once rule batch that sits between the "Substitution" batch and the "Resolution" batch, so that it runs only once and is guaranteed to execute before ResolveRelations.
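To illustrate the effect of the batch placement, here is a minimal sketch of a batch executor, again as a plain-Python toy rather than Spark's rule executor. The names run_batches and analyze, the "once" and "fixed_point" strategy strings, and the dictionary-based plan are assumptions made for this example. The trace records which rules did real work, with ResolveRelations standing in for the expensive partition discovery.

```python
class AnalysisError(Exception):
    """Hypothetical stand-in for the analyzer's error type."""


def run_batches(plan, batches):
    """Run batches in order. A 'once' batch applies its rules a single time;
    a 'fixed_point' batch repeats until the plan stops changing."""
    for _name, strategy, rules in batches:
        while True:
            before = plan
            for rule in rules:
                plan = rule(plan)
            if strategy == "once" or plan == before:
                break
    return plan


def analyze(function_names, registry):
    """Return a trace of the rules that did work for a toy query plan."""
    trace = []

    def lookup_functions(plan):
        trace.append("LookupFunctions")
        for name in plan["functions"]:
            if name not in registry:
                raise AnalysisError(f"Undefined function: '{name}'")
        return plan

    def resolve_relations(plan):
        if plan["relation_resolved"]:
            return plan
        # Stands in for partition discovery over many files.
        trace.append("ResolveRelations")
        return {**plan, "relation_resolved": True}

    batches = [
        ("Substitution", "fixed_point", []),
        ("LookupFunctions", "once", [lookup_functions]),
        ("Resolution", "fixed_point", [resolve_relations]),
    ]
    plan = {"functions": tuple(function_names), "relation_resolved": False}
    try:
        run_batches(plan, batches)
    except AnalysisError as err:
        trace.append(f"error: {err}")
    return trace


failing = analyze(["foo"], registry={"upper"})
passing = analyze(["upper"], registry={"upper"})
```

For the failing query the trace contains LookupFunctions and the error but no ResolveRelations entry. For the passing query LookupFunctions appears exactly once, even though the "Resolution" batch iterates twice before reaching a fixed point.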
Issue Links
- causes SPARK-23486 LookupFunctions should not check the same function name more than once (Resolved)