  1. Spark
  2. SPARK-19737

New analysis rule for reporting unregistered functions without relying on relation resolution

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.2.0
    • Fix Version/s: 2.2.0
    • Component/s: SQL
    • Labels: None

    Description

      Let's consider the following simple SQL query that references an undefined function foo, which is never registered in the function registry:

      SELECT foo(a) FROM t
      

      Assuming table t is a partitioned temporary view consisting of a large number of files stored on S3, it may take the analyzer a long time before realizing that foo is not registered yet.

      The reason is that the existing analysis rule ResolveFunctions requires all child expressions to be resolved first. Therefore, ResolveRelations has to be executed first to resolve all columns referenced by the unresolved function invocation. This further leads to partition discovery for t, which may take a long time.
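The ordering dependency described above can be illustrated with a toy model. This is only a sketch in Python, not Spark's Scala implementation: the classes, the two rule functions, and the event log are hypothetical stand-ins, and the sketch assumes a single projection with one function call.

```python
from dataclasses import dataclass


class AnalysisError(Exception):
    """Toy stand-in for an analyzer error (hypothetical)."""


@dataclass
class Column:
    name: str
    resolved: bool = False


@dataclass
class FunctionCall:
    name: str
    args: list
    resolved: bool = False


@dataclass
class Query:
    projection: FunctionCall
    table: str
    relation_resolved: bool = False


def resolve_relations(query, events):
    """Toy ResolveRelations: resolving the relation is the expensive step."""
    if query.relation_resolved:
        return False
    events.append(f"partition discovery for {query.table}")
    query.relation_resolved = True
    for arg in query.projection.args:
        arg.resolved = True  # columns can be bound only once the relation is known
    return True


def resolve_functions(query, registry, events):
    """Toy ResolveFunctions: skips calls whose children are still unresolved."""
    call = query.projection
    if call.resolved or not all(arg.resolved for arg in call.args):
        return False
    events.append(f"function lookup for {call.name}")
    if call.name not in registry:
        raise AnalysisError(f"Undefined function: {call.name}")
    call.resolved = True
    return True


def analyze(query, registry):
    """Run both rules until nothing changes, recording the work that was done."""
    events = []
    try:
        changed = True
        while changed:
            # Function resolution is tried first, yet it cannot fire until
            # the relation (and therefore the columns) has been resolved.
            changed = resolve_functions(query, registry, events)
            changed = resolve_relations(query, events) or changed
    except AnalysisError as err:
        events.append(f"error: {err}")
    return events


events = analyze(Query(FunctionCall("foo", [Column("a")]), "t"), registry={"upper"})
print(events)
```

In this model the error for foo is recorded only after the partition discovery event, which mirrors the delay described for the real analyzer.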

      To address this case, we propose a new lightweight analysis rule LookupFunctions that

      1. Matches all unresolved function invocations
      2. Looks up the function names in the function registry
      3. Reports an analysis error for any unregistered function

      Since this rule doesn't actually try to resolve the unresolved functions, it doesn't rely on ResolveRelations and therefore doesn't trigger partition discovery.
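The three steps can be sketched as a read-only pass over an unresolved plan. This is a toy Python model, not the actual Catalyst rule: the node classes are simplified stand-ins, the registry is assumed to be a plain set of names, and only Project nodes are walked.

```python
from dataclasses import dataclass, field


class AnalysisError(Exception):
    """Toy stand-in for an analyzer error (hypothetical)."""


@dataclass
class UnresolvedAttribute:
    name: str


@dataclass
class UnresolvedFunction:
    name: str
    children: list = field(default_factory=list)


@dataclass
class UnresolvedRelation:
    table: str


@dataclass
class Project:
    expressions: list
    child: object


def unresolved_functions(expr):
    """Yield every UnresolvedFunction in an expression tree, outermost first."""
    if isinstance(expr, UnresolvedFunction):
        yield expr
        for child in expr.children:
            yield from unresolved_functions(child)


def lookup_functions(plan, registry):
    """Match unresolved calls, look up their names, report missing ones.

    The plan is returned unchanged and the relation node is never inspected,
    so nothing in this pass could trigger partition discovery.
    """
    node = plan
    while isinstance(node, Project):
        for expr in node.expressions:
            for call in unresolved_functions(expr):
                if call.name not in registry:
                    raise AnalysisError(f"Undefined function: {call.name}")
        node = node.child
    return plan


registry = {"upper", "length"}  # assumed contents, for illustration only
relation = UnresolvedRelation("t")
bad_plan = Project([UnresolvedFunction("foo", [UnresolvedAttribute("a")])], relation)
ok_plan = Project([UnresolvedFunction("upper", [UnresolvedAttribute("a")])], relation)

try:
    lookup_functions(bad_plan, registry)
    message = None
except AnalysisError as err:
    message = str(err)
print(message)
```

Because the pass only reads function names, a plan whose functions are all registered comes back untouched, with its relation still unresolved.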

      We may put this analysis rule in a separate Once rule batch that sits between the "Substitution" batch and the "Resolution" batch, both to avoid running it repeatedly and to make sure it gets executed before ResolveRelations.
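The effect of the batch placement can be sketched with a toy rule executor. Everything here is an assumption made for illustration: the batch name "Function lookup", the fixed-point strategy used for the other two batches, the rule functions, and the representation of a plan as a set of still-pending work items.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Batch:
    name: str
    strategy: str  # "once" or "fixed_point"
    rules: List[Callable]
    max_iterations: int = 100


def execute(batches, plan, trace):
    """Run batches in order; a fixed-point batch repeats until the plan stops changing."""
    for batch in batches:
        iterations = 1 if batch.strategy == "once" else batch.max_iterations
        for _ in range(iterations):
            before = plan
            for rule in batch.rules:
                trace.append(f"{batch.name}: {rule.__name__}")
                plan = rule(plan)
            if plan == before:
                break
    return plan


# A plan is modeled as the set of work items that are still pending.
def substitute_ctes(plan):
    return plan - {"cte"}


def lookup_functions(plan):
    return plan  # read-only check: the plan is never modified


def resolve_references(plan):
    # Columns can be bound only after the relation has been resolved.
    return plan - {"columns"} if "relation" not in plan else plan


def resolve_relations(plan):
    return plan - {"relation"}


batches = [
    Batch("Substitution", "fixed_point", [substitute_ctes]),
    Batch("Function lookup", "once", [lookup_functions]),  # hypothetical batch name
    Batch("Resolution", "fixed_point", [resolve_references, resolve_relations]),
]

trace = []
result = execute(batches, frozenset({"cte", "relation", "columns"}), trace)
print("\n".join(trace))
```

In this model the lookup rule appears in the trace once, ahead of every Resolution entry, while the Resolution rules repeat until the plan stops changing.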

            People

              Assignee: Cheng Lian
              Reporter: Cheng Lian