Spark / SPARK-21538

Attribute resolution inconsistency in Dataset API

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.3.0
    • Fix Version/s: 2.2.1, 2.3.0
    • Component/s: SQL
    • Labels:
      None

      Description

      spark.range(1).withColumnRenamed("id", "x").sort(col("id"))  // works
      spark.range(1).withColumnRenamed("id", "x").sort($"id")      // works
      spark.range(1).withColumnRenamed("id", "x").sort('id)        // works
      spark.range(1).withColumnRenamed("id", "x").sort("id")       // fails with:
      // org.apache.spark.sql.AnalysisException: Cannot resolve column name "id" among (x);
      // ...
      

      It looks like the Dataset API functions that take a String use the basic resolver, which only looks at the columns available at that level, whereas all other ways of expressing an attribute are resolved lazily by the analyzer.

      The reason why the first 3 calls work is explained in the docs for object ResolveMissingReferences:

        /**
         * In many dialects of SQL it is valid to sort by attributes that are not present in the SELECT
         * clause.  This rule detects such queries and adds the required attributes to the original
         * projection, so that they will be available during sorting. Another projection is added to
         * remove these attributes after sorting.
         *
         * The HAVING clause could also used a grouping columns that is not presented in the SELECT.
         */
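
      To make the difference concrete, here is a minimal sketch in plain Scala, with no Spark dependency. It is a toy model, not Spark's implementation: the Plan, Relation, Project and Sort classes and the resolveEagerly and resolveMissing functions are hypothetical names invented for this illustration. It contrasts an eager lookup against the current output columns with a rewrite in the spirit of the rule described above.

```scala
// Toy model only: these classes are NOT Spark's logical plan classes.
object ResolveMissingSketch {
  sealed trait Plan { def output: Seq[String] }

  final case class Relation(output: Seq[String]) extends Plan

  // Each entry is (input column, output name); ("id", "x") models `id AS x`.
  final case class Project(exprs: Seq[(String, String)], child: Plan) extends Plan {
    def output: Seq[String] = exprs.map(_._2)
  }

  final case class Sort(key: String, child: Plan) extends Plan {
    def output: Seq[String] = child.output
  }

  // Eager lookup: only the columns visible at this level are considered,
  // so a column hidden by a rename cannot be found.
  def resolveEagerly(plan: Plan, name: String): Either[String, String] =
    if (plan.output.contains(name)) Right(name)
    else Left(s"""Cannot resolve column name "$name" among (${plan.output.mkString(", ")})""")

  // Deferred resolution: if the sort key is missing from the projection but
  // present in its child, widen the projection, sort, then project the extra
  // column away again so the final output is unchanged.
  def resolveMissing(plan: Plan): Plan = plan match {
    case Sort(key, p @ Project(exprs, child))
        if !p.output.contains(key) && child.output.contains(key) =>
      val widened = Project(exprs :+ (key -> key), child)
      Project(p.output.map(n => n -> n), Sort(key, widened))
    case other => other
  }

  def main(args: Array[String]): Unit = {
    // Models spark.range(1).withColumnRenamed("id", "x")
    val renamed = Project(Seq("id" -> "x"), Relation(Seq("id")))
    println(resolveEagerly(renamed, "id"))          // eager path: a Left with an error message
    println(resolveMissing(Sort("id", renamed)))    // deferred path: rewritten plan
  }
}
```

      In this model the eager lookup fails for "id" because only "x" is visible after the rename, while the rewrite succeeds by temporarily carrying "id" through the projection and dropping it after the sort.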
      

      For consistency, it would be good to use the same attribute resolution mechanism everywhere.

            People

            • Assignee: Anton Okolnychyi (aokolnychyi)
            • Reporter: Adrian Ionescu (a.ionescu)
            • Votes: 0
            • Watchers: 3
