Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-36867

Misleading Error Message with Invalid Column and Group By

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • 3.1.2
    • 3.3.0
    • SQL
    • None

    Description

      When you run a query with an invalid column that also does a group by on a constructed column, the error message you get back references a missing column for the group by rather than the invalid column.

      You can reproduce this in pyspark in 3.1.2 with the following code:

      from pyspark.sql import SparkSession
      spark = SparkSession.builder.appName("Group By Issue").getOrCreate()
      data = spark.createDataFrame(
          [("2021-09-15", 1), ("2021-09-16", 2), ("2021-09-17", 10), ("2021-09-18", 25), ("2021-09-19", 500), ("2021-09-20", 50), ("2021-09-21", 100)],
          schema=["d", "v"]
          )
      data.createOrReplaceTempView("data")
      # This is valid
      spark.sql("select sum(v) as value, date(date_trunc('week', d)) as week from data group by week").show()
      # This is invalid because val is the wrong variable
      spark.sql("select sum(val) as value, date(date_trunc('week', d)) as week from data group by week").show()
      

      The error message for the second spark.sql line is

      pyspark.sql.utils.AnalysisException: cannot resolve '`week`' given input columns: [data.d, data.v]; line 1 pos 81;
      'Aggregate ['week], 'sum('val) AS value#21, cast(date_trunc(week, cast(d#0 as timestamp), Some(America/New_York)) as date) AS week#22
      +- SubqueryAlias data
      +- LogicalRDD d#0, v#1L, false

      but the actual problem is that I used the wrong variable name in a different part of the query. Nothing is wrong with week in this case.

      Attachments

        Activity

          People

            cloud_fan Wenchen Fan
            jackowaya Alan Jackoway
            Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: