SPARK-41538

Metadata column should be appended at the end of project list

Details

    • Type: Task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.3.2, 3.4.0
    • Fix Version/s: 3.3.2, 3.4.0
    • Component/s: SQL
    • Labels: None

    Description

      The following statements reproduce the problem:

      CREATE TABLE table_1 (
        a ARRAY<STRING>,
        s STRUCT<id: STRING>)
      USING parquet;

      CREATE VIEW view_1 (id)
      AS WITH source AS (
          SELECT * FROM table_1
      ),
      renamed AS (
          SELECT
            s.id
          FROM source
      )
      SELECT id FROM renamed;

      WITH foo AS (
        SELECT 'a' AS id
      ),
      bar AS (
        SELECT 'a' AS id
      )
      SELECT
        1
      FROM foo
      FULL OUTER JOIN bar USING(id)
      FULL OUTER JOIN view_1 USING(id)
      WHERE foo.id IS NOT NULL;

      The query fails with the following error:

      class org.apache.spark.sql.types.ArrayType cannot be cast to class org.apache.spark.sql.types.StructType (org.apache.spark.sql.types.ArrayType and org.apache.spark.sql.types.StructType are in unnamed module of loader 'app')
      java.lang.ClassCastException: class org.apache.spark.sql.types.ArrayType cannot be cast to class org.apache.spark.sql.types.StructType (org.apache.spark.sql.types.ArrayType and org.apache.spark.sql.types.StructType are in unnamed module of loader 'app')
          at org.apache.spark.sql.catalyst.expressions.GetStructField.childSchema$lzycompute(complexTypeExtractors.scala:108)
          at org.apache.spark.sql.catalyst.expressions.GetStructField.childSchema(complexTypeExtractors.scala:108)
          at org.apache.spark.sql.catalyst.expressions.GetStructField.dataType(complexTypeExtractors.scala:114)
          at org.apache.spark.sql.catalyst.expressions.Alias.toAttribute(namedExpressions.scala:193)
          at org.apache.spark.sql.catalyst.expressions.AliasHelper$$anonfun$getAliasMap$1.applyOrElse(AliasHelper.scala:50)
          at org.apache.spark.sql.catalyst.expressions.AliasHelper$$anonfun$getAliasMap$1.applyOrElse(AliasHelper.scala:50)
          at scala.collection.immutable.List.collect(List.scala:315)
          at org.apache.spark.sql.catalyst.expressions.AliasHelper.getAliasMap(AliasHelper.scala:50)
          at org.apache.spark.sql.catalyst.expressions.AliasHelper.getAliasMap$(AliasHelper.scala:47)
          at org.apache.spark.sql.catalyst.optimizer.CollapseProject$.getAliasMap(Optimizer.scala:992)
          at org.apache.spark.sql.catalyst.optimizer.CollapseProject$.canCollapseExpressions(Optimizer.scala:1029)

      The error is caused by the metadata column sitting at different positions in two plan nodes:

      • Table relation: at the end of the output
      • Project list: at the beginning of the list

      When the InlineCTE rule executes, the metadata column in the Project is wrongly combined with the table output. As the issue title states, the metadata column should be appended at the end of the project list so that both nodes agree.
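The positional mismatch described above can be illustrated with a small toy model. This is only a sketch and not Spark's code: `Column`, `combine_by_position`, `get_struct_field`, and the `_metadata` column name are hypothetical, and the exact column layouts in the real plan are not given in the issue. It assumes that two column lists are paired by index, which is enough to show how a struct field access can end up pointing at an array column, and how moving the metadata column to the end of the project list avoids that.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    name: str
    data_type: str  # simplified type tag, e.g. "array", "struct", "string"


def combine_by_position(expected, actual):
    """Pair each expected column name with the actual column at the same index."""
    return {exp.name: act for exp, act in zip(expected, actual)}


def get_struct_field(column, field):
    """Toy stand-in for a struct field access; it requires a struct-typed input."""
    if column.data_type != "struct":
        raise TypeError(f"{column.data_type} cannot be cast to struct")
    return f"{column.name}.{field}"


a = Column("a", "array")
s = Column("s", "struct")
meta = Column("_metadata", "string")  # hypothetical metadata column

table_output = [a, s, meta]  # metadata column at the end
project_list = [meta, a, s]  # metadata column at the beginning

# Inconsistent layouts: the name "s" is paired with the array column "a".
bad = combine_by_position(table_output, project_list)
try:
    get_struct_field(bad["s"], "id")
except TypeError as err:
    print("error:", err)

# Appending the metadata column at the end of the project list realigns the two.
fixed_project_list = [c for c in project_list if c != meta] + [meta]
good = combine_by_position(table_output, fixed_project_list)
print(get_struct_field(good["s"], "id"))
```

In this model the failure comes purely from the two lists disagreeing on where the metadata column sits; once both place it last, every name is paired with the column of the expected type.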



          People

            Assignee: Gengliang Wang
            Reporter: Gengliang Wang