[SPARK-41538] Metadata column should be appended at the end of project list - ASF JIRA

Details

Type: Task
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 3.3.2, 3.4.0
Fix Version/s: 3.3.2, 3.4.0
Component/s: SQL
Labels: None

Description

Consider the following table, view, and query:

CREATE TABLE table_1 (
  a ARRAY<STRING>,
  s STRUCT<id: STRING>)
USING parquet;

CREATE VIEW view_1 (id)
AS WITH source AS (
    SELECT * FROM table_1
),
renamed AS (
    SELECT
        s.id
    FROM source
)
SELECT id FROM renamed;

WITH foo AS (
  SELECT 'a' AS id
),
bar AS (
  SELECT 'a' AS id
)
SELECT
  1
FROM foo
FULL OUTER JOIN bar USING(id)
FULL OUTER JOIN view_1 USING(id)
WHERE foo.id IS NOT NULL;

Running the final query fails with the following error:

class org.apache.spark.sql.types.ArrayType cannot be cast to class org.apache.spark.sql.types.StructType (org.apache.spark.sql.types.ArrayType and org.apache.spark.sql.types.StructType are in unnamed module of loader 'app')
java.lang.ClassCastException: class org.apache.spark.sql.types.ArrayType cannot be cast to class org.apache.spark.sql.types.StructType (org.apache.spark.sql.types.ArrayType and org.apache.spark.sql.types.StructType are in unnamed module of loader 'app')
    at org.apache.spark.sql.catalyst.expressions.GetStructField.childSchema$lzycompute(complexTypeExtractors.scala:108)
    at org.apache.spark.sql.catalyst.expressions.GetStructField.childSchema(complexTypeExtractors.scala:108)
    at org.apache.spark.sql.catalyst.expressions.GetStructField.dataType(complexTypeExtractors.scala:114)
    at org.apache.spark.sql.catalyst.expressions.Alias.toAttribute(namedExpressions.scala:193)
    at org.apache.spark.sql.catalyst.expressions.AliasHelper$$anonfun$getAliasMap$1.applyOrElse(AliasHelper.scala:50)
    at org.apache.spark.sql.catalyst.expressions.AliasHelper$$anonfun$getAliasMap$1.applyOrElse(AliasHelper.scala:50)
    at scala.collection.immutable.List.collect(List.scala:315)
    at org.apache.spark.sql.catalyst.expressions.AliasHelper.getAliasMap(AliasHelper.scala:50)
    at org.apache.spark.sql.catalyst.expressions.AliasHelper.getAliasMap$(AliasHelper.scala:47)
    at org.apache.spark.sql.catalyst.optimizer.CollapseProject$.getAliasMap(Optimizer.scala:992)
    at org.apache.spark.sql.catalyst.optimizer.CollapseProject$.canCollapseExpressions(Optimizer.scala:1029)

The failure is caused by the metadata column sitting at inconsistent positions in two plan nodes:

Table relation: the metadata column is at the end of the output
Project list: the metadata column is at the beginning of the list

When the InlineCTE rule executes, the metadata column in the Project list is wrongly combined with the table output.
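
The position mismatch described above can be illustrated with a toy model. This is only a sketch, not Spark's code: `Column`, `get_struct_field`, `metadata_last`, `resolve_s_id`, and the `_meta` column name are all hypothetical, and the lookup by ordinal is an assumption made to show how a struct-field access can land on the array column when two nodes disagree about where the metadata column sits. It does not reproduce what InlineCTE or CollapseProject actually do.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    """Hypothetical stand-in for a plan attribute (not a Spark class)."""
    name: str
    data_type: str
    is_metadata: bool = False


def get_struct_field(column: Column, field: str) -> str:
    """Toy struct-field access that rejects non-struct inputs."""
    if not column.data_type.startswith("struct<"):
        raise TypeError(
            f"{column.data_type} cannot be treated as a struct "
            f"(column {column.name!r})"
        )
    return f"{column.name}.{field}"


def metadata_last(columns: list[Column]) -> list[Column]:
    """Move metadata columns after regular ones, keeping relative order."""
    regular = [c for c in columns if not c.is_metadata]
    metadata = [c for c in columns if c.is_metadata]
    return regular + metadata


def resolve_s_id(relation_output: list[Column], project_list: list[Column]) -> str:
    """Find `s` by ordinal in the relation, then read that slot of the project list."""
    ordinal = [c.name for c in relation_output].index("s")
    try:
        return get_struct_field(project_list[ordinal], "id")
    except TypeError as err:
        return f"error: {err}"


a = Column("a", "array<string>")
s = Column("s", "struct<id:string>")
meta = Column("_meta", "string", is_metadata=True)  # hypothetical metadata column

relation_output = [a, s, meta]                 # metadata at the end
project_before = [meta, a, s]                  # metadata at the beginning (inconsistent)
project_after = metadata_last(project_before)  # metadata appended at the end

print(resolve_s_id(relation_output, project_before))
print(resolve_s_id(relation_output, project_after))
```

With the inconsistent layout, the ordinal of `s` in the relation points at the array column `a` in the project list, so the toy field access reports a type error, which mirrors the ArrayType to StructType cast failure in spirit. Once the metadata column is appended at the end of the project list, as the issue title proposes, both layouts agree and the lookup yields `s.id`.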

Issue Links

links to

[GitHub] Pull Request #39081 (gengliangwang)

[GitHub] Pull Request #39425 (gengliangwang)

People

Assignee: Gengliang Wang

Reporter: Gengliang Wang

Votes: 0

Watchers: 3

Dates

Created: 16/Dec/22 00:03

Updated: 06/Jan/23 05:07

Resolved: 16/Dec/22 07:43