[SPARK-35876] array_zip unexpected column names - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 3.1.2
Fix Version/s: 3.2.0, 3.1.3, 3.0.4
Component/s: Spark Core
Labels:
None

Description

When I use the arrays_zip function on renamed columns, an unexpected schema is written to disk.

from pyspark.sql import Row, SparkSession
from pyspark.sql.functions import arrays_zip, col

spark = SparkSession.builder.getOrCreate()

data = [
    Row(a1=["a", "a"], b1=["b", "b"]),
]
df = (
    spark.sparkContext.parallelize(data).toDF()
    .withColumnRenamed("a1", "a2")
    .withColumnRenamed("b1", "b2")
    .withColumn("zipped", arrays_zip(col("a2"), col("b2")))
)

# In-memory schema: the zipped struct fields use the renamed columns (a2, b2).
df.printSchema()
# root
#  |-- a2: array (nullable = true)
#  |    |-- element: string (containsNull = true)
#  |-- b2: array (nullable = true)
#  |    |-- element: string (containsNull = true)
#  |-- zipped: array (nullable = true)
#  |    |-- element: struct (containsNull = false)
#  |    |    |-- a2: string (nullable = true)
#  |    |    |-- b2: string (nullable = true)

# Schema after a write/read round trip: the struct fields revert to a1, b1.
df.write.save("test.parquet")
spark.read.load("test.parquet").printSchema()
# root
#  |-- a2: array (nullable = true)
#  |    |-- element: string (containsNull = true)
#  |-- b2: array (nullable = true)
#  |    |-- element: string (containsNull = true)
#  |-- zipped: array (nullable = true)
#  |    |-- element: struct (containsNull = true)
#  |    |    |-- a1: string (nullable = true)
#  |    |    |-- b1: string (nullable = true)

I would expect the schema written to disk to match the one printed for the DataFrame. Instead, the fields of the zipped struct in the written file use the original column names (a1, b1) rather than the renamed ones (a2, b2).
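
The mismatch described above can be detected mechanically by comparing field names in the two schemas. The following is a minimal sketch, not part of the original report: `field_paths` is a hypothetical helper, and the dictionaries are hand-built to mirror the two printed schemas, assuming the JSON layout that PySpark's `StructType.jsonValue()` produces. It only walks struct and array types.

```python
def field_paths(dtype, prefix=""):
    """Collect dotted paths of all named fields in a schema JSON dict.

    Hypothetical helper: handles only struct and array types.
    Primitive types appear as plain strings and contribute no paths.
    """
    paths = []
    if isinstance(dtype, dict):
        kind = dtype.get("type")
        if kind == "struct":
            for field in dtype["fields"]:
                path = f"{prefix}.{field['name']}" if prefix else field["name"]
                paths.append(path)
                paths.extend(field_paths(field["type"], path))
        elif kind == "array":
            # Array elements have no name of their own; keep the parent path.
            paths.extend(field_paths(dtype["elementType"], prefix))
    return paths


def string_field(name):
    return {"name": name, "type": "string", "nullable": True, "metadata": {}}


def string_array_field(name):
    array = {"type": "array", "elementType": "string", "containsNull": True}
    return {"name": name, "type": array, "nullable": True, "metadata": {}}


def example_schema(first, second, contains_null):
    """Build a schema dict shaped like the ones printed in the report."""
    element = {"type": "struct",
               "fields": [string_field(first), string_field(second)]}
    zipped = {"type": "array", "elementType": element,
              "containsNull": contains_null}
    return {"type": "struct", "fields": [
        string_array_field("a2"),
        string_array_field("b2"),
        {"name": "zipped", "type": zipped, "nullable": True, "metadata": {}},
    ]}


in_memory = example_schema("a2", "b2", False)  # as printed by df.printSchema()
on_disk = example_schema("a1", "b1", True)     # as printed after reading back

expected = set(field_paths(in_memory))
actual = set(field_paths(on_disk))
missing = sorted(expected - actual)      # fields lost in the round trip
unexpected = sorted(actual - expected)   # fields that appeared instead

print("missing:", missing)
print("unexpected:", unexpected)
```

With a live session, the same comparison could be run on `df.schema.jsonValue()` and `spark.read.load("test.parquet").schema.jsonValue()` in place of the hand-built dictionaries.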

Issue Links

links to

[Github] Pull Request #33106 (sarutak)

[Github] Pull Request #33810 (AngersZhuuuu)

People

Assignee: Kousuke Saruta

Reporter: Derk Crezee

Votes: 0

Watchers: 3

Dates

Created: 24/Jun/21 11:53

Updated: 12/Dec/22 18:10

Resolved: 29/Jun/21 03:29