Spark / SPARK-35876

arrays_zip unexpected column names

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.1.2
    • Fix Version/s: 3.2.0, 3.1.3, 3.0.4
    • Component/s: Spark Core
    • Labels: None

    Description

      When I use the arrays_zip function on renamed columns, the schema written to disk differs from the one shown by printSchema().

      from pyspark.sql import Row, SparkSession
      from pyspark.sql.functions import arrays_zip, col
      
      spark = SparkSession.builder.getOrCreate()
      
      data = [
        Row(a1=["a", "a"], b1=["b", "b"]),
      ]
      df = (
        spark.sparkContext.parallelize(data).toDF()
          .withColumnRenamed("a1", "a2")
          .withColumnRenamed("b1", "b2")
          .withColumn("zipped", arrays_zip(col("a2"), col("b2")))
      )
      df.printSchema()
      # root
      #  |-- a2: array (nullable = true)
      #  |    |-- element: string (containsNull = true)
      #  |-- b2: array (nullable = true)
      #  |    |-- element: string (containsNull = true)
      #  |-- zipped: array (nullable = true)
      #  |    |-- element: struct (containsNull = false)
      #  |    |    |-- a2: string (nullable = true)
      #  |    |    |-- b2: string (nullable = true)
      
      df.write.save("test.parquet")
      spark.read.load("test.parquet").printSchema()
      # root
      #  |-- a2: array (nullable = true)
      #  |    |-- element: string (containsNull = true)
      #  |-- b2: array (nullable = true)
      #  |    |-- element: string (containsNull = true)
      #  |-- zipped: array (nullable = true)
      #  |    |-- element: struct (containsNull = true)
      #  |    |    |-- a1: string (nullable = true)
      #  |    |    |-- b1: string (nullable = true)

      I would expect the schema written to disk to match the one printed for the DataFrame. Instead, it seems that the struct fields inside zipped are written with the original column names (a1, b1) rather than the renamed ones (a2, b2).
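
      One way to spot this kind of mismatch without reading the printed trees by eye is to diff the nested field names of the two schemas. The following is only a minimal sketch, not part of the original report: it assumes each schema is available as the plain dict returned by df.schema.jsonValue(), and the helper names field_paths, renamed_fields and _schema are hypothetical. The two dicts are hand-built to mirror the schemas printed above, so the snippet runs without a Spark session.

```python
# Sketch: list every named field in a Spark schema given as the dict
# produced by df.schema.jsonValue(), then diff two schemas by field path.
# All helper names here are hypothetical, chosen for illustration only.

def field_paths(dtype, prefix=""):
    """Return dotted paths of all named fields nested inside `dtype`."""
    paths = []
    if isinstance(dtype, dict):
        kind = dtype.get("type")
        if kind == "struct":
            for field in dtype["fields"]:
                path = f"{prefix}.{field['name']}" if prefix else field["name"]
                paths.append(path)
                paths.extend(field_paths(field["type"], path))
        elif kind == "array":
            # Array elements carry no name of their own; keep the prefix.
            paths.extend(field_paths(dtype["elementType"], prefix))
        elif kind == "map":
            # Only map values are inspected in this sketch, not keys.
            paths.extend(field_paths(dtype["valueType"], prefix))
    return paths


def renamed_fields(expected, actual):
    """Return (missing, unexpected) field paths as two sorted lists."""
    exp, act = set(field_paths(expected)), set(field_paths(actual))
    return sorted(exp - act), sorted(act - exp)


def _schema(first, second):
    """Build a schema dict shaped like the report's, with the given struct field names."""
    def string_field(name):
        return {"name": name, "type": "string", "nullable": True, "metadata": {}}

    def array_field(name, element):
        array = {"type": "array", "elementType": element, "containsNull": True}
        return {"name": name, "type": array, "nullable": True, "metadata": {}}

    struct = {"type": "struct", "fields": [string_field(first), string_field(second)]}
    return {
        "type": "struct",
        "fields": [
            array_field("a2", "string"),
            array_field("b2", "string"),
            array_field("zipped", struct),
        ],
    }


in_memory = _schema("a2", "b2")  # mirrors df.printSchema() above
on_disk = _schema("a1", "b1")    # mirrors the schema read back from Parquet

missing, unexpected = renamed_fields(in_memory, on_disk)
print(missing)     # ['zipped.a2', 'zipped.b2']
print(unexpected)  # ['zipped.a1', 'zipped.b1']
```

      With a live session, the same comparison would presumably be renamed_fields(df.schema.jsonValue(), spark.read.load("test.parquet").schema.jsonValue()); nullability differences are deliberately ignored, since only field names are compared.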

       


          People

            Assignee: Kousuke Saruta (sarutak)
            Reporter: Derk Crezee (dcrezee)
            Votes: 0
            Watchers: 3
