Spark / SPARK-7616

Column order can be corrupted when saving DataFrame as a partitioned table

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 1.4.0
    • Fix Version/s: 1.4.0
    • Component/s: SQL
    • Labels: None

    Description

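      When a DataFrame is saved as a partitioned table, its partition columns are appended after the data columns. However, the column names are not adjusted accordingly, so values show up under the wrong column names when the table is read back. To reproduce: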

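      // Run in spark-shell, where sqlContext is predefined.
      import sqlContext._
      import sqlContext.implicits._
      
      // Three rows with b = 2 * a: (1, 2), (2, 4), (3, 6)
      val df = (1 to 3).map(i => i -> i * 2).toDF("a", "b")
      
      // Save as a Parquet table partitioned by column "a"
      df.write
        .format("parquet")
        .mode("overwrite")
        .partitionBy("a")
        .saveAsTable("t")
      
      // Read the table back; every row should still satisfy b = 2 * a
      table("t").orderBy('a).show()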
      

      Expected output:

      +-+-+
      |b|a|
      +-+-+
      |2|1|
      |4|2|
      |6|3|
      +-+-+
      

      Actual output:

      +-+-+
      |b|a|
      +-+-+
      |1|2|
      |2|4|
      |3|6|
      +-+-+
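
      The mismatch between the two outputs can be reproduced without Spark. The following is a minimal sketch, not Spark's implementation: `Table`, `reorderNamesOnly` and `reorderConsistently` are hypothetical names introduced only for illustration. It assumes that a partitioned write computes a "data columns first, partition columns last" permutation, and shows what happens when that permutation is applied to the column names but not to the row values. The sketch makes no claim about which side Spark actually left unadjusted; it only shows that applying the permutation to one side yields the actual output above, while applying it to both yields the expected output.

```scala
// Hypothetical, Spark-free model of the column reordering described in this issue.
// None of these names are part of Spark's API.
object PartitionedColumnOrder {
  type Row = Vector[Int]

  final case class Table(columns: Vector[String], rows: Vector[Row])

  // Column indices with data columns first and partition columns last,
  // preserving the relative order inside each group.
  private def partitionedOrder(columns: Vector[String], partitionBy: Set[String]): Vector[Int] = {
    val (partitionIdx, dataIdx) =
      columns.indices.toVector.partition(i => partitionBy(columns(i)))
    dataIdx ++ partitionIdx
  }

  // Inconsistent variant: only the names are permuted, the row values are left as-is.
  def reorderNamesOnly(t: Table, partitionBy: Set[String]): Table = {
    val order = partitionedOrder(t.columns, partitionBy)
    Table(order.map(t.columns), t.rows)
  }

  // Consistent variant: the same permutation is applied to the names and to every row.
  def reorderConsistently(t: Table, partitionBy: Set[String]): Table = {
    val order = partitionedOrder(t.columns, partitionBy)
    Table(order.map(t.columns), t.rows.map(row => order.map(row)))
  }

  // Same data as the DataFrame in the reproduction: columns (a, b) with b = 2 * a.
  val example: Table =
    Table(Vector("a", "b"), (1 to 3).map(i => Vector(i, i * 2)).toVector)
}
```

      With `partitionBy = Set("a")`, `reorderNamesOnly(example, ...)` gives columns `(b, a)` over rows `(1, 2), (2, 4), (3, 6)`, matching the actual output, while `reorderConsistently(example, ...)` gives the same columns over rows `(2, 1), (4, 2), (6, 3)`, matching the expected output.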
      

          People

            Cheng Lian
            Yin Huai
            Votes: 0
            Watchers: 5
