[SPARK-7616] Column order can be corrupted when saving DataFrame as a partitioned table - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Blocker
Resolution: Fixed
Affects Version/s: 1.4.0
Fix Version/s: 1.4.0
Component/s: SQL
Labels: None
Target Version/s: 1.4.0

Description

When a DataFrame is saved as a partitioned table, its partition columns are appended after the data columns. However, the column names are not adjusted to match the new order, so values are read back under the wrong column names.

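import sqlContext._
import sqlContext.implicits._

// Rows are (a, b) = (1, 2), (2, 4), (3, 6)
val df = (1 to 3).map(i => i -> i * 2).toDF("a", "b")

// Save as a Parquet table partitioned by column "a"
df.write
  .format("parquet")
  .mode("overwrite")
  .partitionBy("a")
  .saveAsTable("t")

// Read the table back; "a" is now listed after "b"
table("t").orderBy('a).show()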

Expected output:

+-+-+
|b|a|
+-+-+
|2|1|
|4|2|
|6|3|
+-+-+

Actual output:

+-+-+
|b|a|
+-+-+
|1|2|
|2|4|
|3|6|
+-+-+
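
The two outputs above can be modeled in plain Scala without Spark. The sketch below is illustrative only: `partitionLast` and `reorder` are hypothetical helpers, not Spark APIs, and it does not claim to show where the mismatch occurs inside Spark. It only shows that moving partition columns to the end is safe when names and values are permuted together, and that permuting only one of them reproduces the symptom.

```scala
// Illustrative model only; these helpers are hypothetical and not part of Spark.
object PartitionColumnOrder {
  // Column indices with data columns first and partition columns last,
  // preserving the relative order within each group.
  def partitionLast(names: Seq[String], partitionCols: Set[String]): Seq[Int] = {
    val (part, data) = names.indices.partition(i => partitionCols(names(i)))
    data ++ part
  }

  // Apply a column permutation to any sequence aligned with the schema.
  def reorder[A](xs: Seq[A], order: Seq[Int]): Seq[A] = order.map(xs)

  // Same data as the report: columns (a, b), rows (1, 2), (2, 4), (3, 6).
  val names: Seq[String] = Seq("a", "b")
  val rows: Seq[Seq[Int]] = (1 to 3).map(i => Seq(i, i * 2))
  val order: Seq[Int] = partitionLast(names, Set("a"))

  // Consistent: names and values are permuted together.
  val fixedNames: Seq[String] = reorder(names, order)
  val fixedRows: Seq[Seq[Int]] = rows.map(reorder(_, order))

  // Inconsistent: names are permuted, values keep their original order.
  val brokenRows: Seq[Seq[Int]] = rows

  def main(args: Array[String]): Unit = {
    println(fixedNames.mkString("|"))
    fixedRows.foreach(r => println(r.mkString("|")))
    println("-- names reordered, values not --")
    brokenRows.foreach(r => println(r.mkString("|")))
  }
}
```

With the header `b|a`, `fixedRows` corresponds to the expected output and `brokenRows` to the actual output.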

Issue Links

links to

[Github] Pull Request #6285 (liancheng)

People

Assignee: Cheng Lian

Reporter: Yin Huai

Votes: 0

Watchers: 5

Dates

Created: 13/May/15 21:17

Updated: 21/May/15 20:52

Resolved: 21/May/15 20:52