[SPARK-13802] Fields order in Row(**kwargs) is not consistent with Schema.toInternal method - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Duplicate
Affects Version/s: 1.6.0
Fix Version/s: None
Component/s: PySpark
Labels: None

Description

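When a Row is constructed from keyword arguments, the fields in the underlying tuple are sorted by name. When the schema reads the row (for example through toInternal), it does not use this sorted order: it takes the values positionally in the schema's declared field order, so values end up paired with the wrong fields.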

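from pyspark.sql import Row
from pyspark.sql.types import StructType, StructField, StringType

# The schema declares "id" first, then "first_name".
schema = StructType([
    StructField("id", StringType()),
    StructField("first_name", StringType())])
# Row(**kwargs) sorts its fields by name, so "first_name" comes before "id".
row = Row(id="39", first_name="Szymon")
schema.toInternal(row)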
Out[5]: ('Szymon', '39')

df = sqlContext.createDataFrame([row], schema)
df.show(1)

+------+----------+
|    id|first_name|
+------+----------+
|Szymon|        39|
+------+----------+
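The mismatch can be modelled without Spark. The following is a minimal pure-Python sketch, assuming (as the report describes) that the kwargs constructor sorts fields by name and that the schema consumes the tuple positionally. The helpers `sorted_kwargs_row`, `to_internal_positional`, and `to_internal_by_name` are hypothetical names invented for illustration and are not PySpark APIs.

```python
# Pure-Python model of the reported behaviour; none of these helpers exist in PySpark.

def sorted_kwargs_row(**kwargs):
    """Mimic the reported Row(**kwargs) behaviour: fields sorted by name."""
    names = sorted(kwargs)
    return names, tuple(kwargs[name] for name in names)

def to_internal_positional(schema_names, values):
    """Pair values with schema fields by position only (the assumed buggy path)."""
    return dict(zip(schema_names, values))

def to_internal_by_name(schema_names, row_names, values):
    """Realign values to the schema's field order using the row's field names."""
    by_name = dict(zip(row_names, values))
    return tuple(by_name[name] for name in schema_names)

schema_names = ["id", "first_name"]
row_names, row_values = sorted_kwargs_row(id="39", first_name="Szymon")

# Positional pairing reproduces the swapped columns from the report.
mismatched = to_internal_positional(schema_names, row_values)

# Looking values up by name restores the intended order.
realigned = to_internal_by_name(schema_names, row_names, row_values)
```

Here `mismatched` maps `id` to `'Szymon'` and `first_name` to `'39'`, matching the table above, while `realigned` is `('39', 'Szymon')`. In PySpark itself, one way to avoid depending on kwargs ordering is to build a Row class with a fixed field order, for example `Row("id", "first_name")("39", "Szymon")`.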

Issue Links

duplicates: SPARK-12467 Get rid of sorting in Row's constructor in pyspark (Resolved)

People

Assignee: Unassigned

Reporter: Szymon Matejczyk

Votes: 1

Watchers: 6

Dates

Created: 10/Mar/16 13:55

Updated: 12/Dec/22 18:10

Resolved: 03/May/17 12:12