[SPARK-12467] Get rid of sorting in Row's constructor in pyspark - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Minor
Resolution: Won't Fix
Affects Version/s: 1.5.2, 2.2.0
Fix Version/s: None
Component/s: PySpark, SQL
Labels: None

Description

The current implementation of Row's __new__ sorts columns by name.
First of all, there is no obvious reason to sort. Second, if one converts a DataFrame to an RDD and then back to a DataFrame, the order of the columns changes. While this is not a bug, it nevertheless makes looking at the data really inconvenient.

def __new__(self, *args, **kwargs):
    if args and kwargs:
        raise ValueError("Can not use both args "
                         "and kwargs to create Row")
    if args:
        # create row class or objects
        return tuple.__new__(self, args)

    elif kwargs:
        # create row objects
        names = sorted(kwargs.keys())  # just get rid of sorting here!!!
        row = tuple.__new__(self, [kwargs[n] for n in names])
        row.__fields__ = names
        return row

    else:
        raise ValueError("No args or kwargs")
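
To make the effect concrete without a Spark installation, here is a minimal sketch in plain Python. SortedRow is a stand-in that copies only the kwargs branch of the constructor quoted above, and OrderedRow is a hypothetical variant with the sort removed; neither class is part of PySpark. The sketch assumes Python 3.6 or later, where keyword arguments keep the order in which the caller passed them.

```python
class SortedRow(tuple):
    """Stand-in for the kwargs branch quoted above: sorts field names."""

    def __new__(cls, **kwargs):
        if not kwargs:
            raise ValueError("No args or kwargs")
        names = sorted(kwargs.keys())
        row = tuple.__new__(cls, [kwargs[n] for n in names])
        row.__fields__ = names
        return row


class OrderedRow(tuple):
    """Hypothetical variant (not PySpark code): keeps the caller's order."""

    def __new__(cls, **kwargs):
        if not kwargs:
            raise ValueError("No args or kwargs")
        # Assumption: Python 3.6+, where **kwargs preserves call order.
        names = list(kwargs.keys())
        row = tuple.__new__(cls, [kwargs[n] for n in names])
        row.__fields__ = names
        return row


sorted_row = SortedRow(name="Alice", age=11)
ordered_row = OrderedRow(name="Alice", age=11)

print(sorted_row.__fields__, tuple(sorted_row))    # ['age', 'name'] (11, 'Alice')
print(ordered_row.__fields__, tuple(ordered_row))  # ['name', 'age'] ('Alice', 11)
```

With the sort in place, the fields come back as age, name even though the caller wrote name, age, which is the column reordering described in this report. On Python versions older than 3.6 the order of kwargs is not guaranteed, so simply dropping the sort would not give a stable order there.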

Issue Links

is duplicated by

SPARK-13802 Fields order in Row(**kwargs) is not consistent with Schema.toInternal method (Resolved)

SPARK-20527 Schema issues when fields are queries in different order (Resolved)

People

Assignee: Unassigned

Reporter: Irakli Machabeli

Votes: 0

Watchers: 4

Dates

Created: 21/Dec/15 19:52

Updated: 12/Dec/22 18:10

Resolved: 04/May/17 15:36