Description
The current implementation of Row's __new__ sorts columns by name.
First of all, there is no obvious reason to sort. Second, if one converts a dataframe to an RDD and then back to a dataframe, the order of the columns changes. While this is not a bug, it nevertheless makes looking at the data really inconvenient.
def __new__(self, *args, **kwargs):
    if args and kwargs:
        raise ValueError("Can not use both args "
                         "and kwargs to create Row")
    if args:
        # create row class or objects
        return tuple.__new__(self, args)
    elif kwargs:
        # create row objects
        names = sorted(kwargs.keys())  # just get rid of sorting here!!!
        row = tuple.__new__(self, [kwargs[n] for n in names])
        row.__fields__ = names
        return row
    else:
        raise ValueError("No args or kwargs")
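To make the effect concrete, here is a minimal sketch that does not depend on PySpark. `SortedRow` and `OrderedRow` are hypothetical stand-ins written only for illustration: the first copies the sorting from the kwargs branch above, the second drops it and relies on keyword arguments keeping their call order (guaranteed since Python 3.6 by PEP 468). It is not the actual Row class and is not meant as the fix itself.

```python
class SortedRow(tuple):
    """Hypothetical stand-in that mimics the sorting in the kwargs branch."""

    def __new__(cls, **kwargs):
        if not kwargs:
            raise ValueError("No kwargs")
        names = sorted(kwargs.keys())  # fields are reordered alphabetically
        row = tuple.__new__(cls, [kwargs[n] for n in names])
        row.__fields__ = names
        return row


class OrderedRow(tuple):
    """Hypothetical variant that keeps the order the caller wrote."""

    def __new__(cls, **kwargs):
        if not kwargs:
            raise ValueError("No kwargs")
        names = list(kwargs.keys())  # call order, assuming Python 3.6+
        row = tuple.__new__(cls, [kwargs[n] for n in names])
        row.__fields__ = names
        return row


sorted_row = SortedRow(name="Alice", age=30)
ordered_row = OrderedRow(name="Alice", age=30)

print(sorted_row.__fields__, tuple(sorted_row))    # ['age', 'name'] (30, 'Alice')
print(ordered_row.__fields__, tuple(ordered_row))  # ['name', 'age'] ('Alice', 30)
```

With the sorted variant, a row written as (name, age) comes back as (age, name), which is the reordering seen after a dataframe to RDD to dataframe round trip.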
Issue Links
- is duplicated by
  - SPARK-13802 Fields order in Row(**kwargs) is not consistent with Schema.toInternal method (Resolved)
  - SPARK-20527 Schema issues when fields are queries in different order (Resolved)