[SPARK-29748] Remove sorting of fields in PySpark SQL Row creation - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 3.0.0
Fix Version/s: 3.0.0
Component/s: PySpark, SQL
Labels:
- release-notes

Description

Currently, when a PySpark Row is created with keyword arguments, the fields are sorted alphabetically. Although the pydocs state this, the behavior is not obvious and has confused many users: an error can occur later when a schema is applied and its field order does not match the sorted order.
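The mismatch can be illustrated without Spark. The following is a minimal pure-Python sketch, not PySpark's implementation: `legacy_row` is a hypothetical stand-in that sorts keyword arguments by name, and `schema_fields` is an assumed schema order chosen for illustration.

```python
def legacy_row(**kwargs):
    """Hypothetical stand-in for the old behavior: fields sorted by name."""
    names = sorted(kwargs)
    return names, tuple(kwargs[n] for n in names)


# Field order the user has in mind when writing the schema (assumed).
schema_fields = ["name", "age"]

# The user writes name first, but sorting puts "age" first.
names, values = legacy_row(name="Alice", age=11)

# Pairing the schema with the values by position assigns them to the wrong fields.
paired = dict(zip(schema_fields, values))

print(names)   # ['age', 'name']
print(paired)  # {'name': 11, 'age': 'Alice'}
```

In a real schema the two fields would usually have different types, so a positional mismatch of this kind tends to surface as a type error when the schema is applied.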

Fields were originally sorted because kwargs in Python < 3.6 are not guaranteed to keep the order in which they were entered; sorting alphabetically ensured a consistent order. Matters are further complicated by the `__from_dict__` flag, which allows Row fields to be referenced by name when the Row is made from kwargs. This flag is not serialized with the Row, which leads to inconsistent behavior.
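As a small illustration of the ordering guarantee, the sketch below assumes Python 3.6 or later, where keyword arguments keep their call-site order. The helper name `entered_order` is hypothetical.

```python
def entered_order(**kwargs):
    """Return keyword names in the order the function received them."""
    return list(kwargs)


# On Python 3.6+, this is the order written at the call site.
as_entered = entered_order(name="Alice", age=11)

# Sorting was the workaround for interpreters without that guarantee.
as_sorted = sorted(as_entered)

print(as_entered)  # ['name', 'age']
print(as_sorted)   # ['age', 'name']
```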

This JIRA proposes removing all sorting of Row fields. Users on Python 3.6+ can continue creating Rows with kwargs, since Python guarantees the order is the same as entered. Users on Python < 3.6 will have to create Rows with an OrderedDict or by using the Row class as a factory (explained in the pydoc). If kwargs are used on Python < 3.6, either an error will be raised or, depending on a conf setting, Row creation will fall back to a LegacyRow that sorts the fields as before. LegacyRow will be deprecated immediately and removed once support for Python < 3.6 is dropped.
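The proposed rules might be sketched roughly as follows. This is a pure-Python illustration under stated assumptions, not the code from the linked pull requests: `make_row`, `row_factory`, and the `legacy_sort` parameter are hypothetical names, and `legacy_sort` merely stands in for the conf setting mentioned above.

```python
import sys
from collections import OrderedDict


def make_row(fields=None, legacy_sort=False, **kwargs):
    """Hypothetical helper (not PySpark's API); returns (names, values)."""
    if fields is not None and kwargs:
        raise TypeError("pass either an ordered mapping or kwargs, not both")
    if fields is not None:
        # An OrderedDict carries its own order on any Python version.
        items = list(fields.items())
    elif legacy_sort:
        # Stand-in for the LegacyRow fallback: sort by field name as before.
        items = sorted(kwargs.items(), key=lambda kv: kv[0])
    elif sys.version_info >= (3, 6):
        # kwargs keep the order in which they were entered.
        items = list(kwargs.items())
    else:
        raise RuntimeError("kwargs order is undefined on Python < 3.6")
    return [k for k, _ in items], tuple(v for _, v in items)


def row_factory(*names):
    """Hypothetical factory: fix the field names once, then pass values."""
    def build(*values):
        if len(values) != len(names):
            raise ValueError("expected %d values" % len(names))
        return list(names), tuple(values)
    return build


Person = row_factory("name", "age")
print(make_row(name="Alice", age=11))
print(make_row(legacy_sort=True, name="Alice", age=11))
print(make_row(OrderedDict([("name", "Alice"), ("age", 11)])))
print(Person("Alice", 11))
```

In this sketch the legacy flag takes priority on every Python version so that the fallback can be exercised on a modern interpreter; the description above only calls for it on Python < 3.6.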

Issue Links

is related to

- SPARK-22232 Row objects in pyspark created using the `Row(**kwars)` syntax do not get serialized/deserialized properly (Resolved)
- SPARK-24915 Calling SparkSession.createDataFrame with schema can throw exception (Resolved)
- SPARK-27939 Defining a schema with VectorUDT (Resolved)
- SPARK-27712 createDataFrame() reorders row (Closed)

links to

GitHub Pull Request #26496

GitHub Pull Request #27573

People

Assignee: Bryan Cutler

Reporter: Bryan Cutler

Votes: 1

Watchers: 5

Dates

Created: 04/Nov/19 21:40

Updated: 14/Feb/20 20:12

Resolved: 10/Jan/20 22:39