Spark / SPARK-4561

PySparkSQL's Row.asDict() should convert nested rows to dictionaries


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.2.0
    • Fix Version/s: 1.5.0
    • Component/s: PySpark, SQL
    • Labels: None

    Description

      In PySpark, you can call {{.asDict()}} on a Spark SQL Row to convert it to a dictionary. Unfortunately, though, this does not convert nested rows to dictionaries. For example:

      >>> sqlContext.sql("select results from results").first()
      Row(results=[Row(time=3.762), Row(time=3.47), Row(time=3.559), Row(time=3.458), Row(time=3.229), Row(time=3.21), Row(time=3.166), Row(time=3.276), Row(time=3.239), Row(time=3.149)])
      >>> sqlContext.sql("select results from results").first().asDict()
      {u'results': [(3.762,),
        (3.47,),
        (3.559,),
        (3.458,),
        (3.229,),
        (3.21,),
        (3.166,),
        (3.276,),
        (3.239,),
        (3.149,)]}
      

      Actually, it looks like the nested fields are just left as Rows (IPython's fancy display logic obscured this in my first example):

      >>> Row(results=[Row(time=1), Row(time=2)]).asDict()
      {'results': [Row(time=1), Row(time=2)]}
      

      Here's the output I'd expect:

      >>> Row(results=[Row(time=1), Row(time=2)]).asDict()
      {'results': [{'time': 1}, {'time': 2}]}
      
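      In the meantime, nested rows can be flattened by hand. Here's a minimal sketch of such a workaround (the helper name {{to_dict}} is purely illustrative, not part of the PySpark API):

      from pyspark.sql import Row

      def to_dict(value):
          # Recursively convert Rows (including Rows nested inside lists and
          # dicts) into plain Python dictionaries; leave other values as-is.
          if isinstance(value, Row):
              return {k: to_dict(v) for k, v in value.asDict().items()}
          if isinstance(value, list):
              return [to_dict(v) for v in value]
          if isinstance(value, dict):
              return {k: to_dict(v) for k, v in value.items()}
          return value

      >>> to_dict(Row(results=[Row(time=1), Row(time=2)]))
      {'results': [{'time': 1}, {'time': 2}]}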

      I ran into this issue when trying to use Pandas dataframes to display nested data that I queried from Spark SQL.
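      For reference, the fix targeted at 1.5.0 adds an optional {{recursive}} flag to {{asDict()}}; assuming a version with that flag, the conversion above becomes:

      >>> from pyspark.sql import Row
      >>> Row(results=[Row(time=1), Row(time=2)]).asDict(recursive=True)
      {'results': [{'time': 1}, {'time': 2}]}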


            People

              Assignee: Davies Liu (davies)
              Reporter: Josh Rosen (joshrosen)
              Votes: 0
              Watchers: 4

              Dates

                Created:
                Updated:
                Resolved: