Description
In PySpark, you can call {{asDict()}} on a SparkSQL Row to convert it to a dictionary. Unfortunately, this does not convert nested rows to dictionaries. For example:
>>> sqlContext.sql("select results from results").first()
Row(results=[Row(time=3.762), Row(time=3.47), Row(time=3.559), Row(time=3.458), Row(time=3.229), Row(time=3.21), Row(time=3.166), Row(time=3.276), Row(time=3.239), Row(time=3.149)])
>>> sqlContext.sql("select results from results").first().asDict()
{u'results': [(3.762,), (3.47,), (3.559,), (3.458,), (3.229,), (3.21,), (3.166,), (3.276,), (3.239,), (3.149,)]}
Actually, it looks like the nested fields are just left as Rows (IPython's fancy display logic obscured this in my first example):
>>> Row(results=[Row(time=1), Row(time=2)]).asDict()
{'results': [Row(time=1), Row(time=2)]}
Here's the output I'd expect:
>>> Row(results=[Row(time=1), Row(time=2)]).asDict()
{'results': [{'time': 1}, {'time': 2}]}
I ran into this issue when trying to use Pandas dataframes to display nested data that I queried from Spark SQL.
Issue Links
- relates to SPARK-4051 Rows in python should support conversion to dictionary (Resolved)