[SPARK-7507] pyspark.sql.types.StructType and Row should implement __iter__() - ASF JIRA


Details

Type: Sub-task
Status: Closed
Priority: Minor
Resolution: Won't Fix
Affects Version/s: None
Fix Version/s: None
Component/s: PySpark, SQL
Labels: None
Sprint: Spark 1.5 doc/QA sprint

Description

StructType looks an awful lot like a Python dictionary.

However, it doesn't implement __iter__(), so doing a quick conversion like this doesn't work:

>>> df = sqlContext.jsonRDD(sc.parallelize(['{"name": "El Magnifico"}']))
>>> df.schema
StructType(List(StructField(name,StringType,true)))
>>> dict(df.schema)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'StructType' object is not iterable

This would be super helpful for doing any custom schema manipulations without having to go through the whole .json() -> json.loads() -> manipulate() -> json.dumps() -> .fromJson() charade.

Same goes for Row, which offers an asDict() method but doesn't support the more Pythonic dict(Row).
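
As a rough illustration of the request, the sketch below shows how an `__iter__()` that yields `(name, field)` pairs would let `dict()` consume a schema-like object directly. `Field` and `IterableSchema` are hypothetical stand-ins written for this example; they are not the real `StructField` and `StructType` classes, and this is not how PySpark implements them.

```python
from collections import namedtuple

# Hypothetical stand-in for pyspark.sql.types.StructField (not the real class).
Field = namedtuple("Field", ["name", "dataType", "nullable"])


class IterableSchema:
    """Hypothetical stand-in for StructType that adds the requested __iter__()."""

    def __init__(self, fields):
        self.fields = list(fields)

    def __iter__(self):
        # dict() accepts any iterable of 2-item pairs, so yielding
        # (name, field) tuples is enough to make dict(schema) work.
        return ((f.name, f) for f in self.fields)


schema = IterableSchema([Field("name", "StringType", True)])
as_dict = dict(schema)          # no TypeError: the object is now iterable
print(as_dict["name"].dataType)
```

Without any change to PySpark, a similar mapping can probably be built with a dict comprehension over the schema's `fields` attribute, keyed on each field's `name`.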


People

Assignee: Unassigned

Reporter: Nicholas Chammas

Votes: 0

Watchers: 3

Dates

Created: 09/May/15 18:09

Updated: 08/Jul/15 23:58

Resolved: 08/Jul/15 23:58