Spark / SPARK-6116 DataFrame API improvement umbrella ticket (Spark 1.5) / SPARK-7507

pyspark.sql.types.StructType and Row should implement __iter__()

    Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Minor
    • Resolution: Won't Fix
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: PySpark, SQL
    • Labels: None
    • Sprint: Spark 1.5 doc/QA sprint

      Description

      StructType looks an awful lot like a Python dictionary.

      However, it doesn't implement __iter__(), so doing a quick conversion like this doesn't work:

      >>> df = sqlContext.jsonRDD(sc.parallelize(['{"name": "El Magnifico"}']))
      >>> df.schema
      StructType(List(StructField(name,StringType,true)))
      >>> dict(df.schema)
      Traceback (most recent call last):
        File "<stdin>", line 1, in <module>
      TypeError: 'StructType' object is not iterable
      

      This would be super helpful for doing any custom schema manipulations without having to go through the whole .json() -> json.loads() -> manipulate() -> json.dumps() -> .fromJson() charade.
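For reference, the round trip described above can be sketched with the standard library alone. The JSON string is assumed to match what df.schema.json() returns for the example DataFrame, and rename_field is a hypothetical manipulation step. The final .fromJson() call is left as a comment so the snippet runs without Spark.

```python
import json

# Assumption: this is what df.schema.json() returns for the example above.
schema_json = (
    '{"type": "struct", "fields": ['
    '{"name": "name", "type": "string", "nullable": true, "metadata": {}}]}'
)


def rename_field(schema_dict, old, new):
    """Hypothetical manipulation: return a copy with one field renamed."""
    fields = [
        dict(f, name=new) if f["name"] == old else f
        for f in schema_dict["fields"]
    ]
    return dict(schema_dict, fields=fields)


parsed = json.loads(schema_json)                     # .json() -> json.loads()
renamed = rename_field(parsed, "name", "wrestler")   # manipulate()
new_schema_json = json.dumps(renamed)                # json.dumps()

# With PySpark available, the last step would look roughly like this
# (fromJson expects the parsed dict rather than the JSON string):
# new_schema = StructType.fromJson(json.loads(new_schema_json))
```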

      Same goes for Row, which offers an asDict() method but doesn't support the more Pythonic dict(Row).
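
A minimal sketch of the requested behavior, using hypothetical stand-in classes (Field, SchemaSketch and RowSketch are not PySpark classes). It assumes __iter__() would yield (name, value) pairs, which is what dict() needs from a plain iterable.

```python
from collections import namedtuple

# Hypothetical stand-in for StructField; not the PySpark class.
Field = namedtuple("Field", ["name", "dataType", "nullable"])


class SchemaSketch:
    """Hypothetical stand-in for StructType."""

    def __init__(self, fields):
        self.fields = list(fields)

    def __iter__(self):
        # Yield (name, field) pairs so that dict(schema) works.
        return ((f.name, f) for f in self.fields)


class RowSketch:
    """Hypothetical stand-in for Row."""

    def __init__(self, **kwargs):
        self._data = dict(kwargs)

    def asDict(self):
        return dict(self._data)

    def __iter__(self):
        # Yield (column, value) pairs so that dict(row) works.
        return iter(self._data.items())


schema = SchemaSketch([Field("name", "string", True)])
row = RowSketch(name="El Magnifico")

schema_dict = dict(schema)  # maps field name -> Field
row_dict = dict(row)        # same content as row.asDict()
```

Yielding pairs is only one possible design. It differs from Python's own dict, whose __iter__() yields keys, so an alternative would be to expose keys() and __getitem__() instead.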

            People

            • Assignee: Unassigned
            • Reporter: Nicholas Chammas (nchammas)
            • Votes: 0
            • Watchers: 3
