Project: Spark
Parent: SPARK-6116 DataFrame API improvement umbrella ticket (Spark 1.5)
Issue: SPARK-7507

pyspark.sql.types.StructType and Row should implement __iter__()

Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Minor
    • Resolution: Won't Fix
    • Component/s: PySpark, SQL
    • Sprint: Spark 1.5 doc/QA sprint

    Description

      StructType looks an awful lot like a Python dictionary.

      However, it doesn't implement __iter__(), so doing a quick conversion like this doesn't work:

      >>> df = sqlContext.jsonRDD(sc.parallelize(['{"name": "El Magnifico"}']))
      >>> df.schema
      StructType(List(StructField(name,StringType,true)))
      >>> dict(df.schema)
      Traceback (most recent call last):
        File "<stdin>", line 1, in <module>
      TypeError: 'StructType' object is not iterable
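
For context, `dict()` accepts either a mapping or an iterable of key/value pairs, so an `__iter__` that yields `(name, field)` pairs would be enough for the conversion above to work. A minimal sketch of that idea follows. `FieldStub` and `SchemaStub` are hypothetical stand-ins used only so the example runs without Spark; they are not PySpark's `StructField` and `StructType`, and this is not a proposed patch.

```python
class FieldStub:
    """Hypothetical stand-in for a schema field (not PySpark's StructField)."""

    def __init__(self, name, data_type, nullable=True):
        self.name = name
        self.data_type = data_type
        self.nullable = nullable


class SchemaStub:
    """Hypothetical stand-in for a schema (not PySpark's StructType)."""

    def __init__(self, fields):
        self.fields = list(fields)

    def __iter__(self):
        # Yield (key, value) pairs so that dict(schema) can consume them.
        for field in self.fields:
            yield field.name, field


schema = SchemaStub([FieldStub("name", "string")])
as_dict = dict(schema)  # works because iteration yields 2-tuples
```

Without the `__iter__` method, `dict(schema)` on the stand-in raises the same kind of `TypeError` shown in the traceback.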
      

      Supporting iteration would be super helpful for custom schema manipulation, since it would avoid the whole .json() -> json.loads() -> manipulate() -> json.dumps() -> .fromJson() charade.
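
The round trip mentioned here might look roughly like the sketch below. It assumes the JSON layout that `df.schema.json()` produces for the one-column schema above (a `struct` object with a `fields` list); the string is hard-coded so the example runs without Spark. `rename_field` is a hypothetical helper for illustration, not a PySpark function.

```python
import json

# Assumed output of df.schema.json() for the single-column schema above.
schema_json = (
    '{"type": "struct", "fields": ['
    '{"name": "name", "type": "string", "nullable": true, "metadata": {}}]}'
)


def rename_field(schema_dict, old, new):
    """Hypothetical helper: copy the schema dict with one field renamed."""
    fields = [
        dict(field, name=new) if field["name"] == old else dict(field)
        for field in schema_dict["fields"]
    ]
    return dict(schema_dict, fields=fields)


schema_dict = json.loads(schema_json)                  # .json() -> json.loads()
renamed = rename_field(schema_dict, "name", "title")   # manipulate()
renamed_json = json.dumps(renamed)                     # json.dumps()
# In PySpark, the result would then go back through StructType.fromJson.
```

Every step works on plain dictionaries and strings, which is the detour the ticket would like to avoid for simple lookups.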

      Same goes for Row, which offers an asDict() method but doesn't support the more Pythonic dict(Row).
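
A sketch of the difference, assuming `Row` behaves like a tuple of values with named fields. `RowStub` and its `field_names` attribute are hypothetical stand-ins, not PySpark's `Row`. The example shows why `dict(row)` fails when iteration yields bare values, and how pairing names with values gives the same result as `asDict()`.

```python
class RowStub(tuple):
    """Hypothetical stand-in for a row: a tuple of values plus field names."""

    def __new__(cls, **kwargs):
        names = list(kwargs)
        row = super().__new__(cls, (kwargs[n] for n in names))
        row.field_names = names
        return row

    def asDict(self):
        # Pair each field name with the value at the same position.
        return dict(zip(self.field_names, self))


row = RowStub(name="El Magnifico")

try:
    dict(row)  # iteration yields bare values, not (key, value) pairs
    converted = True
except (TypeError, ValueError):
    converted = False

# Explicit pairing, equivalent to row.asDict() on the stand-in.
pairs = dict(zip(row.field_names, row))
```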

          People

            Assignee: Unassigned
            Reporter: Nicholas Chammas (nchammas)
            Votes: 0
            Watchers: 3
