SPARK-7506 (Spark; sub-task of SPARK-6116, DataFrame API improvement umbrella ticket (Spark 1.5))

pyspark.sql.types.StructType.fromJson() is incorrectly named


Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Minor
    • Resolution: Won't Fix
    • Affects Version/s: 1.3.0, 1.3.1
    • Fix Version/s: None
    • Component/s: PySpark, SQL
    • Labels: None
    • Sprint: Spark 1.5 doc/QA sprint

    Description

      >>> json_rdd = sqlContext.jsonRDD(sc.parallelize(['{"name": "Nick"}']))
      >>> json_rdd.schema
      StructType(List(StructField(name,StringType,true)))
      >>> type(json_rdd.schema)
      <class 'pyspark.sql.types.StructType'>
      >>> json_rdd.schema.json()
      '{"fields":[{"metadata":{},"name":"name","nullable":true,"type":"string"}],"type":"struct"}'
      >>> pyspark.sql.types.StructType.fromJson(json_rdd.schema.json())
      Traceback (most recent call last):
        File "<stdin>", line 1, in <module>
        File "/Applications/apache-spark/spark-1.3.1-bin-hadoop2.4/python/pyspark/sql/types.py", line 346, in fromJson
          return StructType([StructField.fromJson(f) for f in json["fields"]])
      TypeError: string indices must be integers, not str
      >>> import json
      >>> pyspark.sql.types.StructType.fromJson(json.loads(json_rdd.schema.json()))
      StructType(List(StructField(name,StringType,true)))
      >>>
      

      So fromJson() does not actually accept JSON, which is a string. It expects the dictionary obtained by parsing that string, for example with json.loads().
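
The failure can be reproduced without Spark, since it comes down to indexing a string with a string key. The following is a minimal sketch that assumes Python 3 and uses only the standard library; the variable names are illustrative and not part of PySpark.

```python
import json

# Schema JSON string copied from the session above.
schema_json = '{"fields":[{"metadata":{},"name":"name","nullable":true,"type":"string"}],"type":"struct"}'

# Looking up a key on a str raises TypeError, which is what happened
# inside fromJson() when it evaluated json["fields"] on the raw string.
try:
    schema_json["fields"]
    string_lookup_failed = False
except TypeError:
    string_lookup_failed = True

# json.loads() turns the JSON text into a dict, the type fromJson() expects.
schema_dict = json.loads(schema_json)
field_names = [field["name"] for field in schema_dict["fields"]]
```

Here `schema_dict` is the kind of object that the second, successful fromJson() call in the session received.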

      This method should probably be renamed.
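
Short of a rename, a caller could normalize the input before handing it to fromJson(). The sketch below assumes Python 3; `to_schema_dict` is a hypothetical helper, not an existing PySpark function, and the commented-out call at the end assumes a working PySpark installation.

```python
import json


def to_schema_dict(schema):
    """Hypothetical helper: accept a JSON string or an already-parsed dict.

    Returns the dict form, which is what StructType.fromJson() expects.
    """
    if isinstance(schema, str):
        return json.loads(schema)
    if isinstance(schema, dict):
        return schema
    raise TypeError("expected a JSON string or a dict, got %s" % type(schema).__name__)


# With PySpark available, the result could then be passed on, e.g.:
#   pyspark.sql.types.StructType.fromJson(to_schema_dict(json_rdd.schema.json()))
```

The helper leaves fromJson() untouched, so existing callers that already pass a dictionary would keep working.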

          People

            Assignee: Unassigned
            Reporter: Nicholas Chammas (nchammas)
            Votes: 0
            Watchers: 2
