  1. Spark
  2. SPARK-6116 DataFrame API improvement umbrella ticket (Spark 1.5)
  3. SPARK-7506

pyspark.sql.types.StructType.fromJson() is incorrectly named

    Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Minor
    • Resolution: Won't Fix
    • Affects Version/s: 1.3.0, 1.3.1
    • Fix Version/s: None
    • Component/s: PySpark, SQL
    • Labels:
      None
    • Sprint:
      Spark 1.5 doc/QA sprint

      Description

      >>> json_rdd = sqlContext.jsonRDD(sc.parallelize(['{"name": "Nick"}']))
      >>> json_rdd.schema
      StructType(List(StructField(name,StringType,true)))
      >>> type(json_rdd.schema)
      <class 'pyspark.sql.types.StructType'>
      >>> json_rdd.schema.json()
      '{"fields":[{"metadata":{},"name":"name","nullable":true,"type":"string"}],"type":"struct"}'
      >>> pyspark.sql.types.StructType.fromJson(json_rdd.schema.json())
      Traceback (most recent call last):
        File "<stdin>", line 1, in <module>
        File "/Applications/apache-spark/spark-1.3.1-bin-hadoop2.4/python/pyspark/sql/types.py", line 346, in fromJson
          return StructType([StructField.fromJson(f) for f in json["fields"]])
      TypeError: string indices must be integers, not str
      >>> import json
      >>> pyspark.sql.types.StructType.fromJson(json.loads(json_rdd.schema.json()))
      StructType(List(StructField(name,StringType,true)))
      >>>
      

      So fromJson() does not actually accept JSON, which is a string. It expects a dictionary, such as the one json.loads() produces from that string.

      Since the name suggests otherwise, this method should probably be renamed.
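
Until a rename happens, the workaround from the session above can be wrapped in a small helper. This is only a sketch: `struct_from_json_string` is a hypothetical name, not part of PySpark, and the class is passed in as an argument (assumed to be `pyspark.sql.types.StructType`) so that the snippet does not depend on a Spark installation.

```python
import json


def struct_from_json_string(struct_cls, schema_json):
    """Hypothetical helper: rebuild a schema from the output of schema.json().

    struct_cls is assumed to be pyspark.sql.types.StructType, whose
    fromJson() expects a dict rather than a JSON string.
    """
    # Accept either the raw JSON string or an already-parsed dict.
    if isinstance(schema_json, str):
        schema_json = json.loads(schema_json)
    # Delegate to the existing method, which works on the parsed dict.
    return struct_cls.fromJson(schema_json)
```

With the session above, `struct_from_json_string(pyspark.sql.types.StructType, json_rdd.schema.json())` would be expected to return the same StructType as the explicit `json.loads()` call.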

            People

            • Assignee:
              Unassigned
            • Reporter:
              Nicholas Chammas (nchammas)