Details
Description
>>> json_rdd = sqlContext.jsonRDD(sc.parallelize(['{"name": "Nick"}']))
>>> json_rdd.schema
StructType(List(StructField(name,StringType,true)))
>>> type(json_rdd.schema)
<class 'pyspark.sql.types.StructType'>
>>> json_rdd.schema.json()
'{"fields":[{"metadata":{},"name":"name","nullable":true,"type":"string"}],"type":"struct"}'
>>> pyspark.sql.types.StructType.fromJson(json_rdd.schema.json())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Applications/apache-spark/spark-1.3.1-bin-hadoop2.4/python/pyspark/sql/types.py", line 346, in fromJson
    return StructType([StructField.fromJson(f) for f in json["fields"]])
TypeError: string indices must be integers, not str
>>> import json
>>> pyspark.sql.types.StructType.fromJson(json.loads(json_rdd.schema.json()))
StructType(List(StructField(name,StringType,true)))
>>>
So fromJson() doesn't actually expect JSON, which would be a string. It expects a dictionary, i.e. JSON that has already been parsed by json.loads().
This method should probably be renamed.
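For comparison, a minimal sketch of the round trip that does work today, with the json.loads() call wrapped in a helper. The helper name schema_from_json_string is hypothetical and not part of the Spark API:

import json
from pyspark.sql.types import StructType, StructField, StringType

# Hypothetical helper (not part of Spark): accepts the JSON *string*
# produced by schema.json() and parses it into the dict that
# StructType.fromJson() actually expects.
def schema_from_json_string(json_str):
    return StructType.fromJson(json.loads(json_str))

schema = StructType([StructField("name", StringType(), True)])
round_tripped = schema_from_json_string(schema.json())
assert round_tripped == schema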