Description
Loading a JSON file with SQLContext in the prebuilt spark-1.3.1-bin-hadoop2.4 distribution throws py4j.protocol.Py4JJavaError:
from pyspark.sql import SQLContext
from pyspark import SparkContext
sc = SparkContext()
sqlContext = SQLContext(sc)
# Create the DataFrame
df = sqlContext.jsonFile("changes.json")
# Show the content of the DataFrame
df.show()
Error thrown:
File "/Users/abhishekchoudhary/Work/python/evolveML/kaggle/avirto/test.py", line 11, in <module>
    df = sqlContext.jsonFile("changes.json")
File "/Users/abhishekchoudhary/bigdata/cdh5.2.0/spark-1.3.1/python/pyspark/sql/context.py", line 377, in jsonFile
    df = self._ssql_ctx.jsonFile(path, samplingRatio)
File "/Users/abhishekchoudhary/bigdata/cdh5.2.0/spark-1.3.1/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 538, in __call__
File "/Users/abhishekchoudhary/bigdata/cdh5.2.0/spark-1.3.1/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value
py4j.protocol.Py4JJavaError
Stepping through the py4j source, I found that 'gateway_client' is not valid.
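The truncated traceback does not show the underlying Java exception, so the root cause is not visible here. One possibility worth ruling out (an assumption, not confirmed by this report): the Spark SQL programming guide notes that the file passed to jsonFile is not a typical JSON file, because each line must contain a separate, self-contained, valid JSON object. The following is a minimal plain-Python sketch, independent of Spark, that checks changes.json for that layout and, if needed, rewrites a pretty-printed or array-wrapped file as one object per line. The helper names find_non_json_lines and convert_to_json_lines are hypothetical.

```python
import json


def find_non_json_lines(path):
    """Hypothetical helper: return (line_number, text) for every non-blank
    line in `path` that is not a complete JSON object on its own.
    An empty list means the file has one JSON object per line."""
    bad = []
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            text = line.strip()
            if not text:
                continue  # blank lines are ignored by this check
            try:
                obj = json.loads(text)
            except ValueError:  # json.JSONDecodeError subclasses ValueError
                bad.append((lineno, text))
                continue
            if not isinstance(obj, dict):
                bad.append((lineno, text))  # valid JSON, but not an object
    return bad


def convert_to_json_lines(src, dst):
    """Hypothetical helper: read a file holding one JSON array (or a single
    object), possibly spread over many lines, and write each record to
    `dst` as one compact JSON object per line. Returns the record count."""
    with open(src, encoding="utf-8") as f:
        data = json.load(f)
    records = data if isinstance(data, list) else [data]
    with open(dst, "w", encoding="utf-8") as out:
        for record in records:
            out.write(json.dumps(record) + "\n")
    return len(records)
```

For example, if find_non_json_lines("changes.json") returns a non-empty list, converting the file with convert_to_json_lines("changes.json", "changes.jsonl") and passing the new file to sqlContext.jsonFile would show whether the file layout is involved. If the check comes back empty, the cause lies elsewhere, and the full Java stack trace behind the Py4JJavaError is the next thing to look at.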