SPARK-8296: Not able to load a DataFrame using Python, throws py4j.protocol.Py4JJavaError

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Done
    • Affects Version/s: 1.3.1
    • Fix Version/s: 1.3.1
    • Component/s: PySpark, SQL
    • Environment: Mac OS

    Description

      Loading a JSON file with SQLContext in the prebuilt spark-1.3.1-bin-hadoop2.4 distribution throws py4j.protocol.Py4JJavaError.

      from pyspark.sql import SQLContext
      from pyspark import SparkContext

      sc = SparkContext()
      sqlContext = SQLContext(sc)

      # Create the DataFrame
      df = sqlContext.jsonFile("changes.json")
      # Show the content of the DataFrame
      df.show()

      Error thrown:

      File "/Users/abhishekchoudhary/Work/python/evolveML/kaggle/avirto/test.py", line 11, in <module>
      df = sqlContext.jsonFile("changes.json")
      File "/Users/abhishekchoudhary/bigdata/cdh5.2.0/spark-1.3.1/python/pyspark/sql/context.py", line 377, in jsonFile
      df = self._ssql_ctx.jsonFile(path, samplingRatio)
      File "/Users/abhishekchoudhary/bigdata/cdh5.2.0/spark-1.3.1/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 538, in __call__
      File "/Users/abhishekchoudhary/bigdata/cdh5.2.0/spark-1.3.1/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value
      py4j.protocol.Py4JJavaError
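
The traceback above ends at the bare exception name, so the Java-side cause is not visible. Below is a minimal sketch of one way to surface it, assuming the raised object carries the `java_exception` attribute that py4j's `Py4JJavaError` exposes. The helper name `describe_java_error` is hypothetical and is not part of PySpark or py4j, and the call may itself fail if the gateway connection is broken.

```python
def describe_java_error(exc):
    """Return a short description of the Java-side cause, if one is attached.

    Hypothetical helper: it relies only on the `java_exception` attribute
    carried by py4j's Py4JJavaError and falls back to str(exc) otherwise.
    """
    java_exc = getattr(exc, "java_exception", None)
    if java_exc is None:
        return str(exc)
    # On a real Py4JJavaError, toString() is forwarded to the JVM object
    # and yields the Java exception class name followed by its message.
    return str(java_exc.toString())


# Intended use (needs a running SparkContext, so it is left commented out):
# try:
#     df = sqlContext.jsonFile("changes.json")
# except Exception as exc:
#     print(describe_java_error(exc))
```

Catching the broad `Exception` in the commented usage keeps the sketch free of a py4j import; in a real script, catching `py4j.protocol.Py4JJavaError` directly would be narrower.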

      On checking the source code, I found that 'gateway_client' is not valid.
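
The report does not identify a root cause. As a hedged pre-flight check, and not a confirmed fix for this issue, the sketch below verifies two things that `jsonFile` depends on: that the path exists, and that each non-empty line is a standalone JSON object (`jsonFile` reads one JSON object per line). It assumes the file is meant to be read from the local filesystem; the function name `check_json_lines` is hypothetical.

```python
import json
import os


def check_json_lines(path):
    """Return a list of (line_number, problem) tuples; an empty list means no problem was found."""
    if not os.path.isfile(path):
        # Line number 0 marks a problem with the file itself.
        return [(0, "file not found: " + os.path.abspath(path))]
    problems = []
    with open(path, "r") as handle:
        for number, line in enumerate(handle, start=1):
            if not line.strip():
                continue  # blank lines are ignored by this check
            try:
                value = json.loads(line)
            except ValueError as err:
                problems.append((number, "invalid JSON: %s" % err))
                continue
            if not isinstance(value, dict):
                problems.append((number, "not a JSON object"))
    return problems
```

Running `check_json_lines("changes.json")` before creating the SparkContext reports a missing file or a multi-line (pretty-printed) JSON document without involving the JVM.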

          People

            Assignee: Unassigned
            Reporter: ABHISHEK CHOUDHARY (buntha)