ZEPPELIN-3003: NullPointerException on spark.read.json("hdfs://....") in Spark Standalone Cluster Mode


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.7.1, 0.7.3
    • Fix Version/s: None
    • Component/s: Interpreters
    • Labels: None
    • Environment:
      • Spark 2.1.1 with Standalone Cluster Manager
      • Zeppelin: tested with 0.7.1 as well as 0.7.2

    Description

      When running Zeppelin against a Spark cluster with the Standalone Cluster Manager and executing:

      val df = spark.read.option("inferSchema","false").json("hdfs://ip:port/path/file.txt")
      

      I get the following exception:

       WARN [2017-10-19 07:51:26,959] ({pool-2-thread-8} NotebookServer.java[afterStatusChange]:2064) - Job 20171016-144104_559309535 is finished, status: ERROR, exception: null, result: %text java.lang.NullPointerException
              at org.apache.zeppelin.spark.Utils.invokeMethod(Utils.java:38)
              at org.apache.zeppelin.spark.Utils.invokeMethod(Utils.java:33)
              at org.apache.zeppelin.spark.SparkInterpreter.createSparkContext_2(SparkInterpreter.java:398)
              at org.apache.zeppelin.spark.SparkInterpreter.createSparkContext(SparkInterpreter.java:387)
              at org.apache.zeppelin.spark.SparkInterpreter.getSparkContext(SparkInterpreter.java:146)
              at org.apache.zeppelin.spark.SparkInterpreter.open(SparkInterpreter.java:843)
              at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:70)
              at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:491)
              at org.apache.zeppelin.scheduler.Job.run(Job.java:175)
              at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:139)
              at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
              at java.util.concurrent.FutureTask.run(FutureTask.java:266)
              at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
              at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
              at java.lang.Thread.run(Thread.java:748)
      

      However, the following (i.e. .text() instead of .json()) works perfectly fine:

      val df = spark.read.option("inferSchema","false").text("hdfs://ip:port/path/file.txt")
      
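      A possible workaround sketch, assuming the failure is tied to the eager schema-inference job that .json() runs when no schema is given (whereas .text() needs none): supply an explicit schema so the inference pass is skipped. The field names below are placeholders, not the actual record layout:

      import org.apache.spark.sql.types._

      // Hypothetical schema -- replace the fields with the real record layout.
      val explicitSchema = new StructType()
        .add("id", LongType)
        .add("name", StringType)

      // With an explicit schema, spark.read.json skips the inference pass
      // that otherwise runs a job over the file before returning a DataFrame.
      val df = spark.read.schema(explicitSchema).json("hdfs://ip:port/path/file.txt")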

      When I change the Spark-Master URI from

      spark://host1:7077,host2:7077,host3:7077
      

      to

      local[*]
      

      then both (.json() as well as .text()) work fine.
      So the JSON files themselves are valid JSON, since they are parsed properly with a local Spark instance; but as soon as I move to cluster mode, only .text() continues working and .json() throws a NullPointerException.
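      Since the stack trace points at SparkContext creation inside the interpreter (SparkInterpreter.createSparkContext_2) rather than at the JSON parsing itself, a sanity check worth running (a sketch; hosts and paths are the placeholders from above) is the same read in a plain spark-shell against the standalone master:

      // Launch outside Zeppelin, e.g.:
      //   spark-shell --master spark://host1:7077,host2:7077,host3:7077
      // If the NPE reproduces here too, the problem lies in the Spark/cluster
      // setup rather than in the Zeppelin interpreter.
      val df = spark.read.option("inferSchema", "false").json("hdfs://ip:port/path/file.txt")
      df.printSchema()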

People

    • Assignee: Unassigned
    • Reporter: Behar Veliqi (bveliqi)
    • Votes: 0
    • Watchers: 3
