Spark / SPARK-11751

Doc describe error in the "Spark Streaming Programming Guide" page


Details

    • Type: Documentation
    • Status: Resolved
    • Priority: Trivial
    • Resolution: Fixed
    • Affects Version/s: 1.4.1, 1.5.0, 1.5.1, 1.5.2
    • Fix Version/s: 1.6.0
    • Component/s: Documentation
    • Labels: None

    Description

      In the "Task Launching Overheads" section, the guide states:

      Task Serialization: Using Kryo serialization for serializing tasks can reduce the task sizes, and therefore reduce the time taken to send them to the slaves.

      However, task serialization is controlled by the spark.closure.serializer parameter, and currently only the Java serializer is supported for it. If spark.closure.serializer is set to org.apache.spark.serializer.KryoSerializer, jobs fail with an exception like the following:

      org.apache.spark.SparkException: Job aborted due to stage failure: Task 516 in stage 0.0 failed 4 times, most recent failure: Lost task 516.3 in stage 0.0 (TID 21, spark-cluster.data.com): java.io.EOFException
      	at java.io.DataInputStream.readInt(DataInputStream.java:392)
      	at org.apache.spark.scheduler.Task$.deserializeWithDependencies(Task.scala:188)
      	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:192)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
      	at java.lang.Thread.run(Thread.java:744)
      
      Driver stacktrace:
      	at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1273)
      	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1264)
      	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1263)
      	at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
      	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
      	at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1263)
      	at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:730)
      	at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:730)
      	at scala.Option.foreach(Option.scala:236)
      	at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:730)
      	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1457)
      	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1418)
      	at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
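      The failure above can be reproduced by submitting any job with the closure serializer overridden. A minimal sketch using spark-submit (the master URL, main class, and application jar below are placeholders, not taken from the report):

      ```shell
      # Override the closure (task) serializer. On Spark 1.x this setting is
      # accepted at submit time, but only the Java serializer actually works
      # for closures, so tasks fail to deserialize on the executors with the
      # java.io.EOFException shown above.
      spark-submit \
        --master spark://master:7077 \
        --conf spark.closure.serializer=org.apache.spark.serializer.KryoSerializer \
        --class com.example.MyApp \
        myapp.jar
      ```

      Note that spark.serializer (data serialization) may safely be set to Kryo; it is only the closure serializer that does not support it. Since this issue is filed against the Documentation component with fix version 1.6.0, the resolution corrects the guide text rather than adding Kryo support for task serialization.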
      

          People

            Assignee: 397090770 iteblog
            Reporter: 397090770 iteblog
