Details
- Type: Bug
- Status: Resolved
- Resolution: Fixed
Description
Hi there,
When I run Spark on a Mesos cluster, the job fails with the following error:
```
./run spark.examples.SparkPi ...:5050
12/09/07 14:49:28 INFO spark.BoundedMemoryCache: BoundedMemoryCache.maxBytes = 994836480
12/09/07 14:49:28 INFO spark.CacheTrackerActor: Registered actor on port 7077
12/09/07 14:49:28 INFO spark.CacheTrackerActor: Started slave cache (size 948.8MB) on shawpc
12/09/07 14:49:28 INFO spark.MapOutputTrackerActor: Registered actor on port 7077
12/09/07 14:49:28 INFO spark.ShuffleManager: Shuffle dir: /tmp/spark-local-81220c47-bc43-4809-ac48-5e3e8e023c8a/shuffle
12/09/07 14:49:28 INFO server.Server: jetty-7.5.3.v20111011
12/09/07 14:49:28 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:57595 STARTING
12/09/07 14:49:28 INFO spark.ShuffleManager: Local URI: http://127.0.1.1:57595
12/09/07 14:49:28 INFO server.Server: jetty-7.5.3.v20111011
12/09/07 14:49:28 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:60113 STARTING
12/09/07 14:49:28 INFO broadcast.HttpBroadcast: Broadcast server started at http://127.0.1.1:60113
12/09/07 14:49:28 INFO spark.MesosScheduler: Temp directory for JARs: /tmp/spark-d541f37c-ae35-476c-b2fc-9908b0739f50
12/09/07 14:49:28 INFO server.Server: jetty-7.5.3.v20111011
12/09/07 14:49:28 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:50511 STARTING
12/09/07 14:49:28 INFO spark.MesosScheduler: JAR server started at http://127.0.1.1:50511
12/09/07 14:49:28 INFO spark.MesosScheduler: Registered as framework ID 201209071448-846324308-5050-26925-0000
12/09/07 14:49:29 INFO spark.SparkContext: Starting job...
12/09/07 14:49:29 INFO spark.CacheTracker: Registering RDD ID 1 with cache
12/09/07 14:49:29 INFO spark.CacheTrackerActor: Registering RDD 1 with 2 partitions
12/09/07 14:49:29 INFO spark.CacheTracker: Registering RDD ID 0 with cache
12/09/07 14:49:29 INFO spark.CacheTrackerActor: Registering RDD 0 with 2 partitions
12/09/07 14:49:29 INFO spark.CacheTrackerActor: Asked for current cache locations
12/09/07 14:49:29 INFO spark.MesosScheduler: Final stage: Stage 0
12/09/07 14:49:29 INFO spark.MesosScheduler: Parents of final stage: List()
12/09/07 14:49:29 INFO spark.MesosScheduler: Missing parents: List()
12/09/07 14:49:29 INFO spark.MesosScheduler: Submitting Stage 0, which has no missing parents
12/09/07 14:49:29 INFO spark.MesosScheduler: Got a job with 2 tasks
12/09/07 14:49:29 INFO spark.MesosScheduler: Adding job with ID 0
12/09/07 14:49:29 INFO spark.SimpleJob: Starting task 0:0 as TID 0 on slave 201209071448-846324308-5050-26925-0: shawpc (preferred)
12/09/07 14:49:29 INFO spark.SimpleJob: Size of task 0:0 is 1606 bytes and took 52 ms to serialize by spark.JavaSerializerInstance
12/09/07 14:49:29 INFO spark.SimpleJob: Starting task 0:1 as TID 1 on slave 201209071448-846324308-5050-26925-0: shawpc (preferred)
12/09/07 14:49:29 INFO spark.SimpleJob: Size of task 0:1 is 1606 bytes and took 1 ms to serialize by spark.JavaSerializerInstance
12/09/07 14:49:30 INFO spark.SimpleJob: Lost TID 0 (task 0:0)
12/09/07 14:49:30 INFO spark.SimpleJob: Starting task 0:0 as TID 2 on slave 201209071448-846324308-5050-26925-0: shawpc (preferred)
12/09/07 14:49:30 INFO spark.SimpleJob: Size of task 0:0 is 1606 bytes and took 0 ms to serialize by spark.JavaSerializerInstance
12/09/07 14:49:30 INFO spark.SimpleJob: Lost TID 1 (task 0:1)
12/09/07 14:49:30 INFO spark.SimpleJob: Lost TID 2 (task 0:0)
12/09/07 14:49:30 INFO spark.SimpleJob: Starting task 0:0 as TID 3 on slave 201209071448-846324308-5050-26925-0: shawpc (preferred)
12/09/07 14:49:30 INFO spark.SimpleJob: Size of task 0:0 is 1606 bytes and took 2 ms to serialize by spark.JavaSerializerInstance
12/09/07 14:49:32 INFO spark.SimpleJob: Starting task 0:1 as TID 4 on slave 201209071448-846324308-5050-26925-0: shawpc (preferred)
12/09/07 14:49:32 INFO spark.SimpleJob: Size of task 0:1 is 1606 bytes and took 1 ms to serialize by spark.JavaSerializerInstance
12/09/07 14:49:32 INFO spark.SimpleJob: Lost TID 3 (task 0:0)
12/09/07 14:49:32 INFO spark.SimpleJob: Starting task 0:0 as TID 5 on slave 201209071448-846324308-5050-26925-0: shawpc (preferred)
12/09/07 14:49:32 INFO spark.SimpleJob: Size of task 0:0 is 1606 bytes and took 0 ms to serialize by spark.JavaSerializerInstance
12/09/07 14:49:32 INFO spark.SimpleJob: Lost TID 4 (task 0:1)
12/09/07 14:49:32 INFO spark.SimpleJob: Lost TID 5 (task 0:0)
12/09/07 14:49:32 INFO spark.SimpleJob: Starting task 0:0 as TID 6 on slave 201209071448-846324308-5050-26925-0: shawpc (preferred)
12/09/07 14:49:32 INFO spark.SimpleJob: Size of task 0:0 is 1606 bytes and took 0 ms to serialize by spark.JavaSerializerInstance
12/09/07 14:49:34 INFO spark.SimpleJob: Starting task 0:1 as TID 7 on slave 201209071448-846324308-5050-26925-0: shawpc (preferred)
12/09/07 14:49:34 INFO spark.SimpleJob: Size of task 0:1 is 1606 bytes and took 2 ms to serialize by spark.JavaSerializerInstance
12/09/07 14:49:34 INFO spark.SimpleJob: Lost TID 6 (task 0:0)
12/09/07 14:49:34 ERROR spark.SimpleJob: Task 0:0 failed more than 4 times; aborting job
Exception in thread "Thread-50" java.io.EOFException
at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2280)
at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:2749)
at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:779)
at java.io.ObjectInputStream.<init>(ObjectInputStream.java:279)
at spark.JavaSerializerInstance$$anon$2.<init>(JavaSerializer.scala:39)
at spark.JavaSerializerInstance.deserialize(JavaSerializer.scala:39)
at spark.SimpleJob.taskLost(SimpleJob.scala:296)
at spark.SimpleJob.statusUpdate(SimpleJob.scala:207)
at spark.MesosScheduler.statusUpdate(MesosScheduler.scala:287)
12/09/07 14:49:34 INFO spark.SimpleJob: Starting task 0:0 as TID 8 on slave 201209071448-846324308-5050-26925-0: shawpc (preferred)
12/09/07 14:49:34 INFO spark.SimpleJob: Size of task 0:0 is 1606 bytes and took 1 ms to serialize by spark.JavaSerializerInstance
12/09/07 14:49:34 INFO spark.SimpleJob: Lost TID 7 (task 0:1)
12/09/07 14:49:34 INFO spark.MesosScheduler: driver.run() returned with code DRIVER_ABORTED
```
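The stack trace ends in `ObjectInputStream.readStreamHeader`, reached from `SimpleJob.taskLost` via `JavaSerializerInstance.deserialize`. One reading consistent with that trace is that the lost-task status update carried an empty payload, and the scheduler tried to deserialize it anyway. That is an assumption, not something the log confirms. The minimal sketch below only demonstrates the underlying JDK behavior: constructing an `ObjectInputStream` over zero bytes throws `java.io.EOFException`. The class and method names (`EmptyPayloadDemo`, `failsWithEof`, `serialize`) are hypothetical and are not part of Spark.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

// Hypothetical demo class; not Spark code.
public class EmptyPayloadDemo {

    // Returns true if opening an ObjectInputStream over the payload fails
    // with EOFException, which happens when the stream header cannot be read.
    static boolean failsWithEof(byte[] payload) throws IOException {
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(payload))) {
            return false; // header was read successfully
        } catch (EOFException e) {
            return true;  // same exception type as in the stack trace above
        }
    }

    // Serializes a value with plain Java serialization, for comparison.
    static byte[] serialize(Object value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Empty payload: the header read hits end of stream.
        System.out.println(failsWithEof(new byte[0]));      // prints "true"
        // Well-formed payload: the header is present.
        System.out.println(failsWithEof(serialize("ok")));  // prints "false"
    }
}
```

If the empty-payload assumption holds, a length check before deserializing the status data would avoid the exception in the scheduler thread. Because every attempt is lost almost immediately on the same slave, the executor's stderr on `shawpc` is likely the more informative place to look for the root cause.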