Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Incomplete
- 2.0.0
- None
Description
2017-01-06T21:32:32,928 INFO [main] org.apache.spark.ui.SparkUI - Stopped Spark web UI at http://REDACTED:4041
2017-01-06T21:32:32,938 INFO [SparkListenerBus] com.metamx.starfire.spark.SparkDriver - emitting metric: internal.metrics.jvmGCTime
2017-01-06T21:32:32,939 INFO [SparkListenerBus] com.metamx.starfire.spark.SparkDriver - emitting metric: internal.metrics.shuffle.read.localBlocksFetched
2017-01-06T21:32:32,939 INFO [SparkListenerBus] com.metamx.starfire.spark.SparkDriver - emitting metric: internal.metrics.resultSerializationTime
2017-01-06T21:32:32,939 ERROR [heartbeat-receiver-event-loop-thread] org.apache.spark.scheduler.LiveListenerBus - SparkListenerBus has already stopped! Dropping event SparkListenerExecutorMetricsUpdate(364,WrappedArray())
2017-01-06T21:32:32,939 INFO [SparkListenerBus] com.metamx.starfire.spark.SparkDriver - emitting metric: internal.metrics.resultSize
2017-01-06T21:32:32,939 INFO [SparkListenerBus] com.metamx.starfire.spark.SparkDriver - emitting metric: internal.metrics.peakExecutionMemory
2017-01-06T21:32:32,939 INFO [SparkListenerBus] com.metamx.starfire.spark.SparkDriver - emitting metric: internal.metrics.shuffle.read.fetchWaitTime
2017-01-06T21:32:32,939 INFO [SparkListenerBus] com.metamx.starfire.spark.SparkDriver - emitting metric: internal.metrics.memoryBytesSpilled
2017-01-06T21:32:32,940 INFO [SparkListenerBus] com.metamx.starfire.spark.SparkDriver - emitting metric: internal.metrics.shuffle.read.remoteBytesRead
2017-01-06T21:32:32,940 INFO [SparkListenerBus] com.metamx.starfire.spark.SparkDriver - emitting metric: internal.metrics.diskBytesSpilled
2017-01-06T21:32:32,940 INFO [SparkListenerBus] com.metamx.starfire.spark.SparkDriver - emitting metric: internal.metrics.shuffle.read.localBytesRead
2017-01-06T21:32:32,940 INFO [SparkListenerBus] com.metamx.starfire.spark.SparkDriver - emitting metric: internal.metrics.shuffle.read.recordsRead
2017-01-06T21:32:32,940 INFO [SparkListenerBus] com.metamx.starfire.spark.SparkDriver - emitting metric: internal.metrics.executorDeserializeTime
2017-01-06T21:32:32,940 INFO [SparkListenerBus] com.metamx.starfire.spark.SparkDriver - emitting metric: output/bytes
2017-01-06T21:32:32,941 INFO [SparkListenerBus] com.metamx.starfire.spark.SparkDriver - emitting metric: internal.metrics.executorRunTime
2017-01-06T21:32:32,941 INFO [SparkListenerBus] com.metamx.starfire.spark.SparkDriver - emitting metric: internal.metrics.shuffle.read.remoteBlocksFetched
2017-01-06T21:32:32,943 INFO [main] org.apache.hadoop.fs.s3native.NativeS3FileSystem - OutputStream for key 'eventLogs/remnant/46bf8f87-6de6-4da8-9cba-5b2fecd0875e-1387.inprogress' closed. Now beginning upload
2017-01-06T21:32:32,963 ERROR [heartbeat-receiver-event-loop-thread] org.apache.spark.scheduler.LiveListenerBus - SparkListenerBus has already stopped! Dropping event SparkListenerExecutorMetricsUpdate(905,WrappedArray())
2017-01-06T21:32:32,973 ERROR [heartbeat-receiver-event-loop-thread] org.apache.spark.scheduler.LiveListenerBus - SparkListenerBus has already stopped! Dropping event SparkListenerExecutorMetricsUpdate(519,WrappedArray())
2017-01-06T21:32:32,988 ERROR [heartbeat-receiver-event-loop-thread] org.apache.spark.scheduler.LiveListenerBus - SparkListenerBus has already stopped! Dropping event SparkListenerExecutorMetricsUpdate(596,WrappedArray())
When running Spark on Mesos, some large jobs fail to upload their event logs to the history server storage.
A successful sequence of log events that yields an upload is as follows:
2017-01-06T19:14:32,925 INFO [main] org.apache.hadoop.fs.s3native.NativeS3FileSystem - OutputStream for key 'eventLogs/remnant/46bf8f87-6de6-4da8-9cba-5b2fecd0875e-1434.inprogress' writing to tempfile '/mnt/tmp/hadoop/output-2516573909248961808.tmp'
2017-01-06T21:59:14,789 INFO [main] org.apache.hadoop.fs.s3native.NativeS3FileSystem - OutputStream for key 'eventLogs/remnant/46bf8f87-6de6-4da8-9cba-5b2fecd0875e-1434.inprogress' closed. Now beginning upload
2017-01-06T21:59:44,679 INFO [main] org.apache.hadoop.fs.s3native.NativeS3FileSystem - OutputStream for key 'eventLogs/remnant/46bf8f87-6de6-4da8-9cba-5b2fecd0875e-1434.inprogress' upload complete
Large jobs, however, never reach the "upload complete" log message; the driver exits before the upload finishes.
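To spot affected jobs in a driver log, one can look for keys that logged "closed. Now beginning upload" but never "upload complete". The following is only a minimal sketch, assuming the NativeS3FileSystem message format matches the excerpts quoted in this report; the function name `find_incomplete_uploads` and the regular expression are illustrative and not part of Spark or Hadoop.

```python
import re

# Assumption: message wording is exactly as in the log excerpts in this report.
_UPLOAD_RE = re.compile(
    r"OutputStream for key '([^']+)' "
    r"(closed\. Now beginning upload|upload complete)"
)


def find_incomplete_uploads(log_lines):
    """Return keys whose upload began but never logged 'upload complete'."""
    started = []       # keys in order of their first 'beginning upload' message
    completed = set()  # keys that reached 'upload complete'
    for line in log_lines:
        # findall copes with several log entries fused onto one physical line
        for key, event in _UPLOAD_RE.findall(line):
            if event == "upload complete":
                completed.add(key)
            elif key not in started:
                started.append(key)
    return [key for key in started if key not in completed]
```

A call such as `find_incomplete_uploads(open("driver.log"))` (with `driver.log` as a placeholder path) would list the event-log keys that never finished uploading.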
Issue Links
- depends upon
  - HADOOP-13560 S3ABlockOutputStream to support huge (many GB) file writes (Resolved)
  - SPARK-12330 Mesos coarse executor does not cleanup blockmgr properly on termination if data is stored on disk (Resolved)
  - SPARK-16333 Excessive Spark history event/json data size (5GB each) (Resolved)
- is related to
  - SPARK-11373 Add metrics to the History Server and providers (Resolved)