Details
Type: Bug
Status: Resolved
Priority: Blocker
Resolution: Duplicate
1.0.0, 1.0.2
None
None
Description
A stage is resubmitted more than 50,000 times.
This appears to be caused by FetchFailed.bmAddress being null.
I don't know how to reproduce it.
Master log:
14/08/09 21:50:17 INFO scheduler.TaskSetManager: Starting task 1.189:276 as TID 52334 on executor 82: sanshan (PROCESS_LOCAL)
14/08/09 21:50:17 INFO scheduler.TaskSetManager: Serialized task 1.189:276 as 3060 bytes in 0 ms
14/08/09 21:50:17 INFO scheduler.TaskSetManager: Starting task 1.189:277 as TID 52335 on executor 78: tuan231 (PROCESS_LOCAL)
14/08/09 21:50:17 INFO scheduler.TaskSetManager: Serialized task 1.189:277 as 3060 bytes in 0 ms
14/08/09 21:50:17 WARN scheduler.TaskSetManager: Lost TID 52199 (task 1.189:141)
14/08/09 21:50:17 WARN scheduler.TaskSetManager: Loss was due to fetch failure from null
14/08/09 21:50:17 INFO scheduler.DAGScheduler: Marking Stage 1 (distinct at DealCF.scala:215) for resubmision due to a fetch failure
14/08/09 21:50:17 INFO scheduler.DAGScheduler: The failed fetch was from Stage 2 (flatMap at DealCF.scala:207); marking it for resubmission
14/08/09 21:50:17 WARN scheduler.TaskSetManager: Loss was due to fetch failure from null
14/08/09 21:50:17 INFO scheduler.DAGScheduler: Marking Stage 1 (distinct at DealCF.scala:215) for resubmision due to a fetch failure
14/08/09 21:50:17 INFO scheduler.DAGScheduler: The failed fetch was from Stage 2 (flatMap at DealCF.scala:207); marking it for resubmission
------------------ 50000 times -------------------
14/08/09 21:50:17 INFO scheduler.DAGScheduler: Marking Stage 1 (distinct at DealCF.scala:215) for resubmision due to a fetch failure
14/08/09 21:50:17 INFO scheduler.DAGScheduler: The failed fetch was from Stage 2 (flatMap at DealCF.scala:207); marking it for resubmission
14/08/09 21:50:17 INFO cluster.YarnClientClusterScheduler: Removed TaskSet 1.189, whose tasks have all completed, from pool
14/08/09 21:50:17 INFO scheduler.TaskSetManager: Finished TID 1869 in 87398 ms on jilin (progress: 280/280)
14/08/09 21:50:17 INFO scheduler.DAGScheduler: Completed ShuffleMapTask(2, 269)
14/08/09 21:50:17 INFO cluster.YarnClientClusterScheduler: Removed TaskSet 2.1, whose tasks have all completed, from pool
14/08/09 21:50:17 INFO scheduler.DAGScheduler: Stage 2 (flatMap at DealCF.scala:207) finished in 129.544 s
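The repetition in the master log can be illustrated with a toy model. This is a hypothetical sketch, not Spark's DAGScheduler code. The FetchFailed dataclass, its fields, and the functions source, naive_marking and deduplicated_marking are illustrative names assumed here.

The model assumes two things:
- Every failed fetch is reported as a separate event.
- A stage is marked for resubmission on every event, with no check for whether it is already pending.

Under those assumptions, the number of "Marking Stage 1 ... for resubmission" lines grows with the number of failed fetches. Tracking pending stages in a set collapses them to one mark per stage. The model does not explain why bmAddress is null in the first place.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class FetchFailed:
    """Hypothetical stand-in for a fetch-failure report (not Spark's class)."""
    task_id: int
    reduce_stage: int          # stage whose task failed to fetch (Stage 1 in the log)
    map_stage: int             # stage that produced the missing output (Stage 2)
    bm_address: Optional[str]  # mirrors FetchFailed.bmAddress; None models "null"


def source(e: FetchFailed) -> str:
    # Render a missing address the way the master log does.
    return e.bm_address if e.bm_address is not None else "null"


def naive_marking(events: Iterable[FetchFailed]) -> List[str]:
    """Log a resubmission for every event, even if the stage is already pending."""
    log = []
    for e in events:
        log.append(f"Loss was due to fetch failure from {source(e)}")
        log.append(f"Marking Stage {e.reduce_stage} for resubmission")
        log.append(f"The failed fetch was from Stage {e.map_stage}")
    return log


def deduplicated_marking(events: Iterable[FetchFailed]) -> List[str]:
    """Mark each stage at most once while it is already pending resubmission."""
    pending = set()
    log = []
    for e in events:
        for stage in (e.reduce_stage, e.map_stage):
            if stage not in pending:
                pending.add(stage)
                log.append(f"Marking Stage {stage} for resubmission")
    return log


if __name__ == "__main__":
    # 50,000 failed fetches, all reported against the same pair of stages.
    events = [FetchFailed(t, 1, 2, None) for t in range(50_000)]
    print(len(naive_marking(events)))    # 150000 log lines
    print(deduplicated_marking(events))  # one mark each for Stage 1 and Stage 2
```

In this model, the naive handler's output scales linearly with the number of failed fetches, while the deduplicated handler emits exactly one mark per affected stage.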
Worker log:
14/08/09 21:49:41 INFO spark.CacheManager: Partition rdd_23_57 not found, computing it
14/08/09 21:49:41 INFO spark.CacheManager: Partition rdd_23_191 not found, computing it
14/08/09 21:49:41 INFO executor.CoarseGrainedExecutorBackend: Got assigned task 18017
14/08/09 21:49:41 INFO executor.Executor: Running task ID 18017
14/08/09 21:49:41 INFO storage.BlockManager: Found block broadcast_1 locally
14/08/09 21:49:41 INFO storage.BlockManager: Found block broadcast_2 locally
14/08/09 21:49:41 INFO executor.CoarseGrainedExecutorBackend: Got assigned task 18151
14/08/09 21:49:41 INFO storage.BlockManager: Found block broadcast_0 locally
14/08/09 21:49:41 INFO executor.Executor: Running task ID 18151
14/08/09 21:49:41 INFO storage.BlockManager: Found block broadcast_1 locally
14/08/09 21:49:41 INFO storage.BlockManager: Found block broadcast_2 locally
14/08/09 21:49:41 INFO storage.BlockManager: Found block broadcast_0 locally
14/08/09 21:49:41 INFO spark.CacheManager: Partition rdd_23_86 not found, computing it
14/08/09 21:49:41 INFO spark.CacheManager: Partition rdd_23_220 not found, computing it
14/08/09 21:49:41 INFO executor.CoarseGrainedExecutorBackend: Got assigned task 18285
14/08/09 21:49:41 INFO executor.Executor: Running task ID 18285
14/08/09 21:49:41 INFO storage.BlockManager: Found block broadcast_1 locally
14/08/09 21:49:41 INFO storage.BlockManager: Found block broadcast_2 locally
14/08/09 21:49:41 INFO executor.CoarseGrainedExecutorBackend: Got assigned task 18419
14/08/09 21:49:41 INFO executor.Executor: Running task ID 18419
14/08/09 21:49:41 INFO storage.BlockManager: Found block broadcast_0 locally
14/08/09 21:49:41 INFO storage.BlockManager: Found block broadcast_1 locally
14/08/09 21:49:41 INFO storage.BlockManager: Found block broadcast_2 locally
14/08/09 21:49:41 INFO storage.BlockManager: Found block broadcast_0 locally
14/08/09 21:49:41 INFO spark.CacheManager: Partition rdd_23_86 not found, computing it
14/08/09 21:49:41 INFO spark.CacheManager: Partition rdd_23_220 not found, computing it
14/08/09 21:49:42 INFO executor.CoarseGrainedExecutorBackend: Got assigned task 18535
14/08/09 21:49:42 INFO executor.Executor: Running task ID 18535
14/08/09 21:49:42 INFO storage.BlockManager: Found block broadcast_1 locally
14/08/09 21:49:42 INFO executor.CoarseGrainedExecutorBackend: Got assigned task 18669
14/08/09 21:49:42 INFO executor.Executor: Running task ID 18669
14/08/09 21:49:42 INFO storage.BlockManager: Found block broadcast_2 locally
14/08/09 21:49:42 INFO storage.BlockManager: Found block broadcast_0 locally
14/08/09 21:49:42 INFO storage.BlockManager: Found block broadcast_1 locally
14/08/09 21:49:42 INFO storage.BlockManager: Found block broadcast_2 locally
14/08/09 21:49:42 INFO storage.BlockManager: Found block broadcast_0 locally
14/08/09 21:49:42 INFO spark.CacheManager: Partition rdd_23_68 not found, computing it
14/08/09 21:49:42 INFO spark.CacheManager: Partition rdd_23_202 not found, computing it
14/08/09 21:49:42 INFO executor.CoarseGrainedExecutorBackend: Got assigned task 18787
14/08/09 21:49:42 INFO executor.Executor: Running task ID 18787
14/08/09 21:49:42 INFO storage.BlockManager: Found block broadcast_1 locally
14/08/09 21:49:42 INFO storage.BlockManager: Found block broadcast_2 locally
14/08/09 21:49:42 INFO storage.BlockManager: Found block broadcast_0 locally
14/08/09 21:49:42 INFO executor.CoarseGrainedExecutorBackend: Got assigned task 18921
14/08/09 21:49:42 INFO executor.Executor: Running task ID 18921
14/08/09 21:49:42 INFO storage.BlockManager: Found block broadcast_1 locally
14/08/09 21:49:42 INFO storage.BlockManager: Found block broadcast_2 locally
14/08/09 21:49:42 INFO storage.BlockManager: Found block broadcast_0 locally
14/08/09 21:49:42 INFO spark.CacheManager: Partition rdd_23_52 not found, computing it
14/08/09 21:49:42 INFO spark.CacheManager: Partition rdd_23_186 not found, computing it
14/08/09 21:49:42 INFO executor.CoarseGrainedExecutorBackend: Got assigned task 19012
14/08/09 21:49:42 INFO executor.Executor: Running task ID 19012
14/08/09 21:49:42 INFO executor.CoarseGrainedExecutorBackend: Got assigned task 19146
14/08/09 21:49:42 INFO executor.Executor: Running task ID 19146
14/08/09 21:49:42 INFO storage.BlockManager: Found block broadcast_1 locally
14/08/09 21:49:42 INFO storage.BlockManager: Found block broadcast_2 locally
14/08/09 21:49:42 INFO storage.BlockManager: Found block broadcast_0 locally
14/08/09 21:49:42 INFO storage.BlockManager: Found block broadcast_1 locally
14/08/09 21:49:42 INFO storage.BlockManager: Found block broadcast_2 locally
14/08/09 21:49:42 INFO storage.BlockManager: Found block broadcast_0 locally
14/08/09 21:49:42 INFO spark.CacheManager: Partition rdd_23_9 not found, computing it
14/08/09 21:49:42 INFO spark.CacheManager: Partition rdd_23_143 not found, computing it
14/08/09 21:49:42 INFO executor.CoarseGrainedExecutorBackend: Got assigned task 19351
14/08/09 21:49:42 INFO executor.Executor: Running task ID 19351
14/08/09 21:49:42 INFO executor.CoarseGrainedExecutorBackend: Got assigned task 19484
14/08/09 21:49:42 INFO executor.Executor: Running task ID 19484
14/08/09 21:49:42 INFO storage.BlockManager: Found block broadcast_1 locally
14/08/09 21:49:42 INFO storage.BlockManager: Found block broadcast_2 locally
14/08/09 21:49:42 INFO storage.BlockManager: Found block broadcast_0 locally
14/08/09 21:49:42 INFO storage.BlockManager: Found block broadcast_1 locally
14/08/09 21:49:42 INFO storage.BlockManager: Found block broadcast_2 locally
14/08/09 21:49:42 INFO storage.BlockManager: Found block broadcast_0 locally
14/08/09 21:49:42 INFO spark.CacheManager: Partition rdd_23_80 not found, computing it
14/08/09 21:49:42 INFO spark.CacheManager: Partition rdd_23_213 not found, computing it
14/08/09 21:49:43 INFO executor.CoarseGrainedExecutorBackend: Got assigned task 19668
14/08/09 21:49:43 INFO executor.Executor: Running task ID 19668
14/08/09 21:49:43 INFO storage.BlockManager: Found block broadcast_1 locally
14/08/09 21:49:43 INFO storage.BlockManager: Found block broadcast_2 locally
14/08/09 21:49:43 INFO storage.BlockManager: Found block broadcast_0 locally
14/08/09 21:49:43 INFO executor.CoarseGrainedExecutorBackend: Got assigned task 19801
14/08/09 21:49:43 INFO executor.Executor: Running task ID 19801
14/08/09 21:49:43 INFO storage.BlockManager: Found block broadcast_1 locally
14/08/09 21:49:43 INFO storage.BlockManager: Found block broadcast_2 locally
14/08/09 21:49:43 INFO storage.BlockManager: Found block broadcast_0 locally
14/08/09 21:49:43 INFO spark.CacheManager: Partition rdd_23_128 not found, computing it
14/08/09 21:49:43 INFO spark.CacheManager: Partition rdd_23_261 not found, computing it
14/08/09 21:49:43 INFO executor.CoarseGrainedExecutorBackend: Got assigned task 19826
14/08/09 21:49:43 INFO executor.Executor: Running task ID 19826
14/08/09 21:49:43 INFO executor.CoarseGrainedExecutorBackend: Got assigned task 19958
14/08/09 21:49:43 INFO executor.Executor: Running task ID 19958
14/08/09 21:49:43 INFO storage.BlockManager: Found block broadcast_1 locally
14/08/09 21:49:43 INFO storage.BlockManager: Found block broadcast_1 locally
14/08/09 21:49:43 INFO storage.BlockManager: Found block broadcast_2 locally
14/08/09 21:49:43 INFO storage.BlockManager: Found block broadcast_2 locally
14/08/09 21:49:43 INFO storage.BlockManager: Found block broadcast_0 locally
14/08/09 21:49:43 INFO storage.BlockManager: Found block broadcast_0 locally
14/08/09 21:49:43 INFO spark.CacheManager: Partition rdd_23_17 not found, computing it
14/08/09 21:49:43 INFO spark.CacheManager: Partition rdd_23_149 not found, computing it
14/08/09 21:49:43 INFO executor.CoarseGrainedExecutorBackend: Got assigned task 20129
14/08/09 21:49:43 INFO executor.Executor: Running task ID 20129
14/08/09 21:49:43 INFO executor.CoarseGrainedExecutorBackend: Got assigned task 20262
14/08/09 21:49:43 INFO executor.Executor: Running task ID 20262
14/08/09 21:49:43 INFO storage.BlockManager: Found block broadcast_1 locally
14/08/09 21:49:43 INFO storage.BlockManager: Found block broadcast_1 locally
14/08/09 21:49:43 INFO storage.BlockManager: Found block broadcast_2 locally
14/08/09 21:49:43 INFO storage.BlockManager: Found block broadcast_0 locally
14/08/09 21:49:43 INFO storage.BlockManager: Found block broadcast_2 locally
14/08/09 21:49:43 INFO storage.BlockManager: Found block broadcast_0 locally
14/08/09 21:49:43 INFO spark.CacheManager: Partition rdd_23_184 not found, computing it
14/08/09 21:49:43 INFO spark.CacheManager: Partition rdd_23_51 not found, computing it
14/08/09 21:49:44 INFO executor.CoarseGrainedExecutorBackend: Got assigned task 20386
14/08/09 21:49:44 INFO executor.Executor: Running task ID 20386
14/08/09 21:49:44 INFO executor.CoarseGrainedExecutorBackend: Got assigned task 20520
14/08/09 21:49:44 INFO executor.Executor: Running task ID 20520
14/08/09 21:49:44 INFO storage.BlockManager: Found block broadcast_1 locally
14/08/09 21:49:44 INFO storage.BlockManager: Found block broadcast_1 locally
14/08/09 21:49:44 INFO storage.BlockManager: Found block broadcast_2 locally
14/08/09 21:49:44 INFO storage.BlockManager: Found block broadcast_0 locally
14/08/09 21:49:44 INFO storage.BlockManager: Found block broadcast_2 locally
14/08/09 21:49:44 INFO storage.BlockManager: Found block broadcast_0 locally
14/08/09 21:49:44 INFO spark.CacheManager: Partition rdd_23_173 not found, computing it
14/08/09 21:49:44 INFO spark.CacheManager: Partition rdd_23_39 not found, computing it
14/08/09 21:49:44 INFO executor.CoarseGrainedExecutorBackend: Got assigned task 20618
14/08/09 21:49:44 INFO executor.Executor: Running task ID 20618
14/08/09 21:49:44 INFO executor.CoarseGrainedExecutorBackend: Got assigned task 20752
14/08/09 21:49:44 INFO executor.Executor: Running task ID 20752
14/08/09 21:49:44 INFO storage.BlockManager: Found block broadcast_1 locally
14/08/09 21:49:44 INFO storage.BlockManager: Found block broadcast_2 locally
14/08/09 21:49:44 INFO storage.BlockManager: Found block broadcast_1 locally
14/08/09 21:49:44 INFO storage.BlockManager: Found block broadcast_0 locally
14/08/09 21:49:44 INFO storage.BlockManager: Found block broadcast_2 locally
14/08/09 21:49:44 INFO storage.BlockManager: Found block broadcast_0 locally
14/08/09 21:49:44 INFO spark.CacheManager: Partition rdd_23_135 not found, computing it
Issue Links
- duplicates: SPARK-3224 FetchFailed stages could show up multiple times in failed stages in web ui (Resolved)
- is duplicated by: SPARK-3224 FetchFailed stages could show up multiple times in failed stages in web ui (Resolved)
- links to