Details
Description
`LocalityPlacementStrategySuite` hangs sometimes like the following. We can retriever, but it takes our resource significantly because it hangs until the timeout (6 hours) occurs.
https://github.com/apache/spark/runs/1719480243
https://github.com/apache/spark/runs/1724459002
https://github.com/apache/spark/runs/1717958874
https://github.com/apache/spark/runs/1731673955 (branch-3.0)
[info] LocalityPlacementStrategySuite: 17299[info] *** Test still running after 3 minutes, 6 seconds: suite name: LocalityPlacementStrategySuite, test name: handle large number of containers and tasks (SPARK-18750). 17300[info] *** Test still running after 8 minutes, 6 seconds: suite name: LocalityPlacementStrategySuite, test name: handle large number of containers and tasks (SPARK-18750). 17301[info] *** Test still running after 13 minutes, 6 seconds: suite name: LocalityPlacementStrategySuite, test name: handle large number of containers and tasks (SPARK-18750). 17302[info] *** Test still running after 18 minutes, 6 seconds: suite name: LocalityPlacementStrategySuite, test name: handle large number of containers and tasks (SPARK-18750). 17303[info] *** Test still running after 23 minutes, 6 seconds: suite name: LocalityPlacementStrategySuite, test name: handle large number of containers and tasks (SPARK-18750). 17304[info] *** Test still running after 28 minutes, 6 seconds: suite name: LocalityPlacementStrategySuite, test name: handle large number of containers and tasks (SPARK-18750). 17305[info] *** Test still running after 33 minutes, 6 seconds: suite name: LocalityPlacementStrategySuite, test name: handle large number of containers and tasks (SPARK-18750). 17306[info] *** Test still running after 38 minutes, 6 seconds: suite name: LocalityPlacementStrategySuite, test name: handle large number of containers and tasks (SPARK-18750). 17307[info] *** Test still running after 43 minutes, 6 seconds: suite name: LocalityPlacementStrategySuite, test name: handle large number of containers and tasks (SPARK-18750). 17308[info] *** Test still running after 48 minutes, 6 seconds: suite name: LocalityPlacementStrategySuite, test name: handle large number of containers and tasks (SPARK-18750). 17309[info] *** Test still running after 53 minutes, 6 seconds: suite name: LocalityPlacementStrategySuite, test name: handle large number of containers and tasks (SPARK-18750). 17310[info] *** Test still running after 58 minutes, 6 seconds: suite name: LocalityPlacementStrategySuite, test name: handle large number of containers and tasks (SPARK-18750). 17311[info] *** Test still running after 1 hour, 3 minutes, 6 seconds: suite name: LocalityPlacementStrategySuite, test name: handle large number of containers and tasks (SPARK-18750). 17312[info] *** Test still running after 1 hour, 8 minutes, 6 seconds: suite name: LocalityPlacementStrategySuite, test name: handle large number of containers and tasks (SPARK-18750). 17313[info] *** Test still running after 1 hour, 13 minutes, 6 seconds: suite name: LocalityPlacementStrategySuite, test name: handle large number of containers and tasks (SPARK-18750). 17314[info] *** Test still running after 1 hour, 18 minutes, 6 seconds: suite name: LocalityPlacementStrategySuite, test name: handle large number of containers and tasks (SPARK-18750). 17315[info] *** Test still running after 1 hour, 23 minutes, 6 seconds: suite name: LocalityPlacementStrategySuite, test name: handle large number of containers and tasks (SPARK-18750). 17316[info] *** Test still running after 1 hour, 28 minutes, 6 seconds: suite name: LocalityPlacementStrategySuite, test name: handle large number of containers and tasks (SPARK-18750). 17317[info] *** Test still running after 1 hour, 33 minutes, 6 seconds: suite name: LocalityPlacementStrategySuite, test name: handle large number of containers and tasks (SPARK-18750). 17318[info] *** Test still running after 1 hour, 38 minutes, 6 seconds: suite name: LocalityPlacementStrategySuite, test name: handle large number of containers and tasks (SPARK-18750). 17319[info] *** Test still running after 1 hour, 43 minutes, 6 seconds: suite name: LocalityPlacementStrategySuite, test name: handle large number of containers and tasks (SPARK-18750). 17320[info] *** Test still running after 1 hour, 48 minutes, 6 seconds: suite name: LocalityPlacementStrategySuite, test name: handle large number of containers and tasks (SPARK-18750). 17321[info] *** Test still running after 1 hour, 53 minutes, 6 seconds: suite name: LocalityPlacementStrategySuite, test name: handle large number of containers and tasks (SPARK-18750). 17322[info] *** Test still running after 1 hour, 58 minutes, 6 seconds: suite name: LocalityPlacementStrategySuite, test name: handle large number of containers and tasks (SPARK-18750). 17323[info] *** Test still running after 2 hours, 3 minutes, 6 seconds: suite name: LocalityPlacementStrategySuite, test name: handle large number of containers and tasks (SPARK-18750). 17324[info] *** Test still running after 2 hours, 8 minutes, 6 seconds: suite name: LocalityPlacementStrategySuite, test name: handle large number of containers and tasks (SPARK-18750). 17325[info] *** Test still running after 2 hours, 13 minutes, 6 seconds: suite name: LocalityPlacementStrategySuite, test name: handle large number of containers and tasks (SPARK-18750). 17326[info] *** Test still running after 2 hours, 18 minutes, 6 seconds: suite name: LocalityPlacementStrategySuite, test name: handle large number of containers and tasks (SPARK-18750). 17327[info] *** Test still running after 2 hours, 23 minutes, 6 seconds: suite name: LocalityPlacementStrategySuite, test name: handle large number of containers and tasks (SPARK-18750). 17328[info] *** Test still running after 2 hours, 28 minutes, 6 seconds: suite name: LocalityPlacementStrategySuite, test name: handle large number of containers and tasks (SPARK-18750). 17329[info] *** Test still running after 2 hours, 33 minutes, 6 seconds: suite name: LocalityPlacementStrategySuite, test name: handle large number of containers and tasks (SPARK-18750). 17330[info] *** Test still running after 2 hours, 38 minutes, 6 seconds: suite name: LocalityPlacementStrategySuite, test name: handle large number of containers and tasks (SPARK-18750). 17331[info] *** Test still running after 2 hours, 43 minutes, 6 seconds: suite name: LocalityPlacementStrategySuite, test name: handle large number of containers and tasks (SPARK-18750). 17332[info] *** Test still running after 2 hours, 48 minutes, 6 seconds: suite name: LocalityPlacementStrategySuite, test name: handle large number of containers and tasks (SPARK-18750). 17333[info] *** Test still running after 2 hours, 53 minutes, 6 seconds: suite name: LocalityPlacementStrategySuite, test name: handle large number of containers and tasks (SPARK-18750). 17334[info] *** Test still running after 2 hours, 58 minutes, 6 seconds: suite name: LocalityPlacementStrategySuite, test name: handle large number of containers and tasks (SPARK-18750). 17335[info] *** Test still running after 3 hours, 3 minutes, 6 seconds: suite name: LocalityPlacementStrategySuite, test name: handle large number of containers and tasks (SPARK-18750). 17336[info] *** Test still running after 3 hours, 8 minutes, 6 seconds: suite name: LocalityPlacementStrategySuite, test name: handle large number of containers and tasks (SPARK-18750). 17337[info] *** Test still running after 3 hours, 13 minutes, 6 seconds: suite name: LocalityPlacementStrategySuite, test name: handle large number of containers and tasks (SPARK-18750). 17338[info] *** Test still running after 3 hours, 18 minutes, 6 seconds: suite name: LocalityPlacementStrategySuite, test name: handle large number of containers and tasks (SPARK-18750). 17339[info] *** Test still running after 3 hours, 23 minutes, 6 seconds: suite name: LocalityPlacementStrategySuite, test name: handle large number of containers and tasks (SPARK-18750). 17340[info] *** Test still running after 3 hours, 28 minutes, 6 seconds: suite name: LocalityPlacementStrategySuite, test name: handle large number of containers and tasks (SPARK-18750). 17341[info] *** Test still running after 3 hours, 33 minutes, 6 seconds: suite name: LocalityPlacementStrategySuite, test name: handle large number of containers and tasks (SPARK-18750). 17342[info] *** Test still running after 3 hours, 38 minutes, 6 seconds: suite name: LocalityPlacementStrategySuite, test name: handle large number of containers and tasks (SPARK-18750). 17343[info] *** Test still running after 3 hours, 43 minutes, 6 seconds: suite name: LocalityPlacementStrategySuite, test name: handle large number of containers and tasks (SPARK-18750). 17344[info] *** Test still running after 3 hours, 48 minutes, 6 seconds: suite name: LocalityPlacementStrategySuite, test name: handle large number of containers and tasks (SPARK-18750). 17345[info] *** Test still running after 3 hours, 53 minutes, 6 seconds: suite name: LocalityPlacementStrategySuite, test name: handle large number of containers and tasks (SPARK-18750). 17346[info] *** Test still running after 3 hours, 58 minutes, 6 seconds: suite name: LocalityPlacementStrategySuite, test name: handle large number of containers and tasks (SPARK-18750). 17347[info] *** Test still running after 4 hours, 3 minutes, 6 seconds: suite name: LocalityPlacementStrategySuite, test name: handle large number of containers and tasks (SPARK-18750). 17348[info] *** Test still running after 4 hours, 8 minutes, 6 seconds: suite name: LocalityPlacementStrategySuite, test name: handle large number of containers and tasks (SPARK-18750). 17349[info] *** Test still running after 4 hours, 13 minutes, 6 seconds: suite name: LocalityPlacementStrategySuite, test name: handle large number of containers and tasks (SPARK-18750). 17350[info] *** Test still running after 4 hours, 18 minutes, 6 seconds: suite name: LocalityPlacementStrategySuite, test name: handle large number of containers and tasks (SPARK-18750).
Attachments
Issue Links
- is duplicated by
-
SPARK-34159 LocalityPlacementStrategySuite hangs in Github Actions occasionally
-
- Resolved
-
- supercedes
-
SPARK-29220 Flaky test: org.apache.spark.deploy.yarn.LocalityPlacementStrategySuite.handle large number of containers and tasks (SPARK-18750) [hadoop-3.2][java11]
-
- Closed
-
- links to