Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-34154

Flaky Test: LocalityPlacementStrategySuite.handle large number of containers and tasks (SPARK-18750)

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • 3.0.2, 3.1.1, 3.2.0
    • 3.0.2, 3.1.1, 3.2.0
    • YARN
    • None

    Description

      `LocalityPlacementStrategySuite` hangs sometimes like the following. We can retriever, but it takes our resource significantly because it hangs until the timeout (6 hours) occurs.

      https://github.com/apache/spark/runs/1719480243

      https://github.com/apache/spark/runs/1724459002

      https://github.com/apache/spark/runs/1717958874

      https://github.com/apache/spark/runs/1731673955 (branch-3.0)

      [info] LocalityPlacementStrategySuite:
      17299[info] *** Test still running after 3 minutes, 6 seconds: suite name: LocalityPlacementStrategySuite, test name: handle large number of containers and tasks (SPARK-18750). 
      17300[info] *** Test still running after 8 minutes, 6 seconds: suite name: LocalityPlacementStrategySuite, test name: handle large number of containers and tasks (SPARK-18750). 
      17301[info] *** Test still running after 13 minutes, 6 seconds: suite name: LocalityPlacementStrategySuite, test name: handle large number of containers and tasks (SPARK-18750). 
      17302[info] *** Test still running after 18 minutes, 6 seconds: suite name: LocalityPlacementStrategySuite, test name: handle large number of containers and tasks (SPARK-18750). 
      17303[info] *** Test still running after 23 minutes, 6 seconds: suite name: LocalityPlacementStrategySuite, test name: handle large number of containers and tasks (SPARK-18750). 
      17304[info] *** Test still running after 28 minutes, 6 seconds: suite name: LocalityPlacementStrategySuite, test name: handle large number of containers and tasks (SPARK-18750). 
      17305[info] *** Test still running after 33 minutes, 6 seconds: suite name: LocalityPlacementStrategySuite, test name: handle large number of containers and tasks (SPARK-18750). 
      17306[info] *** Test still running after 38 minutes, 6 seconds: suite name: LocalityPlacementStrategySuite, test name: handle large number of containers and tasks (SPARK-18750). 
      17307[info] *** Test still running after 43 minutes, 6 seconds: suite name: LocalityPlacementStrategySuite, test name: handle large number of containers and tasks (SPARK-18750). 
      17308[info] *** Test still running after 48 minutes, 6 seconds: suite name: LocalityPlacementStrategySuite, test name: handle large number of containers and tasks (SPARK-18750). 
      17309[info] *** Test still running after 53 minutes, 6 seconds: suite name: LocalityPlacementStrategySuite, test name: handle large number of containers and tasks (SPARK-18750). 
      17310[info] *** Test still running after 58 minutes, 6 seconds: suite name: LocalityPlacementStrategySuite, test name: handle large number of containers and tasks (SPARK-18750). 
      17311[info] *** Test still running after 1 hour, 3 minutes, 6 seconds: suite name: LocalityPlacementStrategySuite, test name: handle large number of containers and tasks (SPARK-18750). 
      17312[info] *** Test still running after 1 hour, 8 minutes, 6 seconds: suite name: LocalityPlacementStrategySuite, test name: handle large number of containers and tasks (SPARK-18750). 
      17313[info] *** Test still running after 1 hour, 13 minutes, 6 seconds: suite name: LocalityPlacementStrategySuite, test name: handle large number of containers and tasks (SPARK-18750). 
      17314[info] *** Test still running after 1 hour, 18 minutes, 6 seconds: suite name: LocalityPlacementStrategySuite, test name: handle large number of containers and tasks (SPARK-18750). 
      17315[info] *** Test still running after 1 hour, 23 minutes, 6 seconds: suite name: LocalityPlacementStrategySuite, test name: handle large number of containers and tasks (SPARK-18750). 
      17316[info] *** Test still running after 1 hour, 28 minutes, 6 seconds: suite name: LocalityPlacementStrategySuite, test name: handle large number of containers and tasks (SPARK-18750). 
      17317[info] *** Test still running after 1 hour, 33 minutes, 6 seconds: suite name: LocalityPlacementStrategySuite, test name: handle large number of containers and tasks (SPARK-18750). 
      17318[info] *** Test still running after 1 hour, 38 minutes, 6 seconds: suite name: LocalityPlacementStrategySuite, test name: handle large number of containers and tasks (SPARK-18750). 
      17319[info] *** Test still running after 1 hour, 43 minutes, 6 seconds: suite name: LocalityPlacementStrategySuite, test name: handle large number of containers and tasks (SPARK-18750). 
      17320[info] *** Test still running after 1 hour, 48 minutes, 6 seconds: suite name: LocalityPlacementStrategySuite, test name: handle large number of containers and tasks (SPARK-18750). 
      17321[info] *** Test still running after 1 hour, 53 minutes, 6 seconds: suite name: LocalityPlacementStrategySuite, test name: handle large number of containers and tasks (SPARK-18750). 
      17322[info] *** Test still running after 1 hour, 58 minutes, 6 seconds: suite name: LocalityPlacementStrategySuite, test name: handle large number of containers and tasks (SPARK-18750). 
      17323[info] *** Test still running after 2 hours, 3 minutes, 6 seconds: suite name: LocalityPlacementStrategySuite, test name: handle large number of containers and tasks (SPARK-18750). 
      17324[info] *** Test still running after 2 hours, 8 minutes, 6 seconds: suite name: LocalityPlacementStrategySuite, test name: handle large number of containers and tasks (SPARK-18750). 
      17325[info] *** Test still running after 2 hours, 13 minutes, 6 seconds: suite name: LocalityPlacementStrategySuite, test name: handle large number of containers and tasks (SPARK-18750). 
      17326[info] *** Test still running after 2 hours, 18 minutes, 6 seconds: suite name: LocalityPlacementStrategySuite, test name: handle large number of containers and tasks (SPARK-18750). 
      17327[info] *** Test still running after 2 hours, 23 minutes, 6 seconds: suite name: LocalityPlacementStrategySuite, test name: handle large number of containers and tasks (SPARK-18750). 
      17328[info] *** Test still running after 2 hours, 28 minutes, 6 seconds: suite name: LocalityPlacementStrategySuite, test name: handle large number of containers and tasks (SPARK-18750). 
      17329[info] *** Test still running after 2 hours, 33 minutes, 6 seconds: suite name: LocalityPlacementStrategySuite, test name: handle large number of containers and tasks (SPARK-18750). 
      17330[info] *** Test still running after 2 hours, 38 minutes, 6 seconds: suite name: LocalityPlacementStrategySuite, test name: handle large number of containers and tasks (SPARK-18750). 
      17331[info] *** Test still running after 2 hours, 43 minutes, 6 seconds: suite name: LocalityPlacementStrategySuite, test name: handle large number of containers and tasks (SPARK-18750). 
      17332[info] *** Test still running after 2 hours, 48 minutes, 6 seconds: suite name: LocalityPlacementStrategySuite, test name: handle large number of containers and tasks (SPARK-18750). 
      17333[info] *** Test still running after 2 hours, 53 minutes, 6 seconds: suite name: LocalityPlacementStrategySuite, test name: handle large number of containers and tasks (SPARK-18750). 
      17334[info] *** Test still running after 2 hours, 58 minutes, 6 seconds: suite name: LocalityPlacementStrategySuite, test name: handle large number of containers and tasks (SPARK-18750). 
      17335[info] *** Test still running after 3 hours, 3 minutes, 6 seconds: suite name: LocalityPlacementStrategySuite, test name: handle large number of containers and tasks (SPARK-18750). 
      17336[info] *** Test still running after 3 hours, 8 minutes, 6 seconds: suite name: LocalityPlacementStrategySuite, test name: handle large number of containers and tasks (SPARK-18750). 
      17337[info] *** Test still running after 3 hours, 13 minutes, 6 seconds: suite name: LocalityPlacementStrategySuite, test name: handle large number of containers and tasks (SPARK-18750). 
      17338[info] *** Test still running after 3 hours, 18 minutes, 6 seconds: suite name: LocalityPlacementStrategySuite, test name: handle large number of containers and tasks (SPARK-18750). 
      17339[info] *** Test still running after 3 hours, 23 minutes, 6 seconds: suite name: LocalityPlacementStrategySuite, test name: handle large number of containers and tasks (SPARK-18750). 
      17340[info] *** Test still running after 3 hours, 28 minutes, 6 seconds: suite name: LocalityPlacementStrategySuite, test name: handle large number of containers and tasks (SPARK-18750). 
      17341[info] *** Test still running after 3 hours, 33 minutes, 6 seconds: suite name: LocalityPlacementStrategySuite, test name: handle large number of containers and tasks (SPARK-18750). 
      17342[info] *** Test still running after 3 hours, 38 minutes, 6 seconds: suite name: LocalityPlacementStrategySuite, test name: handle large number of containers and tasks (SPARK-18750). 
      17343[info] *** Test still running after 3 hours, 43 minutes, 6 seconds: suite name: LocalityPlacementStrategySuite, test name: handle large number of containers and tasks (SPARK-18750). 
      17344[info] *** Test still running after 3 hours, 48 minutes, 6 seconds: suite name: LocalityPlacementStrategySuite, test name: handle large number of containers and tasks (SPARK-18750). 
      17345[info] *** Test still running after 3 hours, 53 minutes, 6 seconds: suite name: LocalityPlacementStrategySuite, test name: handle large number of containers and tasks (SPARK-18750). 
      17346[info] *** Test still running after 3 hours, 58 minutes, 6 seconds: suite name: LocalityPlacementStrategySuite, test name: handle large number of containers and tasks (SPARK-18750). 
      17347[info] *** Test still running after 4 hours, 3 minutes, 6 seconds: suite name: LocalityPlacementStrategySuite, test name: handle large number of containers and tasks (SPARK-18750). 
      17348[info] *** Test still running after 4 hours, 8 minutes, 6 seconds: suite name: LocalityPlacementStrategySuite, test name: handle large number of containers and tasks (SPARK-18750). 
      17349[info] *** Test still running after 4 hours, 13 minutes, 6 seconds: suite name: LocalityPlacementStrategySuite, test name: handle large number of containers and tasks (SPARK-18750). 
      17350[info] *** Test still running after 4 hours, 18 minutes, 6 seconds: suite name: LocalityPlacementStrategySuite, test name: handle large number of containers and tasks (SPARK-18750).  

      Attachments

        Issue Links

          Activity

            People

              attilapiros Attila Zsolt Piros
              dongjoon Dongjoon Hyun
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: