Uploaded image for project: 'Mesos'
  1. Mesos
  2. MESOS-8613

Test `MasterAllocatorTest/*.TaskFinished` is flaky.

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • 1.5.0
    • 1.7.0
    • None
    • Mesosphere Sprint 76
    • 2

    Description

      Occasionally the test would crash with the following logs:

      I0225 06:34:21.732908  1835 slave.cpp:3467] Launching container 46824279-01b3-4dcb-9be0-696cdacefe2f for executor 'default' of framework f1caa559-f62f-40f6-9786-100401bc9062-0000
      I0225 06:34:21.733026  1835 slave.cpp:5705] Forwarding the update TASK_LOST (Status UUID: 29c6ae17-0e46-4dea-be36-e9e64e1d95c5) for task 1 of framework f1caa559-f62f-40f6-9786-100401bc9062-0000 to master@172.16.10.21:58470
      I0225 06:34:21.733083  1835 slave.cpp:5598] Task status update manager successfully handled status update TASK_LOST (Status UUID: 29c6ae17-0e46-4dea-be36-e9e64e1d95c5) for task 1 of framework f1caa559-f62f-40f6-9786-100401bc9062-0000
      I0225 06:34:21.733202  1835 master.cpp:7894] Status update TASK_LOST (Status UUID: 29c6ae17-0e46-4dea-be36-e9e64e1d95c5) for task 1 of framework f1caa559-f62f-40f6-9786-100401bc9062-0000 from agent f1caa559-f62f-40f6-9786-100401bc9062-S0 at sla
      ve(128)@172.16.10.21:58470 (ip-172-16-10-21.ec2.internal)
      I0225 06:34:21.733232  1835 master.cpp:7950] Forwarding status update TASK_LOST (Status UUID: 29c6ae17-0e46-4dea-be36-e9e64e1d95c5) for task 1 of framework f1caa559-f62f-40f6-9786-100401bc9062-0000
      I0225 06:34:21.733273  1835 master.cpp:10258] Updating the state of task 1 of framework f1caa559-f62f-40f6-9786-100401bc9062-0000 (latest state: TASK_LOST, status update state: TASK_LOST)
      *** Aborted at 1519540461 (unix time) try "date -d @1519540461" if you are using GNU date ***
      I0225 06:34:21.733470  1838 sched.cpp:1027] Scheduler::statusUpdate took 8126ns
      I0225 06:34:21.733700  1838 hierarchical.cpp:1192] Recovered cpus(allocated: *):1; mem(allocated: *):256 (total: cpus:3; mem:1024; disk:35056; ports:[31000-32000], allocated: cpus(allocated: *):1; mem(allocated: *):256) on agent f1caa559-f62f-4
      0f6-9786-100401bc9062-S0 from framework f1caa559-f62f-40f6-9786-100401bc9062-0000
      WPC: @               0x20 (unknown)
      0225 06:34:21.734422  1836 process.cpp:2805] Attempted to spawn already running process version@172.16.10.21:58470
      I0225 06:34:21.734592  1836 exec.cpp:162] Version: 1.6.0
      I0225 06:34:21.734761  1837 exec.cpp:212] Executor started at: executor(32)@172.16.10.21:58470 with pid 1814
      I0225 06:34:21.734835  1837 slave.cpp:4747] Got registration for executor 'default' of framework f1caa559-f62f-40f6-9786-100401bc9062-0000 from executor(32)@172.16.10.21:58470
      I0225 06:34:21.734966  1840 exec.cpp:236] Executor registered on agent f1caa559-f62f-40f6-9786-100401bc9062-S0
      I0225 06:34:21.734990  1840 exec.cpp:248] Executor::registered took 9639ns
      I0225 06:34:21.740001  1836 slave.cpp:3199] Sending queued task '0' to executor 'default' of framework f1caa559-f62f-40f6-9786-100401bc9062-0000 at executor(32)@172.16.10.21:58470
      I0225 06:34:21.740236  1840 exec.cpp:330] Executor asked to run task '0'
      I0225 06:34:21.740284  1840 exec.cpp:339] Executor::launchTask took 28783ns
      I0225 06:34:21.740320  1840 exec.cpp:581] Executor sending status update TASK_RUNNING (Status UUID: 7cf0069b-a6dc-4d7e-9e16-7ade7a451334) for task 0 of framework f1caa559-f62f-40f6-9786-100401bc9062-0000
      I0225 06:34:21.740419  1840 slave.cpp:5213] Handling status update TASK_RUNNING (Status UUID: 7cf0069b-a6dc-4d7e-9e16-7ade7a451334) for task 0 of framework f1caa559-f62f-40f6-9786-100401bc9062-0000 from executor(32)@172.16.10.21:58470
      I0225 06:34:21.740563  1840 task_status_update_manager.cpp:328] Received task status update TASK_RUNNING (Status UUID: 7cf0069b-a6dc-4d7e-9e16-7ade7a451334) for task 0 of framework f1caa559-f62f-40f6-9786-100401bc9062-0000
      I0225 06:34:21.740583  1840 task_status_update_manager.cpp:507] Creating StatusUpdate stream for task 0 of framework f1caa559-f62f-40f6-9786-100401bc9062-0000
      I0225 06:34:21.740676  1840 task_status_update_manager.cpp:383] Forwarding task status update TASK_RUNNING (Status UUID: 7cf0069b-a6dc-4d7e-9e16-7ade7a451334) for task 0 of framework f1caa559-f62f-40f6-9786-100401bc9062-0000 to the agent
      I0225 06:34:21.740733  1840 slave.cpp:5705] Forwarding the update TASK_RUNNING (Status UUID: 7cf0069b-a6dc-4d7e-9e16-7ade7a451334) for task 0 of framework f1caa559-f62f-40f6-9786-100401bc9062-0000 to master@172.16.10.21:58470
      I0225 06:34:21.740799  1840 slave.cpp:5598] Task status update manager successfully handled status update TASK_RUNNING (Status UUID: 7cf0069b-a6dc-4d7e-9e16-7ade7a451334) for task 0 of framework f1caa559-f62f-40f6-9786-100401bc9062-0000
      I0225 06:34:21.740828  1840 slave.cpp:5614] Sending acknowledgement for status update TASK_RUNNING (Status UUID: 7cf0069b-a6dc-4d7e-9e16-7ade7a451334) for task 0 of framework f1caa559-f62f-40f6-9786-100401bc9062-0000 to executor(32)@172.16.10.21:58470
      I0225 06:34:21.740886  1840 exec.cpp:398] Executor received status update acknowledgement 7cf0069b-a6dc-4d7e-9e16-7ade7a451334 for task 0 of framework f1caa559-f62f-40f6-9786-100401bc9062-0000
      

      This has been observed multiple times and every time the test crashed right after

      I0225 06:34:21.733273  1835 master.cpp:10258] Updating the state of task 1 of framework f1caa559-f62f-40f6-9786-100401bc9062-0000 (latest state: TASK_LOST, status update state: TASK_LOST)
      

      Attached logs of 4 crash instances.

      Attachments

        1. consoleText.1.log
          31 kB
          Chun-Hung Hsiao
        2. consoleText.2.log
          31 kB
          Chun-Hung Hsiao
        3. consoleText.3.log
          27 kB
          Chun-Hung Hsiao
        4. consoleText.4.log
          30 kB
          Chun-Hung Hsiao

        Issue Links

          Activity

            People

              tillt Till Toenshoff
              chhsia0 Chun-Hung Hsiao
              Alex R Alex R
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: