Details
-
Bug
-
Status: Resolved
-
Major
-
Resolution: Fixed
-
1.5.0
-
None
-
Mesosphere Sprint 76
-
2
Description
Occasionally the test would crash with the following logs:
I0225 06:34:21.732908 1835 slave.cpp:3467] Launching container 46824279-01b3-4dcb-9be0-696cdacefe2f for executor 'default' of framework f1caa559-f62f-40f6-9786-100401bc9062-0000 I0225 06:34:21.733026 1835 slave.cpp:5705] Forwarding the update TASK_LOST (Status UUID: 29c6ae17-0e46-4dea-be36-e9e64e1d95c5) for task 1 of framework f1caa559-f62f-40f6-9786-100401bc9062-0000 to master@172.16.10.21:58470 I0225 06:34:21.733083 1835 slave.cpp:5598] Task status update manager successfully handled status update TASK_LOST (Status UUID: 29c6ae17-0e46-4dea-be36-e9e64e1d95c5) for task 1 of framework f1caa559-f62f-40f6-9786-100401bc9062-0000 I0225 06:34:21.733202 1835 master.cpp:7894] Status update TASK_LOST (Status UUID: 29c6ae17-0e46-4dea-be36-e9e64e1d95c5) for task 1 of framework f1caa559-f62f-40f6-9786-100401bc9062-0000 from agent f1caa559-f62f-40f6-9786-100401bc9062-S0 at sla ve(128)@172.16.10.21:58470 (ip-172-16-10-21.ec2.internal) I0225 06:34:21.733232 1835 master.cpp:7950] Forwarding status update TASK_LOST (Status UUID: 29c6ae17-0e46-4dea-be36-e9e64e1d95c5) for task 1 of framework f1caa559-f62f-40f6-9786-100401bc9062-0000 I0225 06:34:21.733273 1835 master.cpp:10258] Updating the state of task 1 of framework f1caa559-f62f-40f6-9786-100401bc9062-0000 (latest state: TASK_LOST, status update state: TASK_LOST) *** Aborted at 1519540461 (unix time) try "date -d @1519540461" if you are using GNU date *** I0225 06:34:21.733470 1838 sched.cpp:1027] Scheduler::statusUpdate took 8126ns I0225 06:34:21.733700 1838 hierarchical.cpp:1192] Recovered cpus(allocated: *):1; mem(allocated: *):256 (total: cpus:3; mem:1024; disk:35056; ports:[31000-32000], allocated: cpus(allocated: *):1; mem(allocated: *):256) on agent f1caa559-f62f-4 0f6-9786-100401bc9062-S0 from framework f1caa559-f62f-40f6-9786-100401bc9062-0000 WPC: @ 0x20 (unknown) 0225 06:34:21.734422 1836 process.cpp:2805] Attempted to spawn already running process version@172.16.10.21:58470 I0225 06:34:21.734592 1836 exec.cpp:162] Version: 1.6.0 I0225 06:34:21.734761 1837 exec.cpp:212] Executor started at: executor(32)@172.16.10.21:58470 with pid 1814 I0225 06:34:21.734835 1837 slave.cpp:4747] Got registration for executor 'default' of framework f1caa559-f62f-40f6-9786-100401bc9062-0000 from executor(32)@172.16.10.21:58470 I0225 06:34:21.734966 1840 exec.cpp:236] Executor registered on agent f1caa559-f62f-40f6-9786-100401bc9062-S0 I0225 06:34:21.734990 1840 exec.cpp:248] Executor::registered took 9639ns I0225 06:34:21.740001 1836 slave.cpp:3199] Sending queued task '0' to executor 'default' of framework f1caa559-f62f-40f6-9786-100401bc9062-0000 at executor(32)@172.16.10.21:58470 I0225 06:34:21.740236 1840 exec.cpp:330] Executor asked to run task '0' I0225 06:34:21.740284 1840 exec.cpp:339] Executor::launchTask took 28783ns I0225 06:34:21.740320 1840 exec.cpp:581] Executor sending status update TASK_RUNNING (Status UUID: 7cf0069b-a6dc-4d7e-9e16-7ade7a451334) for task 0 of framework f1caa559-f62f-40f6-9786-100401bc9062-0000 I0225 06:34:21.740419 1840 slave.cpp:5213] Handling status update TASK_RUNNING (Status UUID: 7cf0069b-a6dc-4d7e-9e16-7ade7a451334) for task 0 of framework f1caa559-f62f-40f6-9786-100401bc9062-0000 from executor(32)@172.16.10.21:58470 I0225 06:34:21.740563 1840 task_status_update_manager.cpp:328] Received task status update TASK_RUNNING (Status UUID: 7cf0069b-a6dc-4d7e-9e16-7ade7a451334) for task 0 of framework f1caa559-f62f-40f6-9786-100401bc9062-0000 I0225 06:34:21.740583 1840 task_status_update_manager.cpp:507] Creating StatusUpdate stream for task 0 of framework f1caa559-f62f-40f6-9786-100401bc9062-0000 I0225 06:34:21.740676 1840 task_status_update_manager.cpp:383] Forwarding task status update TASK_RUNNING (Status UUID: 7cf0069b-a6dc-4d7e-9e16-7ade7a451334) for task 0 of framework f1caa559-f62f-40f6-9786-100401bc9062-0000 to the agent I0225 06:34:21.740733 1840 slave.cpp:5705] Forwarding the update TASK_RUNNING (Status UUID: 7cf0069b-a6dc-4d7e-9e16-7ade7a451334) for task 0 of framework f1caa559-f62f-40f6-9786-100401bc9062-0000 to master@172.16.10.21:58470 I0225 06:34:21.740799 1840 slave.cpp:5598] Task status update manager successfully handled status update TASK_RUNNING (Status UUID: 7cf0069b-a6dc-4d7e-9e16-7ade7a451334) for task 0 of framework f1caa559-f62f-40f6-9786-100401bc9062-0000 I0225 06:34:21.740828 1840 slave.cpp:5614] Sending acknowledgement for status update TASK_RUNNING (Status UUID: 7cf0069b-a6dc-4d7e-9e16-7ade7a451334) for task 0 of framework f1caa559-f62f-40f6-9786-100401bc9062-0000 to executor(32)@172.16.10.21:58470 I0225 06:34:21.740886 1840 exec.cpp:398] Executor received status update acknowledgement 7cf0069b-a6dc-4d7e-9e16-7ade7a451334 for task 0 of framework f1caa559-f62f-40f6-9786-100401bc9062-0000
This has been observed multiple times and every time the test crashed right after
I0225 06:34:21.733273 1835 master.cpp:10258] Updating the state of task 1 of framework f1caa559-f62f-40f6-9786-100401bc9062-0000 (latest state: TASK_LOST, status update state: TASK_LOST)
Attached logs of 4 crash instances.
Attachments
Attachments
Issue Links
- is caused by
-
MESOS-8463 Test MasterAllocatorTest/1.SingleFramework is flaky
- Resolved