[MESOS-5482] mesos/marathon task stuck in staging after slave reboot - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Duplicate
Affects Version/s: None
Fix Version/s: None
Component/s: None
Labels:
- tech-debt

Description

The point of running Mesos/Marathon is to be able to sleep well, but after a node reboot the Mesos task gets stuck in staging for about 4 hours.

To reproduce the issue:

Set up a Mesos cluster in HA mode, with mesos-master and mesos-slave running as systemd-enabled services.
Run a Docker registry (https://hub.docker.com/_/registry/) on one node, using the constraint hostname:LIKE:mesos-slave-1.
Reboot that node and observe that the task gets stuck in staging.
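
The reproduction steps above do not include the app definition that was submitted, so the following is only a sketch of what such a Marathon app might look like. The app ID `/docker-registry` is inferred from the task ID in the error message; the image tag, resource values, port mapping, and the `submit_app` helper are assumptions for illustration. The constraint is the one quoted in the steps, and `/v2/apps` is Marathon's app-creation endpoint.

```python
import json
import urllib.request


def build_registry_app(hostname_pattern="mesos-slave-1"):
    """Build a Marathon app definition pinned to one agent by hostname."""
    return {
        "id": "/docker-registry",   # inferred from the task ID in the log
        "instances": 1,
        "cpus": 0.5,                # assumed resource values
        "mem": 256,
        "container": {
            "type": "DOCKER",
            "docker": {
                "image": "registry:2",  # assumed tag of the Docker Hub image
                "network": "BRIDGE",
                # The registry listens on 5000; hostPort 0 lets Marathon pick one.
                "portMappings": [{"containerPort": 5000, "hostPort": 0}],
            },
        },
        # Same constraint as in the report: hostname:LIKE:mesos-slave-1
        "constraints": [["hostname", "LIKE", hostname_pattern]],
    }


def submit_app(marathon_url, app):
    """POST the app definition to Marathon (hypothetical helper, not called here)."""
    request = urllib.request.Request(
        marathon_url.rstrip("/") + "/v2/apps",
        data=json.dumps(app).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    # Print the definition instead of submitting it, so this runs without a cluster.
    print(json.dumps(build_registry_app(), indent=2))
```

Because the constraint pins the single instance to one host, Marathon has nowhere else to place the task while that node is rebooting, which is what makes the stuck staging state visible.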

Possible workaround: running "service mesos-slave restart" fixes the issue.

Environment:
OS: CentOS 7.2
Mesos: 0.28.1
Marathon: 1.1.1
ZooKeeper: 3.4.8
Docker: 1.9.1 (Docker API version 1.21)

error message:
May 30 08:38:24 euca-10-254-237-140 mesos-slave[832]: W0530 08:38:24.120013 909 slave.cpp:2018] Ignoring kill task docker-registry.066fb448-2628-11e6-bedd-d00d0ef81dc3 because the executor 'docker-registry.066fb448-2628-11e6-bedd-d00d0ef81dc3' of framework 8517fcb7-f2d0-47ad-ae02-837570bef929-0000 is terminating/terminated
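
To check how often this warning appears in the attached agent logs, a small filter along these lines could help. It is a minimal sketch: the regular expression is derived only from the single log line quoted above, and the names `KILL_IGNORED` and `find_ignored_kills` are made up for this example.

```python
import re

# Pattern built from the one warning quoted in this report; other Mesos
# versions may word the message differently.
KILL_IGNORED = re.compile(
    r"Ignoring kill task (?P<task>\S+) because the executor "
    r"'(?P<executor>[^']+)' of framework (?P<framework>\S+) "
    r"is terminating/terminated"
)


def find_ignored_kills(lines):
    """Return task, executor and framework IDs for each matching log line."""
    hits = []
    for line in lines:
        match = KILL_IGNORED.search(line)
        if match:
            hits.append(match.groupdict())
    return hits


if __name__ == "__main__":
    sample = (
        "May 30 08:38:24 euca-10-254-237-140 mesos-slave[832]: W0530 "
        "08:38:24.120013 909 slave.cpp:2018] Ignoring kill task "
        "docker-registry.066fb448-2628-11e6-bedd-d00d0ef81dc3 because the "
        "executor 'docker-registry.066fb448-2628-11e6-bedd-d00d0ef81dc3' of "
        "framework 8517fcb7-f2d0-47ad-ae02-837570bef929-0000 is "
        "terminating/terminated"
    )
    for hit in find_ignored_kills([sample]):
        print(hit["task"], hit["framework"])
```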

Attachments

- marathon-mesos-masters_after-reboot.log (22 kB, 30/May/16 08:23, lutful karim)
- mesos_slaves_after_reboot.log (215 kB, 30/May/16 08:23, lutful karim)
- mesos-masters_mesos.log (252 kB, 30/May/16 08:23, lutful karim)
- tasks_running_before_rebooot.marathon (6 kB, 30/May/16 08:23, lutful karim)

Issue Links

duplicates: MESOS-7215 Race condition on re-registration of non-partition-aware frameworks (Resolved)

People

Assignee: Unassigned

Reporter: lutful karim

Votes: 1

Watchers: 8

Dates

Created: 29/May/16 10:52

Updated: 11/Sep/17 23:33

Resolved: 07/Sep/17 00:11