MESOS-7466: Mesos, Marathon, and Docker not synchronized


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved

    Description

      I submitted an application group in Marathon, then deleted it because there was a problem with its definition, and then submitted it again via the Marathon API. After several rounds of deleting and resubmitting, I ran into a problem (this is the third time I have encountered it): I deleted the application group in the Marathon UI, but some entries remained under `deployments` and could not be deleted, with the error `error destroying null: app '/ Null' does not exist`. In the Mesos UI the task status is `running`, and the Docker container status is `up`. I then restarted mesos-master, mesos-slave, marathon, and ZooKeeper, but the result was the same.
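      For reference, Marathon (1.4.x) exposes a force-delete on stuck deployments. The host below is the Marathon endpoint from this report; `<deployment-id>` is a placeholder for whatever id the first command returns. This is a sketch of the standard API, not something I have verified fixes this particular state:

      ```shell
      # List current deployments to find the stuck deployment's id.
      curl -sSL http://172.30.30.4:8080/v2/deployments | python -m json.tool

      # Force-remove that deployment without creating a rollback deployment.
      curl -X DELETE "http://172.30.30.4:8080/v2/deployments/<deployment-id>?force=true"
      ```

      With `force=true` Marathon drops the deployment record instead of scheduling a compensating rollback, which is usually what you want when the deployment itself is wedged.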

      It was suggested that I clear Marathon's state in ZooKeeper, which I did; Marathon went back to normal, as if it had been reinstalled. But the tasks I want to delete are still shown in the Mesos UI with state `running`.

      After that, I also cleared the Mesos state in ZooKeeper, with the same result: the tasks I want to delete are still shown in the Mesos UI. Yet those tasks do not exist in Docker, and I do not know why.
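      For the record, "clearing the state in ZooKeeper" is typically done with `zkCli.sh`. The ensemble member below and the `/mesos` znode match the `--zk` flags shown later in this report; `/marathon` is Marathon's default znode, and the `zkCli.sh` path is an assumption that depends on how ZooKeeper was installed. This wipes all framework/cluster state under the znode, so verify the path before deleting anything:

      ```shell
      # Connect to one of the ZooKeeper ensemble members from this report.
      # (The zkCli.sh location varies by install; this path is a guess.)
      /usr/lib/zookeeper/bin/zkCli.sh -server 172.30.30.4:2181

      # Inside the zkCli shell: inspect first, then recursively delete.
      ls /marathon
      rmr /marathon      # "deleteall" on ZooKeeper >= 3.5
      ```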

      The following is my configuration info:

      $ curl -sSL http://172.30.30.4:5050/version | python -m json.tool
      {
          "build_date": "2017-04-12 16:39:09",
          "build_time": 1492015149.0,
          "build_user": "centos",
          "git_sha": "de306b5786de3c221bae1457c6f2ccaeb38eef9f",
          "git_tag": "1.2.0",
          "version": "1.2.0"
      }
      
      $ curl -sSL http://172.30.30.4:8080/v2/info | python -m json.tool
      {
          ......
          "name": "marathon",
          "version": "1.4.2",
          ......
      }
      
      $ docker version
      Client:
       Version:      1.12.5
       API version:  1.24
       Go version:   go1.6.4
       Git commit:   7392c3b
       Built:        Fri Dec 16 02:23:59 2016
       OS/Arch:      linux/amd64
      
      Server:
       Version:      1.12.5
       API version:  1.24
       Go version:   go1.6.4
       Git commit:   7392c3b
       Built:        Fri Dec 16 02:23:59 2016
       OS/Arch:      linux/amd64
      
      $ docker info
      Containers: 2
       Running: 1
       Paused: 0
       Stopped: 1
      Images: 16
      Server Version: 1.12.5
      Storage Driver: devicemapper
       Pool Name: docker-253:0-403269431-pool
       Pool Blocksize: 65.54 kB
       Base Device Size: 10.74 GB
       Backing Filesystem: xfs
       Data file: /dev/loop0
       Metadata file: /dev/loop1
       Data Space Used: 4.915 GB
       Data Space Total: 107.4 GB
       Data Space Available: 94.26 GB
       Metadata Space Used: 7.115 MB
       Metadata Space Total: 2.147 GB
       Metadata Space Available: 2.14 GB
       Thin Pool Minimum Free Space: 10.74 GB
       Udev Sync Supported: true
       Deferred Removal Enabled: false
       Deferred Deletion Enabled: false
       Deferred Deleted Device Count: 0
       Data loop file: /var/lib/docker/devicemapper/devicemapper/data
       WARNING: Usage of loopback devices is strongly discouraged for production use. Use `--storage-opt dm.thinpooldev` to specify a custom block storage device.
       Metadata loop file: /var/lib/docker/devicemapper/devicemapper/metadata
       Library Version: 1.02.135-RHEL7 (2016-09-28)
      Logging Driver: json-file
      Cgroup Driver: cgroupfs
      Plugins:
       Volume: local
       Network: calico null bridge host overlay
      Swarm: inactive
      Runtimes: runc
      Default Runtime: runc
      Security Options: seccomp
      Kernel Version: 3.10.0-514.2.2.el7.x86_64
      Operating System: CentOS Linux 7 (Core)
      OSType: linux
      Architecture: x86_64
      CPUs: 8
      Total Memory: 15.51 GiB
      Name: slave1
      ID: 4VRS:RPSC:ASAX:MTRJ:TOGC:6RFL:6RJS:J4MK:3VSN:JZO2:Q2FB:LJEW
      Docker Root Dir: /var/lib/docker
      Debug Mode (client): false
      Debug Mode (server): false
      Registry: https://index.docker.io/v1/
      Cluster Store: etcd://172.30.30.5:2379
      Cluster Advertise: 172.30.30.5:2375
      Insecure Registries:
       172.30.30.8:80
       127.0.0.0/8
      
      $ /usr/sbin/mesos-master \
          --zk=zk://172.30.30.4:2181,172.30.30.12:2181/mesos \
          --port=5050 \
          --log_dir=/var/log/mesos \
          --cluster=cp_cluster \
          --hostname=172.30.30.4 \
          --quorum=1 \
          --work_dir=/var/lib/mesos
      
      $ /usr/sbin/mesos-slave \
          --master=zk://172.30.30.4:2181,172.30.30.12:2181/mesos \
          --log_dir=/var/log/mesos \
          --containerizers=docker,mesos \
          --executor_registration_timeout=5mins \
          --hostname=172.30.30.5 \
          --ip=172.30.30.5 \
          --isolation=filesystem/linux,docker/runtime,network/cni \
          --network_cni_config_dir=/var/lib/mesos/cni/config \
          --network_cni_plugins_dir=/var/lib/mesos/cni/plugins \
          --work_dir=/var/lib/mesos
      
      $ curl -sSL http://172.30.30.12:5050/master/tasks | python -m json.tool | grep "\"id\""
                  "id": "monitor-tools_grafana.ea8672ed-3233-11e7-bd32-024222b481ef",
                  "id": "monitor-tools_prometheus.c6bdfccc-3233-11e7-bd32-024222b481ef",
                  "id": "monitor-tools_alertmanager.c3bfff0b-3233-11e7-bd32-024222b481ef",
                  "id": "monitor-exporter_node-exporter.0e44c1c5-3232-11e7-bd32-024222b481ef",
                  "id": "monitor-exporter_cadvisor.0e455e07-3232-11e7-bd32-024222b481ef",
                  "id": "monitor-exporter_mesos-exporter.0e43fe73-3232-11e7-bd32-024222b481ef",
                  "id": "syslog.e0645b43-3238-11e7-90c2-02423620ccdc",
                  "id": "syslog.e72bbd15-3238-11e7-90c2-02423620ccdc",
                  "id": "syslog.a0b162ff-316f-11e7-bd32-024222b481ef",
                  "id": "mesos-dns.1286663d-30ae-11e7-bd32-024222b481ef",
                  "id": "mesos-dns.12c20fae-30ae-11e7-bd32-024222b481ef",
                  "id": "mesos-dns.1286181c-30ae-11e7-bd32-024222b481ef",
      
      # slave1
      $ docker ps -a
      CONTAINER ID        IMAGE                        COMMAND             CREATED             STATUS              PORTS               NAMES
      e05468eaedc2        quay.io/calico/node:v1.1.0   "start_runit"       2 weeks ago         Up 4 hours                              calico-node
      
      # slave2
      $ docker ps -a
      CONTAINER ID        IMAGE                        COMMAND             CREATED             STATUS              PORTS               NAMES
      7fcbb3f2fa61        quay.io/calico/node:v1.1.0   "start_runit"       2 weeks ago         Up 4 hours                              calico-node
      
      # slave3
      $ docker ps -a
      CONTAINER ID        IMAGE                        COMMAND             CREATED             STATUS              PORTS               NAMES
      c69eba5bc322        quay.io/calico/node:v1.1.0   "start_runit"       2 weeks ago         Up 4 hours                              calico-node
      

      The `monitor-***` tasks are the ones I want to delete, but these tasks do not exist in Docker (`docker ps -a` does not show them).
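      One way to cross-check that a task really has no container left: the Docker containerizer launches containers with a `MESOS_TASK_ID` label and a `mesos-` name prefix (both visible in the `docker run` command line in the `stderr` log further down), so Docker can be queried for them directly:

      ```shell
      # Filter by the MESOS_TASK_ID label the Docker containerizer sets.
      docker ps -a --filter "label=MESOS_TASK_ID=monitor-tools_grafana.ea8672ed-3233-11e7-bd32-024222b481ef"

      # Or list every Mesos-launched container on this agent.
      docker ps -a --filter "name=mesos-"
      ```

      If both come back empty on every agent, the containers are gone and only the Mesos master's view of the tasks is stale.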

      The following is the `monitor-***` tasks' log output (`stdout`):

      ......
      Received killTask for task monitor-tools_grafana.ea8672ed-3233-11e7-bd32-024222b481ef
      ... (the same killTask line repeats about 24 times) ...
      Re-registered docker executor on 172.30.30.6
      Re-registered docker executor on 172.30.30.6
      

      And this is `stderr`:

      I0506 23:26:45.211597 17482 exec.cpp:162] Version: 1.2.0
      I0506 23:26:45.224807 17489 exec.cpp:237] Executor registered on agent e79deb05-5f20-48ca-9de9-a9610504e040-S1
      I0506 23:26:45.310137 17486 docker.cpp:850] Running docker -H unix:///var/run/docker.sock run --cpu-shares 512 --memory 1073741824 --env-file /tmp/qT2brz -v /etc/localtime:/etc/localtime:ro -v /var/lib/mesos/slaves/e79deb05-5f20-48ca-9de9-a9610504e040-S1/frameworks/1e68ea0f-0f0b-4f14-8e2e-ab10169ee5f3-0000/executors/monitor-tools_grafana.ea8672ed-3233-11e7-bd32-024222b481ef/runs/7674214b-a609-47e6-a51a-7129616e8494:/mnt/mesos/sandbox --net calico1 --log-driver=syslog --log-opt=syslog-address=tcp://172.30.30.11:514 --log-opt=tag=grafana --label=MESOS_TASK_ID=monitor-tools_grafana.ea8672ed-3233-11e7-bd32-024222b481ef --name mesos-e79deb05-5f20-48ca-9de9-a9610504e040-S1.7674214b-a609-47e6-a51a-7129616e8494 grafana/grafana
      I0506 16:57:13.687083 17489 exec.cpp:488] Agent exited, but framework has checkpointing enabled. Waiting 15mins to reconnect with agent e79deb05-5f20-48ca-9de9-a9610504e040-S1
      I0506 16:57:14.359922 17486 exec.cpp:283] Received reconnect request from agent e79deb05-5f20-48ca-9de9-a9610504e040-S1
      I0506 16:57:14.366843 17488 exec.cpp:260] Executor re-registered on agent e79deb05-5f20-48ca-9de9-a9610504e040-S1
      I0506 17:45:40.419311 17485 exec.cpp:488] Agent exited, but framework has checkpointing enabled. Waiting 15mins to reconnect with agent e79deb05-5f20-48ca-9de9-a9610504e040-S1
      I0506 17:45:41.629977 17485 exec.cpp:283] Received reconnect request from agent e79deb05-5f20-48ca-9de9-a9610504e040-S1
      I0506 17:45:41.658910 17485 exec.cpp:260] Executor re-registered on agent e79deb05-5f20-48ca-9de9-a9610504e040-S1
      I0506 18:02:20.707939 17485 exec.cpp:488] Agent exited, but framework has checkpointing enabled. Waiting 15mins to reconnect with agent e79deb05-5f20-48ca-9de9-a9610504e040-S1
      I0506 18:03:43.669370 17485 exec.cpp:283] Received reconnect request from agent e79deb05-5f20-48ca-9de9-a9610504e040-S1
      I0506 18:03:43.698617 17485 exec.cpp:260] Executor re-registered on agent e79deb05-5f20-48ca-9de9-a9610504e040-S1
      

      Now I want to know: how can I remove these tasks, and where are they actually running? (I do not think they run in the Mesos containerizer, because I specified `"type": "DOCKER"`.)
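      A heavy-handed way to get rid of orphaned tasks is to tear down the framework that owns them via the master's operator endpoint; the master then kills all of that framework's tasks and removes it. The framework id below is taken from the sandbox path in the `stderr` log above, and the master host is from this report. Warning: this removes every Marathon task, not just the stuck ones, and Marathon will re-register with a new framework id afterwards. A sketch, assuming Mesos 1.2's `/master/teardown` endpoint:

      ```shell
      # Confirm the framework id in the master's state (look for "name": "marathon").
      curl -sSL http://172.30.30.12:5050/master/state | python -m json.tool | less

      # Tear down the framework; the master kills ALL of its tasks.
      curl -X POST http://172.30.30.12:5050/master/teardown \
           -d "frameworkId=1e68ea0f-0f0b-4f14-8e2e-ab10169ee5f3-0000"
      ```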

      One of the slaves also reported the following kernel message:

      Message from syslogd@slave2 at May  6 08:27:35 ...
       kernel:unregister_netdevice: waiting for lo to become free. Usage count = 1
      
      Message from syslogd@slave2 at May  6 08:27:45 ...
       kernel:unregister_netdevice: waiting for lo to become free. Usage count = 1
      

      People

        Assignee: Unassigned
        Reporter: jasontom