Mesos / MESOS-9875

Mesos did not respond correctly when operations should fail


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.9.0
    • Component/s: agent

    Description

      To test persistent volumes with OPERATION_FAILED/ERROR feedback, we SSHed into the Mesos agent and made it unable to create subdirectories in /srv/mesos/work/volumes. However, Mesos did not send any operation failure response; instead, we received OPERATION_FINISHED feedback.

      Steps to reproduce the issue:

      1. SSH into a Mesos agent (magent).
      2. Make it impossible to create a persistent volume (we expect the agent to crash and reregister, and the master to realize that the operation is OPERATION_DROPPED):

      • cd /srv/mesos/work (if /srv/mesos/work/volumes does not exist, create it with mkdir /srv/mesos/work/volumes)
      • chattr -RV +i volumes (this sets the immutable attribute recursively, so no subdirectories can be created)

      3. Launch a service with persistent volumes, constrained to run only on the magent modified above.
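      Step 2 can be scripted roughly as follows. This is a sketch under stated assumptions and is not part of the original report. The function names `volumes_dir`, `break_volume_creation` and `restore_volume_creation` and the `WORK_DIR` override are hypothetical. The chattr commands must run as root on a filesystem that supports the immutable attribute (for example ext4).

```shell
#!/bin/sh
# Hypothetical helpers for reproducing step 2 on a Mesos agent.
# Assumption: the agent work directory is /srv/mesos/work unless WORK_DIR is set.

# Print the directory under which persistent volumes are created.
volumes_dir() {
  printf '%s/volumes\n' "${WORK_DIR:-/srv/mesos/work}"
}

# Make persistent-volume creation fail (requires root).
break_volume_creation() {
  dir=$(volumes_dir)
  mkdir -p "$dir"        # create the volumes directory if it is missing
  chattr -RV +i "$dir"   # immutable, recursively: no new entries can be created inside
}

# Undo the change once testing is finished (requires root).
restore_volume_creation() {
  chattr -RV -i "$(volumes_dir)"
}
```

      On the agent, source the file and run `break_volume_creation` before step 3. Run `restore_volume_creation` afterwards so the agent can create volumes again.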

      Scheduler logs showing receipt of `OPERATION_FINISHED` (also see screenshot):

      2019-06-27 21:57:11.879 [12768651|rdar://12768651] [Jarvis-mesos-dispatcher-105] INFO c.a.j.s.ServicePodInstance - Stored operation=4g3k02s1gjb0q_5f912b59-a32d-462c-9c46-8401eba4d2c1 and feedback=OPERATION_FINISHED in podInstanceID=4g3k02s1gjb0q on serviceID=yifan-badagents-1

      • 2019-06-27 21:55:23: task reached state TASK_FAILED for mesos reason: REASON_CONTAINER_LAUNCH_FAILED with mesos message: Failed to launch container: Failed to change the ownership of the persistent volume at '/srv/mesos/work/volumes/roles/test-2/19b564e8-3a90-4f2f-981d-b3dd2a5d9f90' with uid 264 and gid 264: No such file or directory
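      The task failure indicates that the persistent-volume directory was never created, even though the scheduler received OPERATION_FINISHED. One way to confirm this on the agent is a check like the one below. `volume_dir_status` is a hypothetical helper and is not part of Mesos.

```shell
#!/bin/sh
# Hypothetical diagnostic: report whether a persistent-volume directory exists.
# Prints "present" if the path is an existing directory, "missing" otherwise.
volume_dir_status() {
  if [ -d "$1" ]; then
    echo present
  else
    echo missing
  fi
}
```

      For example, `volume_dir_status /srv/mesos/work/volumes/roles/test-2/19b564e8-3a90-4f2f-981d-b3dd2a5d9f90`. Under the conditions described above, this would be expected to report the directory as missing.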


          People

            Greg Mann (greggomann)
            Yifan Xing (yifan_xing)
            James Peach
            Votes: 0
            Watchers: 3


              Agile

                Completed Sprints:
                Foundations: RI-17 Sprint 52 ended 14/Aug/19
                Foundations: RI-17 Sprint 53 ended 28/Aug/19