MESOS-4025: SlaveRecoveryTest/0.GCExecutor is flaky.


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • 0.26.0
    • None
    • test
    • Sprint: Mesosphere Sprint 23, Mesosphere Sprint 26

    Description

      The build was SSL-enabled (--enable-ssl, --enable-libevent) and based on 0.26.0-rc1.

      The test suite was run as root.

      sudo ./bin/mesos-tests.sh --gtest_break_on_failure --gtest_repeat=-1
      
      [ RUN      ] SlaveRecoveryTest/0.GCExecutor
      I1130 16:49:16.336833  1032 exec.cpp:136] Version: 0.26.0
      I1130 16:49:16.345212  1049 exec.cpp:210] Executor registered on slave dde9fd4e-b016-4a99-9081-b047e9df9afa-S0
      Registered executor on ubuntu14
      Starting task 22c63bba-cbf8-46fd-b23a-5409d69e4114
      sh -c 'sleep 1000'
      Forked command at 1057
      ../../src/tests/mesos.cpp:779: Failure
      (cgroups::destroy(hierarchy, cgroup)).failure(): Failed to remove cgroup '/sys/fs/cgroup/memory/mesos_test_e5edb2a8-9af3-441f-b991-613082f264e2/slave': Device or resource busy
      *** Aborted at 1448902156 (unix time) try "date -d @1448902156" if you are using GNU date ***
      PC: @          0x1443e9a testing::UnitTest::AddTestPartResult()
      *** SIGSEGV (@0x0) received by PID 27364 (TID 0x7f1bfdd2b800) from PID 0; stack trace: ***
          @     0x7f1be92b80b7 os::Linux::chained_handler()
          @     0x7f1be92bc219 JVM_handle_linux_signal
          @     0x7f1bf7bbc340 (unknown)
          @          0x1443e9a testing::UnitTest::AddTestPartResult()
          @          0x1438b99 testing::internal::AssertHelper::operator=()
          @           0xf0b3bb mesos::internal::tests::ContainerizerTest<>::TearDown()
          @          0x1461882 testing::internal::HandleSehExceptionsInMethodIfSupported<>()
          @          0x145c6f8 testing::internal::HandleExceptionsInMethodIfSupported<>()
          @          0x143de4a testing::Test::Run()
          @          0x143e584 testing::TestInfo::Run()
          @          0x143ebca testing::TestCase::Run()
          @          0x1445312 testing::internal::UnitTestImpl::RunAllTests()
          @          0x14624a7 testing::internal::HandleSehExceptionsInMethodIfSupported<>()
          @          0x145d26e testing::internal::HandleExceptionsInMethodIfSupported<>()
          @          0x14440ae testing::UnitTest::Run()
          @           0xd15cd4 RUN_ALL_TESTS()
          @           0xd158c1 main
          @     0x7f1bf7808ec5 (unknown)
          @           0x913009 (unknown)
      

      My Vagrantfile generator:

      #!/usr/bin/env bash
      
      cat << EOF > Vagrantfile
      # -*- mode: ruby -*-
      # vi: set ft=ruby :
      Vagrant.configure(2) do |config|
        # Disable shared folder to prevent certain kernel module dependencies.
        config.vm.synced_folder ".", "/vagrant", disabled: true
      
        config.vm.box = "bento/ubuntu-14.04"
      
        config.vm.hostname = "${PLATFORM_NAME}"
      
        config.vm.provider "virtualbox" do |vb|
          vb.memory = ${VAGRANT_MEM}
          vb.cpus = ${VAGRANT_CPUS}
          vb.customize ["modifyvm", :id, "--nictype1", "virtio"]
          vb.customize ["modifyvm", :id, "--natdnshostresolver1", "on"]
          vb.customize ["modifyvm", :id, "--natdnsproxy1", "on"]
        end
      
        config.vm.provider "vmware_fusion" do |vb|
          vb.memory = ${VAGRANT_MEM}
          vb.cpus = ${VAGRANT_CPUS}
        end
      
        config.vm.provision "file", source: "../test.sh", destination: "~/test.sh"
      
        config.vm.provision "shell", inline: <<-SHELL
          sudo apt-get update
          sudo apt-get -y install openjdk-7-jdk autoconf libtool
          sudo apt-get -y install build-essential python-dev python-boto          \
                                  libcurl4-nss-dev libsasl2-dev maven             \
                                  libapr1-dev libsvn-dev libssl-dev libevent-dev
          sudo apt-get -y install git
          sudo wget -qO- https://get.docker.com/ | sh
        SHELL
      end
      EOF
      

      The problem occurs frequently in my tests: I'd say in more than 10% but fewer than 50% of runs.
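
      To put a firmer number on that rate, something like the following could be used. This is only a sketch: flake_rate is a hypothetical helper (not part of Mesos), and the commented invocation assumes a built tree, root privileges as above, and gtest's standard --gtest_filter flag to restrict the run to this one test.

```shell
#!/usr/bin/env bash
# Sketch: run a command N times and report how many runs failed.
# flake_rate is a hypothetical helper, not something shipped with Mesos.
flake_rate() {
  fr_runs=$1
  shift
  fr_failures=0
  fr_i=0
  while [ "$fr_i" -lt "$fr_runs" ]; do
    # Any non-zero exit status (including an abort) counts as a failed run.
    if ! "$@" >/dev/null 2>&1; then
      fr_failures=$((fr_failures + 1))
    fi
    fr_i=$((fr_i + 1))
  done
  echo "${fr_failures}/${fr_runs} runs failed"
}

# Example invocation (assumed, not taken from the original report):
# flake_rate 20 sudo ./bin/mesos-tests.sh --gtest_filter='SlaveRecoveryTest/0.GCExecutor'
```

      Each run starts a fresh test process, so one aborted run does not cut the measurement short the way a single --gtest_repeat=-1 session does.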

          People

            nfnt Jan Schlicht
            tillt Till Toenshoff
            Till Toenshoff Till Toenshoff
            Votes: 0
            Watchers: 5

              Agile

                Completed Sprints:
                Mesosphere Sprint 23 ended 07/Dec/15
                Mesosphere Sprint 26 ended 21/Jan/16