Mesos / MESOS-1220

Make check failure on OSX - IO error: Too many open files

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • 0.19.0
    • None
    • None
    • Environment: OSX 10.9.2, Apple LLVM version 5.1 (clang-503.0.40) (based on LLVM 3.4svn)

    Description

      Running make check aborts:

      $ make check
      [...]
      [       OK ] CoordinatorTest.LearnedOnOneReplica_NotLearnedOnAnother_AnotherFailsAndRecovers (0 ms)
      [----------] 21 tests from CoordinatorTest (816 ms total)
      
      [----------] 2 tests from RecoverTest
      [ RUN      ] RecoverTest.RacingCatchup
      F0417 21:45:21.254204 1980908304 replica.cpp:709] CHECK_SOME(state): IO error: /private/tmp/RecoverTest_RacingCatchup_if5Cz6/.log4/LOCK: Too many open filesFailed to recover the log
      *** Check failure stack trace: ***
          @        0x10a2f9434  google::LogMessage::SendToLog()
          @        0x10a2f9963  google::LogMessage::Flush()
          @        0x10a2fcaff  google::LogMessageFatal::~LogMessageFatal()
          @        0x10a2fa059  google::LogMessageFatal::~LogMessageFatal()
          @        0x109dd8479  _CheckFatal::~_CheckFatal()
          @        0x109dd8349  _CheckFatal::~_CheckFatal()
          @        0x10a1b379a  mesos::internal::log::ReplicaProcess::restore()
          @        0x10a1b3241  mesos::internal::log::ReplicaProcess::ReplicaProcess()
          @        0x10a1b696b  mesos::internal::log::Replica::Replica()
          @        0x1091ebd9a  RecoverTest_RacingCatchup_Test::TestBody()
          @        0x10945234c  testing::internal::HandleExceptionsInMethodIfSupported<>()
          @        0x1094431ea  testing::Test::Run()
          @        0x109443e72  testing::TestInfo::Run()
          @        0x1094444b0  testing::TestCase::Run()
          @        0x109449d05  testing::internal::UnitTestImpl::RunAllTests()
          @        0x109452b14  testing::internal::HandleExceptionsInMethodIfSupported<>()
          @        0x109449a39  testing::UnitTest::Run()
          @        0x10922a270  main
          @     0x7fff8a98d5fd  start
          @                0x1  (unknown)
      make[3]: *** [check-local] Abort trap: 6
      

      That test does not fail when run individually, which hints at a general file-handle leak across the test suite rather than a problem in this particular test.

      The exact test that triggers the abort is machine dependent. I tried it on two MacBook Pros, and one fails a few tests earlier than the other.

          People

            Assignee: Benjamin Mahler (bmahler)
            Reporter: Till Toenshoff (tillt)