Mesos / MESOS-4072

lt-mesos-master dumps core when the work directory is empty and not writable.


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.25.0
    • Fix Version/s: 0.27.0
    • Component/s: master

    Description

      lt-mesos-master dumps core when both of the following conditions are met:

      (1) The user does not have write permission on the /var/lib/mesos directory:

      nan@ubuntu:~/mesos-0.25.0/build$ ls -lt /var/lib/
      total 176
      dr-xr-xr-x 2 root root 4096 Dec 7 03:08 mesos
      ......

      (2) /var/lib/mesos is an empty directory:
      nan@ubuntu:~/mesos-0.25.0/build$ ls -lt /var/lib/mesos/
      total 0

      Running the following command then dumps core:

      nan@ubuntu:~/mesos-0.25.0/build$ ./bin/mesos-master.sh --ip=16.187.250.141 --work_dir=/var/lib/mesos
      I1207 03:18:36.431015 22951 main.cpp:229] Build: 2015-12-07 00:11:18 by nan
      I1207 03:18:36.431154 22951 main.cpp:231] Version: 0.25.0
      I1207 03:18:36.431388 22951 main.cpp:252] Using 'HierarchicalDRF' allocator
      F1207 03:18:36.431807 22951 replica.cpp:724] CHECK_SOME(state): IO error: /var/lib/mesos/replicated_log/LOCK: No such file or directory Failed to recover the log

      *** Check failure stack trace: ***
          @ 0x7f076bc208ca google::LogMessage::Fail()
          @ 0x7f076bc20816 google::LogMessage::SendToLog()
          @ 0x7f076bc20218 google::LogMessage::Flush()
          @ 0x7f076bc2312c google::LogMessageFatal::~LogMessageFatal()
          @ 0x7f076adf8f30 _CheckFatal::~_CheckFatal()
          @ 0x7f076baa4939 mesos::internal::log::ReplicaProcess::restore()
          @ 0x7f076baa0f8c mesos::internal::log::ReplicaProcess::ReplicaProcess()
          @ 0x7f076baa4c95 mesos::internal::log::Replica::Replica()
          @ 0x7f076b9cf819 mesos::internal::log::LogProcess::LogProcess()
          @ 0x7f076b9d576c mesos::internal::log::Log::Log()
          @ 0x46d21f main
          @ 0x7f0766f69ec5 (unknown)
          @ 0x46b979 (unknown)
      Aborted (core dumped)
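
      The fatal check fires because the replicated log cannot create its LOCK file under a directory the user cannot write to. One way to surface this earlier is a pre-flight check on the work directory. The following is only a sketch under that assumption: `checkWorkDirWritable` is a hypothetical helper, not part of Mesos and not necessarily how the issue was fixed. It uses only the POSIX calls `stat` and `access`.

```cpp
#include <cerrno>
#include <cstring>
#include <string>

#include <sys/stat.h>
#include <unistd.h>

// Hypothetical helper (not part of Mesos). Returns an empty string if
// `dir` is an existing directory in which the calling user may create
// entries; otherwise returns a human-readable reason.
std::string checkWorkDirWritable(const std::string& dir)
{
  struct stat s;

  // The path must exist ...
  if (::stat(dir.c_str(), &s) != 0) {
    return "Cannot use work_dir '" + dir + "': " + std::strerror(errno);
  }

  // ... and must be a directory.
  if (!S_ISDIR(s.st_mode)) {
    return "Cannot use work_dir '" + dir + "': not a directory";
  }

  // W_OK: entries may be created; X_OK: the directory may be traversed.
  // Note that access() checks against the real (not effective) user ID.
  if (::access(dir.c_str(), W_OK | X_OK) != 0) {
    return "Cannot use work_dir '" + dir + "': " + std::strerror(errno);
  }

  return "";
}
```

      With the permissions shown in condition (1), such a check would be expected to report a permission error for a non-root user, which could then be printed as an ordinary startup error instead of ending in an abort.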

      Analyzing the core file with gdb:

      nan@ubuntu:~/mesos-0.25.0/build$ gdb /home/nan/mesos-0.25.0/build/src/.libs/lt-mesos-master core
      GNU gdb (Ubuntu 7.7.1-0ubuntu5~14.04.2) 7.7.1
      Copyright (C) 2014 Free Software Foundation, Inc.
      License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
      This is free software: you are free to change and redistribute it.
      There is NO WARRANTY, to the extent permitted by law. Type "show copying"
      and "show warranty" for details.
      This GDB was configured as "x86_64-linux-gnu".
      Type "show configuration" for configuration details.
      For bug reporting instructions, please see:
      <http://www.gnu.org/software/gdb/bugs/>.
      Find the GDB manual and other documentation resources online at:
      <http://www.gnu.org/software/gdb/documentation/>.
      For help, type "help".
      Type "apropos word" to search for commands related to "word"...
      Reading symbols from /home/nan/mesos-0.25.0/build/src/.libs/lt-mesos-master...done.
      [New LWP 22065]
      [New LWP 22087]
      [New LWP 22085]
      [New LWP 22089]
      [New LWP 22084]
      [New LWP 22086]
      [New LWP 22091]
      [New LWP 22088]
      [New LWP 22092]
      [New LWP 22090]
      [Thread debugging using libthread_db enabled]
      Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
      Core was generated by `/home/nan/mesos-0.25.0/build/src/.libs/lt-mesos-master --ip=127.0.0.1 --work_di'.
      Program terminated with signal SIGABRT, Aborted.
      #0 0x00007fe917810cc9 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
      56 ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
      Traceback (most recent call last):
      File "/usr/share/gdb/auto-load/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.19-gdb.py", line 63, in <module>
      from libstdcxx.v6.printers import register_libstdcxx_printers
      ImportError: No module named 'libstdcxx'
      (gdb) bt
      #0 0x00007fe917810cc9 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
      #1 0x00007fe9178140d8 in __GI_abort () at abort.c:89
      #2 0x00007fe91c4b8c1b in DumpStackTraceAndExit () from /home/nan/mesos-0.25.0/build/src/.libs/libmesos-0.25.0.so
      #3 0x00007fe91c4b28ca in google::LogMessage::Fail () from /home/nan/mesos-0.25.0/build/src/.libs/libmesos-0.25.0.so
      #4 0x00007fe91c4b2816 in google::LogMessage::SendToLog () from /home/nan/mesos-0.25.0/build/src/.libs/libmesos-0.25.0.so
      #5 0x00007fe91c4b2218 in google::LogMessage::Flush () from /home/nan/mesos-0.25.0/build/src/.libs/libmesos-0.25.0.so
      #6 0x00007fe91c4b512c in google::LogMessageFatal::~LogMessageFatal () from /home/nan/mesos-0.25.0/build/src/.libs/libmesos-0.25.0.so
      #7 0x00007fe91b68af30 in _CheckFatal::~_CheckFatal (this=0x7ffe704ec3f0, __in_chrg=<optimized out>)
      at ../../3rdparty/libprocess/3rdparty/stout/include/stout/check.hpp:165
      #8 0x00007fe91c336939 in mesos::internal::log::ReplicaProcess::restore (this=0x16f25d0, path=...) at ../../src/log/replica.cpp:724
      #9 0x00007fe91c332f8c in mesos::internal::log::ReplicaProcess::ReplicaProcess (this=0x16f25d0, path=..., __in_chrg=<optimized out>,
      __vtt_parm=<optimized out>) at ../../src/log/replica.cpp:160
      #10 0x00007fe91c336c95 in mesos::internal::log::Replica::Replica (this=0x16e82a0, path=...) at ../../src/log/replica.cpp:753
      #11 0x00007fe91c261819 in mesos::internal::log::LogProcess::LogProcess () from /home/nan/mesos-0.25.0/build/src/.libs/libmesos-0.25.0.so
      #12 0x00007fe91c26776c in mesos::internal::log::Log::Log () from /home/nan/mesos-0.25.0/build/src/.libs/libmesos-0.25.0.so
      #13 0x000000000046d21f in main (argc=3, argv=0x7ffe704ef028) at ../../src/master/main.cpp:307
      (gdb)
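
      The backtrace shows the failure happening inside a constructor chain (Log, LogProcess, Replica, ReplicaProcess, then restore()), where a constructor has no return value to report an error through. A common alternative is a factory function that returns the error to the caller. The sketch below illustrates that pattern only; the `Replica` class here is a hypothetical stand-in, not the Mesos class, and opening a LOCK file with `std::ofstream` merely imitates the step that failed.

```cpp
#include <fstream>
#include <memory>
#include <string>

// Hypothetical stand-in for a storage-backed replica; not Mesos code.
class Replica
{
public:
  // Tries to create the LOCK file under `path`. On failure, returns
  // nullptr and fills `*error` instead of aborting the process.
  static std::unique_ptr<Replica> create(
      const std::string& path,
      std::string* error)
  {
    std::ofstream lock(path + "/LOCK");
    if (!lock.is_open()) {
      *error = "Failed to recover the log: cannot create " + path + "/LOCK";
      return nullptr;
    }

    return std::unique_ptr<Replica>(new Replica(path));
  }

  const std::string& path() const { return path_; }

private:
  // Private so that callers must go through create().
  explicit Replica(const std::string& path) : path_(path) {}

  std::string path_;
};
```

      A caller such as main() could then print the error and exit with a non-zero status, so an unwritable work directory would produce a readable message rather than a core file.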

          People

            Assignee: Neil Conway (neilc)
            Reporter: Nan Xiao