Mesos / MESOS-1737

--isolation=external results in a core dump on 0.20.0

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.20.0
    • Fix Version/s: 0.20.1
    • Component/s: containerization
    • Labels:
      None

      Description

      After upgrading from 0.19.1 to 0.20.0, every slave started with the standard Deimos setup crashes on startup. The following command prints about 20,000 deprecation warnings before dumping core:

      /etc/mesos-slave# /usr/local/sbin/mesos-slave --master=zk://localhost:2181/mesos --port=5051 --log_dir=/var/log/mesos --ip=172.17.8.101 --work_dir=/var/lib/mesos --isolation=external --containerizer_path=/usr/local/bin/deimos

      output:

      ....
      W0827 15:20:18.366271 721 containerizer.cpp:159] The 'external' isolation flag is deprecated, please update your flags to '--containerizers=external'.
      W0827 15:20:18.366580 721 containerizer.cpp:159] The 'external' isolation flag is deprecated, please update your flags to '--containerizers=external'.
      W0827 15:20:18.366631 721 containerizer.cpp:159] The 'external' isolation flag is deprecated, please update your flags to '--containerizers=external'.
      W0827 15:20:18.366683 721 containerizer.cpp:159] The 'external' isolation flag is deprecated, please update your flags to '--containerizers=external'.
      W0827 15:20:18.366714 721 containerizer.cpp:159] The 'external' isolation flag is deprecated, please update your flags to '--containerizers=external'.
      W0827 15:20:18.366752 721 containerizer.cpp:159] The 'external' isolation flag is deprecated, please update your flags to '--containerizers=external'.
      Segmentation fault (core dumped)
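The warning above comes from a compatibility path that maps the deprecated --isolation=external flag onto --containerizers=external. Below is a simplified, hypothetical sketch of such a shim. The Flags struct and translateDeprecatedIsolation are illustrative names and are not Mesos's actual code. Because a shim like this leaves the original flag in place, it logs the warning again every time the containerizer factory is entered with the same flags. If the factory keeps re-entering itself, as the comments below conclude, that would produce the thousands of identical warning lines shown above.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical, simplified flags: only the two fields involved here.
struct Flags {
  std::string isolation = "posix/cpu,posix/mem";  // 0.19-style isolation flag
  std::string containerizers = "mesos";           // 0.20-style containerizer list
};

// Hypothetical shim: if the deprecated --isolation=external is set, log the
// warning and select the external containerizer instead. Returns true when a
// translation happened. flags.isolation is left untouched, so calling this
// again with the same flags logs the warning again.
bool translateDeprecatedIsolation(Flags& flags, std::ostream& log) {
  if (flags.isolation != "external") {
    return false;
  }
  log << "The 'external' isolation flag is deprecated, please update your "
         "flags to '--containerizers=external'.\n";
  flags.containerizers = "external";
  return true;
}
```

Calling the shim twice with the same Flags writes the warning twice. This is the behavior that makes a re-entrant factory visible in the log long before the process crashes.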

        Activity

        Benjamin Hindman added a comment:

        commit 0071a90205351b0f0e716431002e8d1d7cf5eb6f
        Author: Timothy Chen <tnachen@apache.org>
        Date: Wed Aug 27 16:33:41 2014 -0700

        Fix External Containerizer creation.

        Review: https://reviews.apache.org/r/25116

        Timothy Chen added a comment:

        I realized what the problem is: part of the change for the composing containerizer assumed there is a static ExternalContainerizer::create, but there isn't one, so the call looped infinitely back into Containerizer::create.
        Interesting to see a SIGSEGV in the end, though.
        Review is up on Review Board: https://reviews.apache.org/r/25116/
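Here is a simplified sketch of the mechanism described above, assuming a flag-driven factory. In C++, a derived class that does not declare its own static create() still accepts the qualified call ExternalContainerizer::create(flags). Name lookup finds the inherited Containerizer::create, so the factory calls itself with unchanged flags. Each pass adds a stack frame, which matches the repeating containerizer.cpp:192 frames in the backtrace posted below, until the stack is exhausted. A stack overflow normally surfaces as SIGSEGV. The faulting frame is whichever function happens to push past the end of the stack, which is why the crash appears inside malloc. gdb's "malloc.c: No such file or directory" only means the glibc sources are not installed locally.

All classes and the Flags struct here are illustrative. The real Mesos factory also takes a `local` argument (visible in the backtrace) and returns Mesos's own result types. BuggyExternalContainerizer is a hypothetical class that keeps the pre-fix shape, so the lookup difference can be checked without actually recursing.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical, simplified flags.
struct Flags {
  std::string containerizers = "mesos";
};

class Containerizer {
 public:
  virtual ~Containerizer() = default;
  virtual std::string name() const = 0;

  // Factory that picks a concrete containerizer from the flags.
  static std::unique_ptr<Containerizer> create(const Flags& flags);
};

class MesosContainerizer : public Containerizer {
 public:
  std::string name() const override { return "mesos"; }
};

// Pre-fix shape (hypothetical): no create() of its own, so the expression
// BuggyExternalContainerizer::create names Containerizer::create.
class BuggyExternalContainerizer : public Containerizer {
 public:
  std::string name() const override { return "external"; }
};

// Fixed shape: the derived class declares its own static create(),
// which hides the inherited one.
class ExternalContainerizer : public Containerizer {
 public:
  std::string name() const override { return "external"; }

  static std::unique_ptr<Containerizer> create(const Flags& /*flags*/) {
    return std::unique_ptr<Containerizer>(new ExternalContainerizer());
  }
};

std::unique_ptr<Containerizer> Containerizer::create(const Flags& flags) {
  if (flags.containerizers == "external") {
    // Calling BuggyExternalContainerizer::create(flags) here would re-enter
    // this very function with the same flags: unbounded recursion.
    return ExternalContainerizer::create(flags);
  }
  return std::unique_ptr<Containerizer>(new MesosContainerizer());
}
```

With the fixed class, Containerizer::create(flags) returns after a single nested call for both flag values. Swapping in BuggyExternalContainerizer::create would compile just as cleanly, since the inherited function has a matching signature. That is why this kind of mistake is easy to miss in review.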

        Tim Nolet added a comment:

        I get a whole bunch of this:

        malloc.c: No such file or directory.
        (gdb) bt
        #0  _int_malloc (av=0x7ffff5956760 <main_arena>, bytes=26) at malloc.c:3302
        #1  0x00007ffff561a600 in __GI___libc_malloc (bytes=26) at malloc.c:2891
        #2  0x00007ffff60f5f2d in operator new(unsigned long) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
        #3  0x00007ffff61513b9 in std::string::_Rep::_S_create(unsigned long, unsigned long, std::allocator<char> const&) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
        #4  0x00007ffff6152ae1 in char* std::string::_S_construct<char const*>(char const*, char const*, std::allocator<char> const&, std::forward_iterator_tag) ()
           from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
        #5  0x00007ffff6152ef8 in std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(char const*, std::allocator<char> const&) ()
           from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
        #6  0x00007ffff6bbf995 in mesos::internal::slave::Containerizer::create (flags=..., local=local@entry=false) at ../../src/slave/containerizer/containerizer.cpp:171
        #7  0x00007ffff6bbfc4d in mesos::internal::slave::Containerizer::create (flags=..., local=local@entry=false) at ../../src/slave/containerizer/containerizer.cpp:192
        #8  0x00007ffff6bbfc4d in mesos::internal::slave::Containerizer::create (flags=..., local=local@entry=false) at ../../src/slave/containerizer/containerizer.cpp:192
        #9  0x00007ffff6bbfc4d in mesos::internal::slave::Containerizer::create (flags=..., local=local@entry=false) at ../../src/slave/containerizer/containerizer.cpp:192
        #10 0x00007ffff6bbfc4d in mesos::internal::slave::Containerizer::create (flags=..., local=local@entry=false) at ../../src/slave/containerizer/containerizer.cpp:192
        #11 0x00007ffff6bbfc4d in mesos::internal::slave::Containerizer::create (flags=..., local=local@entry=false) at ../../src/slave/containerizer/containerizer.cpp:192
        #12 0x00007ffff6bbfc4d in mesos::internal::slave::Containerizer::create (flags=..., local=local@entry=false) at ../../src/slave/containerizer/containerizer.cpp:192
        #13 0x00007ffff6bbfc4d in mesos::internal::slave::Containerizer::create (flags=..., local=local@entry=false) at ../../src/slave/containerizer/containerizer.cpp:192
        #14 0x00007ffff6bbfc4d in mesos::internal::slave::Containerizer::create (flags=..., local=local@entry=false) at ../../src/slave/containerizer/containerizer.cpp:192
        #15 0x00007ffff6bbfc4d in mesos::internal::slave::Containerizer::create (flags=..., local=local@entry=false) at ../../src/slave/containerizer/containerizer.cpp:192
        #16 0x00007ffff6bbfc4d in mesos::internal::slave::Containerizer::create (flags=..., local=local@entry=false) at ../../src/slave/containerizer/containerizer.cpp:192
        #17 0x00007ffff6bbfc4d in mesos::internal::slave::Containerizer::create (flags=..., local=local@entry=false) at ../../src/slave/containerizer/containerizer.cpp:192
        #18 0x00007ffff6bbfc4d in mesos::internal::slave::Containerizer::create (flags=..., local=local@entry=false) at ../../src/slave/containerizer/containerizer.cpp:192
        #19 0x00007ffff6bbfc4d in mesos::internal::slave::Containerizer::create (flags=..., local=local@entry=false) at ../../src/slave/containerizer/containerizer.cpp:192
        #20 0x00007ffff6bbfc4d in mesos::internal::slave::Containerizer::create (flags=..., local=local@entry=false) at ../../src/slave/containerizer/containerizer.cpp:
        
        Jie Yu added a comment:

        Can you run 'bt' when you receive the SIGSEGV?

        Tim Nolet added a comment (edited):

        Sure, no problem. It complains about a missing file or directory, which is weird: all the directories and files mentioned in the arguments exist. Also, the Mesos slave starts fine when I leave out the "--containerizers" and "--containerizer_path" arguments. See both scenarios below:

        WITH containerizer arguments: stack trace:

        Starting program: /usr/local/sbin/mesos-slave --master=zk://localhost:2181/mesos --port=5051 --log_dir=/var/log/mesos --ip=10.1.0.36 --work_dir=/var/lib/mesos --containerizers=external --containerizer_path=/usr/local/bin/deimos
        [Thread debugging using libthread_db enabled]
        Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
        [New Thread 0x7ffff4d56700 (LWP 206)]
        [New Thread 0x7ffff4555700 (LWP 207)]
        [New Thread 0x7ffff3d54700 (LWP 208)]
        [New Thread 0x7ffff3553700 (LWP 209)]
        [New Thread 0x7ffff2d52700 (LWP 210)]
        [New Thread 0x7ffff2551700 (LWP 211)]
        [New Thread 0x7ffff1d50700 (LWP 212)]
        [New Thread 0x7ffff154f700 (LWP 213)]
        [New Thread 0x7ffff0d4e700 (LWP 214)]
        I0827 17:01:39.182112   202 logging.cpp:142] INFO level logging started!
        I0827 17:01:39.183714   202 main.cpp:126] Build: 2014-08-22 05:05:59 by root
        I0827 17:01:39.184777   202 main.cpp:128] Version: 0.20.0
        I0827 17:01:39.185250   202 main.cpp:131] Git tag: 0.20.0
        I0827 17:01:39.185858   202 main.cpp:135] Git SHA: f421ffdf8d32a8834b3a6ee483b5b59f65956497
        
        Program received signal SIGSEGV, Segmentation fault.
        _int_malloc (av=0x7ffff5956760 <main_arena>, bytes=26) at malloc.c:3302
        3302    malloc.c: No such file or directory.
        (gdb) Quit
        

        WITHOUT containerizer arguments: all good

        (gdb) run
        Starting program: /usr/local/sbin/mesos-slave --master=zk://localhost:2181/mesos --port=5051 --log_dir=/var/log/mesos --ip=10.1.0.36 --work_dir=/var/lib/mesos
        [Thread debugging using libthread_db enabled]
        Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
        [New Thread 0x7ffff4d56700 (LWP 269)]
        [New Thread 0x7ffff4555700 (LWP 270)]
        [New Thread 0x7ffff3d54700 (LWP 271)]
        [New Thread 0x7ffff3553700 (LWP 272)]
        [New Thread 0x7ffff2d52700 (LWP 273)]
        [New Thread 0x7ffff2551700 (LWP 274)]
        [New Thread 0x7ffff1d50700 (LWP 275)]
        [New Thread 0x7ffff154f700 (LWP 276)]
        [New Thread 0x7ffff0d4e700 (LWP 277)]
        I0827 17:05:50.842244   265 logging.cpp:142] INFO level logging started!
        I0827 17:05:50.843018   265 main.cpp:126] Build: 2014-08-22 05:05:59 by root
        I0827 17:05:50.844144   265 main.cpp:128] Version: 0.20.0
        I0827 17:05:50.844662   265 main.cpp:131] Git tag: 0.20.0
        I0827 17:05:50.846150   265 main.cpp:135] Git SHA: f421ffdf8d32a8834b3a6ee483b5b59f65956497
        I0827 17:05:50.846617   265 containerizer.cpp:89] Using isolation: posix/cpu,posix/mem
        I0827 17:05:50.847182   265 main.cpp:149] Starting Mesos slave
        2014-08-27 17:05:50,847:265(0x7ffff2551700):ZOO_INFO@log_env@712: Client environment:zookeeper.version=zookeeper C client 3.4.5
        2014-08-27 17:05:50,848:265(0x7ffff2551700):ZOO_INFO@log_env@716: Client environment:host.name=f321e9ca8261
        2014-08-27 17:05:50,849:265(0x7ffff2551700):ZOO_INFO@log_env@723: Client environment:os.name=Linux
        2014-08-27 17:05:50,850:265(0x7ffff2551700):ZOO_INFO@log_env@724: Client environment:os.arch=3.15.8+
        2014-08-27 17:05:50,851:265(0x7ffff2551700):ZOO_INFO@log_env@725: Client environment:os.version=#2 SMP Fri Aug 15 22:29:31 UTC 2014
        2014-08-27 17:05:50,852:265(0x7ffff2551700):ZOO_INFO@log_env@733: Client environment:user.name=(null)
        I0827 17:05:50.859122   265 slave.cpp:167] Slave started on 1)@10.1.0.36:5051
        I0827 17:05:50.864536   265 slave.cpp:278] Slave resources: cpus(*):1; mem(*):499; disk(*):11280; ports(*):[31000-32000]
        2014-08-27 17:05:50,889:265(0x7ffff2551700):ZOO_INFO@log_env@741: Client environment:user.home=/root
        2014-08-27 17:05:50,890:265(0x7ffff2551700):ZOO_INFO@log_env@753: Client environment:user.dir=/
        2014-08-27 17:05:50,890:265(0x7ffff2551700):ZOO_INFO@zookeeper_init@786: Initiating client connection, host=localhost:2181 sessionTimeout=10000 watcher=0x7ffff6caea30 sessionId=0 sessionPasswd=<null> context=0x7fffec001f30 flags=0
        I0827 17:05:50.894484   265 slave.cpp:306] Slave hostname: f321e9ca8261
        I0827 17:05:50.895529   265 slave.cpp:307] Slave checkpoint: true
        I0827 17:05:50.900344   269 state.cpp:33] Recovering state from '/var/lib/mesos/meta'
        I0827 17:05:50.901289   269 state.cpp:62] Failed to find the latest slave from '/var/lib/mesos/meta'
        I0827 17:05:50.907994   272 status_update_manager.cpp:193] Recovering status update manager
        I0827 17:05:50.909273   273 containerizer.cpp:252] Recovering containerizer
        I0827 17:05:50.914098   265 slave.cpp:3195] Finished recovery
        [New Thread 0x7fffdbbe7700 (LWP 278)]
        [New Thread 0x7fffdb3e6700 (LWP 279)]
        2014-08-27 17:05:50,922:265(0x7fffdbbe7700):ZOO_INFO@check_events@1703: initiated connection to server [::1:2181]
        2014-08-27 17:05:50,925:265(0x7fffdbbe7700):ZOO_INFO@check_events@1750: session establishment complete on server [::1:2181], sessionId=0x148186e1c770004, negotiated timeout=10000
        I0827 17:05:50.926491   275 group.cpp:313] Group process (group(1)@10.1.0.36:5051) connected to ZooKeeper
        I0827 17:05:50.926971   275 group.cpp:787] Syncing group operations: queue size (joins, cancels, datas) = (0, 0, 0)
        I0827 17:05:50.927428   275 group.cpp:385] Trying to create path '/mesos' in ZooKeeper
        I0827 17:05:50.940911   275 detector.cpp:138] Detected a new leader: (id='0')
        I0827 17:05:50.941494   275 group.cpp:658] Trying to get '/mesos/info_0000000000' in ZooKeeper
        I0827 17:05:50.951995   275 detector.cpp:426] A new leading master (UPID=master@10.1.0.36:5050) is detected
        I0827 17:05:50.953053   275 slave.cpp:589] New master detected at master@10.1.0.36:5050
        I0827 17:05:50.954869   275 slave.cpp:625] No credentials provided. Attempting to register without authentication
        I0827 17:05:50.955654   275 slave.cpp:636] Detecting new master
        I0827 17:05:50.955529   269 status_update_manager.cpp:167] New master detected at master@10.1.0.36:5050
        I0827 17:05:51.957218   275 slave.cpp:754] Registered with master master@10.1.0.36:5050; given slave ID 20140827-155409-603980042-5050-10-0
        I0827 17:05:51.957676   275 slave.cpp:767] Checkpointing SlaveInfo to '/var/lib/mesos/meta/slaves/20140827-155409-603980042-5050-10-0/slave.info'
        ...
        
        Jie Yu added a comment:

        Is it possible to run the program with gdb and get the stack trace?

        Tim Nolet added a comment:

        I tried that, but the core dump happens even sooner. I don't get the 20,000 warning messages, just these:

        /usr/local/sbin/mesos-slave --master=zk://localhost:2181/mesos --port=5051 --log_dir=/var/log/mesos --ip=10.1.0.36 --work_dir=/var/lib/mesos --containerizers=external
        I0827 15:55:53.746347 100 logging.cpp:142] INFO level logging started!
        I0827 15:55:53.748201 100 main.cpp:126] Build: 2014-08-22 05:05:59 by root
        I0827 15:55:53.748769 100 main.cpp:128] Version: 0.20.0
        I0827 15:55:53.749090 100 main.cpp:131] Git tag: 0.20.0
        I0827 15:55:53.749536 100 main.cpp:135] Git SHA: f421ffdf8d32a8834b3a6ee483b5b59f65956497
        Segmentation fault (core dumped)

        Maybe my case is a bit special, as I'm running my Mesos setup inside a Docker container on CoreOS. The container runs Ubuntu. This setup worked fine until now. You can find all the details here: https://hub.docker.com/u/tnolet/mesos-on-coreos/

        cat /etc/os-release
        NAME="Ubuntu"
        VERSION="14.04.1 LTS, Trusty Tahr"
        ID=ubuntu
        ID_LIKE=debian
        PRETTY_NAME="Ubuntu 14.04.1 LTS"
        VERSION_ID="14.04"
        HOME_URL="http://www.ubuntu.com/"
        SUPPORT_URL="http://help.ubuntu.com/"
        BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"

        Benjamin Hindman added a comment:

        Sorry to hear about this, Tim Nolet. It's not ideal, but can you switch your flags to --containerizers=external for now? Timothy Chen, please take a look; let's get this fixed quickly and into 0.20.1.


          People

          • Assignee:
            tnachen Timothy Chen
            Reporter:
            tnolet Tim Nolet
            Shepherd:
            Benjamin Hindman
          • Votes:
            0
            Watchers:
            4
