Mesos / MESOS-1404

Glibc 'fork()' is not async signal safe


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.20.0
    • Component/s: None
    • Labels: None
    • Sprint: Q2'14 Sprint 2, Q2 Sprint 3, Q2 Sprint 4

    Description

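      This happens because glibc does not implement 'fork()' as async-signal-safe, although according to POSIX it should be. When the child process executes the commands returned from the isolator's prepare(), it calls os::system(), which in turn calls 'fork()'. The child itself was created with clone() from the multi-threaded slave process, so a malloc lock that another slave thread held at that moment is never released in the child, and the nested 'fork()' blocks forever trying to acquire it in ptmalloc_lock_all().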
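A minimal sketch of the rule this bug breaks, assuming a plain fork()+execv() launcher rather than the clone()-based one in linux_launcher.cpp: build everything the child needs (here, argv) in the parent before forking, and in the child call only async-signal-safe functions such as execv() and _exit(), never os::system(), malloc() or a second fork(). The helper name run_command_safely() is hypothetical and is not Mesos or stout code.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#include <cassert>
#include <cerrno>
#include <vector>

// Hypothetical helper (not Mesos/stout code). Runs args[0] via execv() in a
// forked child and returns its exit status, or -1 on error.
int run_command_safely(const std::vector<const char*>& args) {
  assert(!args.empty());

  // Build the NULL-terminated argv in the parent. In the child of a
  // multi-threaded process POSIX only guarantees async-signal-safe
  // functions, and malloc() is not one of them.
  std::vector<char*> argv;
  argv.reserve(args.size() + 1);
  for (const char* arg : args) {
    argv.push_back(const_cast<char*>(arg));
  }
  argv.push_back(nullptr);

  pid_t pid = fork();
  if (pid == -1) {
    return -1;
  }

  if (pid == 0) {
    // Child: only async-signal-safe calls from here on. No malloc, no stdio,
    // and no nested fork()/system(), which is the pattern that deadlocked.
    execv(argv[0], argv.data());
    _exit(127);  // Reached only if execv() failed.
  }

  // Parent: wait for the child, retrying if interrupted by a signal.
  int status = 0;
  while (waitpid(pid, &status, 0) == -1) {
    if (errno != EINTR) {
      return -1;
    }
  }
  return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, run_command_safely({"/bin/sh", "-c", "exit 3"}) returns 3, and a path that cannot be executed yields 127 from the _exit() fallback.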

      I observed this stack trace while debugging a deadlock:

      (gdb) bt
      #0  0x00007f8fb2d5d2ce in __lll_lock_wait_private () from /lib64/libc.so.6
      #1  0x00007f8fb2ce1d8e in _L_lock_44 () from /lib64/libc.so.6
      #2  0x00007f8fb2cdab4c in ptmalloc_lock_all () from /lib64/libc.so.6
      #3  0x00007f8fb2d11d65 in fork () from /lib64/libc.so.6
      #4  0x00007f8fb4e898de in system (command=..., directory=<value optimized out>, envp=..., uid=0, gid=0, redirectIO=<value optimized out>, pipeRead=29, pipeWrite=30, 
          commands=std::list = {...}) at ../../../mesos/3rdparty/libprocess/3rdparty/stout/include/stout/os.hpp:558
      #5  mesos::internal::slave::execute (command=..., directory=<value optimized out>, envp=..., uid=0, gid=0, redirectIO=<value optimized out>, pipeRead=29, pipeWrite=30, 
          commands=std::list = {...}) at ../../../mesos/src/slave/containerizer/mesos_containerizer.cpp:483
      #6  0x00007f8fb4e97bab in __call<, 0, 1, 2, 3, 4, 5, 6, 7, 8> (__functor=<value optimized out>)
          at /usr/lib/gcc/x86_64-redhat-linux/4.4.7/../../../../include/c++/4.4.7/tr1_impl/functional:1137
      #7  operator()<> (__functor=<value optimized out>) at /usr/lib/gcc/x86_64-redhat-linux/4.4.7/../../../../include/c++/4.4.7/tr1_impl/functional:1191
      #8  std::tr1::_Function_handler<int(), std::tr1::_Bind<int (*(mesos::CommandInfo, std::basic_string<char, std::char_traits<char>, std::allocator<char> >, os::ExecEnv, unsigned int, unsigned int, bool, int, int, std::list<Option<mesos::CommandInfo>, std::allocator<Option<mesos::CommandInfo> > >))(const mesos::CommandInfo&, const std::string&, const os::ExecEnv&, uid_t, gid_t, bool, int, int, const std::list<Option<mesos::CommandInfo>, std::allocator<Option<mesos::CommandInfo> > >&)> >::_M_invoke(const std::tr1::_Any_data &) (__functor=<value optimized out>) at /usr/lib/gcc/x86_64-redhat-linux/4.4.7/../../../../include/c++/4.4.7/tr1_impl/functional:1654
      #9  0x00007f8fb4fcaebe in mesos::internal::slave::_childMain(const std::tr1::function<int()> &, int *) (childFunction=..., pipes=0x7f8fad4f0040)
          at ../../../mesos/src/slave/containerizer/linux_launcher.cpp:193
      #10 0x00007f8fb2d4db6d in clone () from /lib64/libc.so.6
      (gdb) info thread
      * 1 Thread 0x7f8fad4f1700 (LWP 62980)  0x00007f8fb2d5d2ce in __lll_lock_wait_private () from /lib64/libc.so.6
      

      This stack trace matches the one discussed in the glibc issue tracker:
      https://sourceware.org/bugzilla/show_bug.cgi?id=4737

      The glibc maintainers closed that issue as "WONTFIX". From the discussion:

      The Austin group met yesterday and retained the decision to interpret fork as
      async-signal-unsafe with future specifications mandating that posix_spawn be
      made async-signal-safe to fill the functionality gap.  Minutes of the meeting
      are available at https://www.opengroup.org/austin/docs/austin_446.txt.
      
      I think this bug can now be closed as "WONTFIX"
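The quoted decision points to posix_spawn() as the intended async-signal-safe way to start a process. The sketch below only shows the shape of that API used in place of fork()+exec(); whether a particular glibc release actually makes posix_spawn() async-signal-safe is not settled by this issue, and the helper spawn_and_wait() is hypothetical, not the change made in Mesos.

```cpp
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>

#include <cassert>
#include <cerrno>
#include <vector>

extern char** environ;  // The calling process's environment (POSIX).

// Hypothetical helper (not Mesos code): starts args[0] with posix_spawn()
// instead of fork()+exec() and waits for it. Returns the exit status, or -1.
int spawn_and_wait(const std::vector<const char*>& args) {
  assert(!args.empty());

  std::vector<char*> argv;
  argv.reserve(args.size() + 1);
  for (const char* arg : args) {
    argv.push_back(const_cast<char*>(arg));
  }
  argv.push_back(nullptr);

  pid_t pid;
  // posix_spawn() returns an error number instead of setting errno.
  // args[0] must be a path; posix_spawnp() would search PATH instead.
  int rc = posix_spawn(&pid, argv[0], nullptr, nullptr, argv.data(), environ);
  if (rc != 0) {
    return -1;
  }

  int status = 0;
  while (waitpid(pid, &status, 0) == -1) {
    if (errno != EINTR) {
      return -1;
    }
  }
  return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Because posix_spawn() reports failure through its return value rather than errno, the sketch checks rc directly; the caller never runs code in the child at all.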
      

    People

      Assignee: Jie Yu
      Reporter: Jie Yu
      Votes: 0
      Watchers: 6

              Dates

                Created:
                Updated:
                Resolved: