  1. Thrift
  2. THRIFT-5127

Race condition in TNonblockingServer

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.13.0
    • Fix Version/s: None
    • Component/s: C++ - Library
    • Labels: None

    Description

      When the TNonblockingServer::stop method is called from a different thread shortly after TNonblockingServer::serve, the server occasionally fails to terminate.

      The following sequence of events has been observed with Thrift 0.13:

      1. TNonblockingServer::serve starts spawning listener threads.
      2. Another thread calls TNonblockingServer::stop before all listeners are created.
        A shutdown request is sent only to those IO threads which have already been initialized (not to all of them).
      3. TNonblockingServer::serve completes spawning the remaining listener threads (including the primary IO thread with index 0).
      4. TNonblockingServer::serve continues to run despite the stop request, since the main thread and some of the listener threads are still active.
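
      The ordering above can be replayed deterministically with a small single-threaded model. This is only a sketch of the reported sequence, not Thrift code: IoThreadState, ServerModel and replayReportedSequence are hypothetical names, and the model assumes that a stop request reaches only those threads that have already finished their setup.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the per-thread state of one IO thread.
struct IoThreadState {
  bool initialized = false;    // the thread has finished its own setup
  bool stopRequested = false;  // a stop request was actually delivered
};

// Hypothetical model of the server; it only tracks ordering, no real threads.
class ServerModel {
public:
  explicit ServerModel(std::size_t numThreads) : threads_(numThreads) {}

  // Models one IO thread completing its initialization.
  void initialize(std::size_t index) {
    assert(index < threads_.size());
    threads_[index].initialized = true;
  }

  // Models stop(): the request is delivered only to initialized threads;
  // for all others it is silently dropped.
  void stop() {
    for (IoThreadState& t : threads_) {
      if (t.initialized) {
        t.stopRequested = true;
      }
    }
  }

  // Threads that are up but never saw the stop request.
  std::size_t stillRunning() const {
    std::size_t count = 0;
    for (const IoThreadState& t : threads_) {
      if (t.initialized && !t.stopRequested) {
        ++count;
      }
    }
    return count;
  }

private:
  std::vector<IoThreadState> threads_;
};

// Replays steps 1 to 4 with four IO threads, two of them ready before stop().
inline std::size_t replayReportedSequence() {
  ServerModel server(4);
  server.initialize(1);  // step 1: some listeners are already up
  server.initialize(2);
  server.stop();         // step 2: stop arrives too early
  server.initialize(3);  // step 3: remaining listeners are spawned,
  server.initialize(0);  //         including the primary thread (index 0)
  return server.stillRunning();  // step 4: these never terminate
}
```

      In this model the two threads initialized after stop() never receive a request, which corresponds to the server continuing to run in step 4.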

      The issue seems to be caused by late initialization of TNonblockingIOThread's state.
      The server's listener threads are spawned in the TNonblockingServer::serve method (in a nested call to registerEvents). They finish initializing some of their state in the TNonblockingIOThread::run method (part of the Runnable interface).
      One of the fields which is initialized at that stage is the notificationPipeFDs_ array, which as far as I can tell is used to pass messages between threads.

      It seems that the thread which invokes TNonblockingServer::stop might attempt to use the notification pipe to request shutdown while the notificationPipeFDs_ descriptor array is still uninitialized.
      In that case, the message is lost (the TNonblockingIOThread::notify call will return immediately) and the target thread never exits.
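
      A minimal POSIX sketch of how such a notification can get lost is shown below. It is not the Thrift implementation: PipeNotifier and its members are hypothetical, and the sketch assumes that an uninitialized descriptor is represented as -1 and that notify gives up immediately in that case, as described above.

```cpp
#include <array>
#include <cassert>
#include <unistd.h>

// Hypothetical model of an IO thread's notification channel.
class PipeNotifier {
public:
  PipeNotifier() = default;
  PipeNotifier(const PipeNotifier&) = delete;
  PipeNotifier& operator=(const PipeNotifier&) = delete;

  ~PipeNotifier() {
    for (int fd : fds_) {
      if (fd >= 0) {
        ::close(fd);
      }
    }
  }

  // Stands in for the late initialization that the report places in run().
  bool createNotificationPipe() {
    assert(fds_[0] < 0 && fds_[1] < 0);  // must only be created once
    int fds[2];
    if (::pipe(fds) != 0) {
      return false;
    }
    fds_[0] = fds[0];  // read end
    fds_[1] = fds[1];  // write end
    return true;
  }

  // Returns false right away if the pipe does not exist yet.
  // Nothing is queued, so the message is dropped.
  bool notify(char message) {
    if (fds_[1] < 0) {
      return false;
    }
    return ::write(fds_[1], &message, 1) == 1;
  }

  // Reads one pending message; blocks if the pipe exists but is empty.
  bool receive(char& message) {
    return fds_[0] >= 0 && ::read(fds_[0], &message, 1) == 1;
  }

private:
  std::array<int, 2> fds_{{-1, -1}};
};
```

      A request sent before createNotificationPipe is not replayed later, so a caller that ignores the return value of notify has no way to learn that the target thread never saw it.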

      Btw. the threadId_ field of TNonblockingIOThread is also accessed concurrently by multiple threads without synchronization:

      • the field is written in TNonblockingIOThread::registerEvents after creation of the listener thread,
      • the field is read in TNonblockingIOThread::breakLoop when the server is being stopped.
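
      One possible way to make such a shared id field safe is to guard it with a mutex, sketched below. This is an illustration only, not a proposed patch: GuardedThreadId and workerIdIsPublished are hypothetical names, and the sketch assumes the id can be represented as std::thread::id.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical wrapper: every read and write of the id takes the same mutex,
// so a writer and a reader on different threads no longer race.
class GuardedThreadId {
public:
  void set(std::thread::id id) {
    assert(id != std::thread::id());  // a published id must name a real thread
    std::lock_guard<std::mutex> lock(mutex_);
    id_ = id;
  }

  std::thread::id get() const {
    std::lock_guard<std::mutex> lock(mutex_);
    return id_;
  }

private:
  mutable std::mutex mutex_;
  std::thread::id id_;  // default-constructed: "no thread"
};

// Usage sketch: a worker publishes its own id, the caller reads it back.
inline bool workerIdIsPublished() {
  GuardedThreadId slot;
  std::thread worker([&slot] { slot.set(std::this_thread::get_id()); });
  const std::thread::id expected = worker.get_id();  // capture before join()
  worker.join();
  return slot.get() == expected;
}
```

      The lock only removes the data race on the field itself. A reader can still observe the default "no thread" value if it runs before the writer, so the ordering problem described earlier needs to be addressed separately.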

      I'm attaching sample code which can reproduce the issue (although not deterministically).
      Some tweaking of the STOPPING_THREAD_DELAY constant might be necessary to observe the deadlock.

      Attachments

        1. thrift_deadlock.cpp
          2 kB
          Adam Jakubek

            People

              Assignee: Unassigned
              Reporter: Adam Jakubek (ajakubek)
              Votes: 1
              Watchers: 2

              Dates

                Created:
                Updated:

                Time Tracking

                  Original Estimate: Not Specified
                  Remaining Estimate: 0h
                  Time Spent: 1h