Thrift / THRIFT-748

C++ TSocket default linger setting breaks forked parent process


    Details

    • Type: Bug
    • Status: Closed
    • Priority: Trivial
    • Resolution: Won't Fix
    • Affects Version/s: 0.2, 0.3
    • Fix Version/s: None
    • Component/s: C++ - Library
    • Labels:
      None
    • Environment:

      Cygwin 1.7.1 on Windows XP SP3, Thrift 0.2.0 & r760184 & Trunk

      Description

      If a Thrift C++ client opens a TSocket, writes some data, and then calls fork(), the child process can terminate the parent process's connection by deleting its copy of the parent's TSocket.

      In particular, the default setting lingerOn_ = 1 (combined with the default linger interval of 0) causes an RST to be sent when TSocket->close() calls close(socket_).

      Discussion:

      This matches the behaviour of sockets on Unix when SO_LINGER is enabled (implementations vary).
      However, SO_LINGER is off by default for sockets, so TSocket's default of turning it on is unexpected.
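      The difference in defaults can be checked directly with getsockopt(). The sketch below assumes a POSIX system, and the function name readDefaultAndHardResetLinger is hypothetical, not part of Thrift. It reads SO_LINGER on a fresh TCP socket, then applies {l_onoff = 1, l_linger = 0}, the combination the report describes as TSocket's default, and reads it back. The socket is never connected, so nothing is sent.

```cpp
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

// Reads SO_LINGER from fd into `out`; returns false on error.
static bool readLinger(int fd, linger& out) {
  socklen_t len = sizeof(out);
  return getsockopt(fd, SOL_SOCKET, SO_LINGER, &out, &len) == 0;
}

// Hypothetical helper: on one fresh TCP socket, capture the OS default
// SO_LINGER value, then apply {l_onoff = 1, l_linger = 0} (the hard-reset
// setting the report attributes to TSocket) and capture it again.
bool readDefaultAndHardResetLinger(linger& osDefault, linger& hardReset) {
  int fd = socket(AF_INET, SOCK_STREAM, 0);
  if (fd < 0) return false;
  linger requested{};
  requested.l_onoff = 1;
  requested.l_linger = 0;
  bool ok = readLinger(fd, osDefault) &&
            setsockopt(fd, SOL_SOCKET, SO_LINGER, &requested,
                       sizeof(requested)) == 0 &&
            readLinger(fd, hardReset);
  close(fd);  // never connected, so this close() sends nothing
  return ok;
}
```

      When reading the value back, only l_onoff != 0 is meaningful; the exact non-zero value reported can differ between systems.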

      This design choice makes it very difficult to write a C++ Thrift client that forks other clients, because the first process to call TSocket->close() terminates every copy of the connection. Each process has to call TSocket->setLinger(0,0) or setLinger(1,timeout) before deleting the TSocket, closing it, or exiting. (This workaround only works together with the fix suggested in THRIFT-747.)
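      A minimal sketch of where the workaround goes, at the plain-socket level and assuming a POSIX system: after fork(), the child turns linger off on its inherited copy of the descriptor before closing it. The helpers setLinger and forkAndReleaseCopy are hypothetical stand-ins for calling TSocket->setLinger(0,0) and then deleting the TSocket in the child; they are not Thrift API. Whether the child's close() can disturb the parent's connection is platform-dependent (the report observed it on Cygwin), so this only shows where the setting is applied, not the failure itself.

```cpp
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: plain-socket stand-in for TSocket->setLinger(on, seconds).
static bool setLinger(int fd, bool on, int seconds) {
  linger l{};
  l.l_onoff = on ? 1 : 0;
  l.l_linger = seconds;
  return setsockopt(fd, SOL_SOCKET, SO_LINGER, &l, sizeof(l)) == 0;
}

// Hypothetical helper: fork a child that disables linger on its inherited
// copy of `fd` before closing it, then exits. The parent keeps its own
// descriptor. Returns the child's exit status (0 = setsockopt succeeded),
// or -1 if fork or wait failed.
int forkAndReleaseCopy(int fd) {
  pid_t pid = fork();
  if (pid < 0) return -1;
  if (pid == 0) {
    // Child: apply the workaround, then release only this process's copy.
    int rc = setLinger(fd, false, 0) ? 0 : 1;
    close(fd);
    _exit(rc);  // _exit avoids flushing stdio buffers inherited from the parent
  }
  int status = 0;
  if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status)) return -1;
  return WEXITSTATUS(status);
}
```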

      On the other hand, the design choice prevents deadlocks or slowdowns when a forked process holds open a copy of the parent's Thrift connections. It also makes close() non-blocking, which is desirable in a destructor.

      The design choice may also be an attempt to implement the "block until the data is sent, then close" behaviour described in http://blog.netherlabs.nl/articles/2009/01/18/the-ultimate-so_linger-page-or-why-is-my-tcp-not-reliable
      However, the default linger interval of 0 turns the linger setting into a hard reset.
      Without linger, the kernel can usually deliver small Thrift messages on its own after close() returns.
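      The contrast between a zero and a non-zero linger interval can be observed on a loopback connection. This is a sketch assuming a POSIX system with a working 127.0.0.1 interface; makeLoopbackPair, PeerView and closeWithLinger are hypothetical names. With l_linger = 0 the peer's recv() fails with ECONNRESET; with a positive interval the written bytes arrive and the peer then sees an orderly end of stream (recv() returns 0).

```cpp
#include <arpa/inet.h>
#include <cerrno>
#include <netinet/in.h>
#include <string>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical helper: a connected TCP pair over 127.0.0.1 on a
// kernel-chosen port. Returns false if any step fails.
bool makeLoopbackPair(int& client, int& server) {
  client = server = -1;
  int listener = socket(AF_INET, SOCK_STREAM, 0);
  if (listener < 0) return false;
  sockaddr_in addr{};
  addr.sin_family = AF_INET;
  addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
  addr.sin_port = 0;  // 0 = let the kernel pick a free port
  socklen_t len = sizeof(addr);
  sockaddr* sa = reinterpret_cast<sockaddr*>(&addr);
  bool ok = bind(listener, sa, sizeof(addr)) == 0 && listen(listener, 1) == 0 &&
            getsockname(listener, sa, &len) == 0;
  if (ok) client = socket(AF_INET, SOCK_STREAM, 0);
  ok = ok && client >= 0 && connect(client, sa, sizeof(addr)) == 0;
  if (ok) server = accept(listener, nullptr, nullptr);
  close(listener);
  if (!ok || server < 0) {
    if (client >= 0) close(client);
    return false;
  }
  return true;
}

// What the server end observes after the client sends `msg` and closes.
struct PeerView {
  std::string received;  // bytes read before the stream ended
  bool reset = false;    // true if recv() failed with ECONNRESET
};

// Client enables SO_LINGER with the given interval, writes `msg`, closes.
PeerView closeWithLinger(int seconds, const std::string& msg) {
  PeerView view;
  int client = -1, server = -1;
  if (!makeLoopbackPair(client, server)) return view;
  linger l{};
  l.l_onoff = 1;
  l.l_linger = seconds;
  setsockopt(client, SOL_SOCKET, SO_LINGER, &l, sizeof(l));
  if (!msg.empty()) send(client, msg.data(), msg.size(), 0);
  // seconds == 0: abortive close, an RST is sent and close() returns at once.
  // seconds > 0: normal FIN; close() may block up to `seconds` while data drains.
  close(client);
  char buf[64];
  for (;;) {
    ssize_t n = recv(server, buf, sizeof(buf), 0);
    if (n > 0) {
      view.received.append(buf, static_cast<size_t>(n));
      continue;
    }
    view.reset = (n < 0 && errno == ECONNRESET);
    break;  // n == 0 is an orderly end of stream
  }
  close(server);
  return view;
}
```

      For example, closeWithLinger(0, "") reports a reset, while closeWithLinger(5, "hello") delivers "hello" and no reset. The positive interval is what makes close() potentially blocking, which is the trade-off against the non-blocking destructor behaviour described earlier.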

      Options:

      • Change the default lingerOn_ to 0 and rely on the kernel to retransmit a limited number of times
      • Change the default lingerVal_ to a value > 0
        • A large value such as INT_MAX would match the default 'no timeout' behaviour of connect, send, and recv

      TODO:

      • Confirm the issue on Linux (see the attached test code)
      • Decide whether the defaults need to change
      • Document the workaround once THRIFT-747 is resolved: call TSocket->setLinger(0,0) or setLinger(1,timeout) when forking

        Attachments

        1. thrift_linger_example.cpp
          5 kB
          Tim Wilson-Brown


              People

              • Assignee: James E. King III (jking3)
                Reporter: Tim Wilson-Brown (twilsonb)
              • Votes: 0
                Watchers: 2


                  Time Tracking

                  Estimated: 72h
                  Remaining: 72h
                  Logged: Not Specified