Thrift / THRIFT-4080

Unix sockets can get stuck forever


    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.10.0
    • Fix Version/s: None
    • Component/s: Python - Library
    • Labels:
      None
    • Environment:

      Ubuntu 14.04

    • Flags:
      Important

      Description

      When the network connection is very poor, the server sometimes stops accepting new connections. "Very poor" here means that a simple ping call sent via Thrift can take 15 seconds.

      After running into this issue for nearly two years, I finally tracked it down:
      there is no timeout when the socket receives data. Once a connection has been accepted and the socket object created, the connection can drop silently, which leaves the socket blocked in recv() forever.
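      To make the failure mode concrete, here is a minimal, self-contained sketch (Python 3, not Thrift code) of a default socket read with no timeout. A socket.socketpair() stands in for the accepted TCP connection, and blocked_recv_duration is a hypothetical helper name. The peer never sends anything; the read returns only because a timer closes the peer. On a link that drops silently, no FIN or RST ever arrives, so nothing would wake the call.

```python
import socket
import threading
import time


def blocked_recv_duration(close_after):
    """Measure how long a default (blocking, no-timeout) recv() waits.

    Hypothetical demo helper. A socketpair stands in for an accepted TCP
    connection; the peer never sends data, and a timer closes it after
    `close_after` seconds. That close (an EOF) is the only event that lets
    recv() return here.
    """
    a, b = socket.socketpair()
    timer = threading.Timer(close_after, b.close)
    try:
        # None means blocking with no timeout (assuming no global
        # socket.setdefaulttimeout() has been set).
        timeout = a.gettimeout()
        timer.start()
        start = time.monotonic()
        data = a.recv(1024)  # blocks until data or EOF arrives
        elapsed = time.monotonic() - start
        return timeout, data, elapsed
    finally:
        timer.cancel()  # no-op if the timer already fired
        a.close()
        b.close()
```

      With blocked_recv_duration(0.3) the call returns (None, b"", elapsed) with elapsed of roughly 0.3 s; without the timer it would never return.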

      I added a receive timeout to the accept() method in TSocket.py (TServerSocket.accept), which makes a read on the accepted socket fail with a "Resource temporarily unavailable" error after 30 seconds without data:

      import socket
      import struct

      def accept(self):
          client, addr = self.handle.accept()
          # added: receive timeout of 30.0 seconds
          # (packed as struct timeval: seconds, microseconds as native C longs)
          client.setsockopt(socket.SOL_SOCKET, socket.SO_RCVTIMEO, struct.pack('LL', 30, 0))
          result = TSocket()
          result.setHandle(client)
          return result
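      The same idea can be pulled out of the Thrift library into a standalone helper. This is a sketch under stated assumptions, not Thrift's API: Python 3 on Linux, where struct timeval is two native C longs, so struct.pack('ll', ...) is used (the patch above uses unsigned 'LL', which yields the same bytes for non-negative values). set_recv_timeout and accept_with_recv_timeout are hypothetical names.

```python
import errno
import socket
import struct
import time


def set_recv_timeout(sock, seconds):
    """Set SO_RCVTIMEO on a blocking socket.

    Assumes the Linux layout of struct timeval: two native C longs
    (seconds, microseconds).
    """
    sec, usec = divmod(int(round(seconds * 1_000_000)), 1_000_000)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVTIMEO,
                    struct.pack("ll", sec, usec))


def accept_with_recv_timeout(server_sock, seconds=30.0):
    """Accept one connection and give it a receive timeout, like the patch above."""
    client, addr = server_sock.accept()
    set_recv_timeout(client, seconds)
    return client, addr


if __name__ == "__main__":
    # Loopback demo: the client connects but never sends anything, standing
    # in for a connection that dropped after accept().
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))
    server.listen(1)
    client = socket.create_connection(server.getsockname())
    conn, _ = accept_with_recv_timeout(server, seconds=0.5)
    start = time.monotonic()
    try:
        conn.recv(1024)
    except OSError as exc:  # raised as BlockingIOError on Python 3
        print("recv gave up after %.1f s, EAGAIN: %s"
              % (time.monotonic() - start, exc.errno == errno.EAGAIN))
    finally:
        conn.close()
        client.close()
        server.close()
```

      Keeping the timeout on the accepted socket (rather than the listening socket) limits it to per-connection reads, which matches where the strace shows the server blocking.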

      Once the timeout expires, reading from the dropped connection fails with this error:
        buff = self.handle.recv(sz)
        error: [Errno 11] Resource temporarily unavailable
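      On Linux, [Errno 11] is EAGAIN: with SO_RCVTIMEO set, a blocking recv() that times out fails with EAGAIN/EWOULDBLOCK instead of returning data. A caller can translate that into an explicit timeout so the worker can close the connection and move on. A minimal sketch, assuming Python 3 (where the error surfaces as OSError, specifically BlockingIOError) on Linux; RecvTimedOut and read_or_timeout are hypothetical names, not Thrift APIs.

```python
import errno
import socket
import struct


class RecvTimedOut(Exception):
    """Hypothetical: no data arrived before SO_RCVTIMEO expired."""


def read_or_timeout(sock, sz):
    """Read up to sz bytes, turning an SO_RCVTIMEO expiry into RecvTimedOut."""
    try:
        buff = sock.recv(sz)
    except OSError as exc:
        if exc.errno in (errno.EAGAIN, errno.EWOULDBLOCK):
            raise RecvTimedOut("no data before the receive timeout") from exc
        raise
    if not buff:
        raise EOFError("peer closed the connection")
    return buff


if __name__ == "__main__":
    a, b = socket.socketpair()
    # 0.2 s receive timeout; Linux struct timeval layout assumed.
    a.setsockopt(socket.SOL_SOCKET, socket.SO_RCVTIMEO, struct.pack("ll", 0, 200_000))
    b.sendall(b"ping")
    print(read_or_timeout(a, 1024))  # b'ping'
    try:
        read_or_timeout(a, 1024)
    except RecvTimedOut as exc:
        print("timed out:", exc)
    a.close()
    b.close()
```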

      I also tried Python's socket settimeout() method, which did not help. Only setting the receive timeout (SO_RCVTIMEO) makes reads on dropped connections time out.

      The bug does not appear on stable connections. However, I have four devices connected via WiFi, and my ThreadedServer gets stuck about 4-5 times a day. The ThreadedServer has 5 threads, so all 5 sockets regularly end up stuck at the same time.

      For reference, here is the strace output for the stuck socket:
      [pid 2698] futex(0x7faf50000d80, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
      [pid 2697] read(4, <unfinished ...>
      [pid 2693] accept(7, {sa_family=AF_INET6, sin6_port=htons(39911), inet_pton(AF_INET6, "::ffff:46.125.249.41", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28]) = 6
      [pid 2693] recvfrom(6,

            People

            • Assignee:
              Unassigned
            • Reporter:
              Tuxa David Fankhauser