Uploaded image for project: 'Mesos'
  1. Mesos
  2. MESOS-5988

PollSocketImpl can write to a stale fd.

Attach filesAttach ScreenshotVotersWatch issueWatchersLinkCloneUpdate Comment AuthorReplace String in CommentUpdate Comment VisibilityDelete Comments
    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Blocker
    • Resolution: Fixed
    • None
    • 1.0.1
    • libprocess
    • Mesosphere Sprint 40, Mesosphere Sprint 41
    • 3

    Description

      When tracking down MESOS-5986 with Greg Mann and Anand Mazumdar. We were curious why PollSocketImpl avoids the same issue. It seems that PollSocketImpl has a similar race, however in the case of PollSocketImpl we will simply write to a stale file descriptor.

      One example is PollSocketImpl::send(const char*, size_t):

      https://github.com/apache/mesos/blob/1.0.0/3rdparty/libprocess/src/poll_socket.cpp#L241-L245

      Future<size_t> PollSocketImpl::send(const char* data, size_t size)
      {
        return io::poll(get(), io::WRITE)
          .then(lambda::bind(&internal::socket_send_data, get(), data, size));
      }
      
      Future<size_t> socket_send_data(int s, const char* data, size_t size)
      {
        CHECK(size > 0);
      
        while (true) {
          ssize_t length = send(s, data, size, MSG_NOSIGNAL);
      
      #ifdef __WINDOWS__
          int error = WSAGetLastError();
      #else
          int error = errno;
      #endif // __WINDOWS__
      
          if (length < 0 && net::is_restartable_error(error)) {
            // Interrupted, try again now.
            continue;
          } else if (length < 0 && net::is_retryable_error(error)) {
            // Might block, try again later.
            return io::poll(s, io::WRITE)
              .then(lambda::bind(&internal::socket_send_data, s, data, size));
          } else if (length <= 0) {
            // Socket error or closed.
            if (length < 0) {
              const string error = os::strerror(errno);
              VLOG(1) << "Socket error while sending: " << error;
            } else {
              VLOG(1) << "Socket closed while sending";
            }
            if (length == 0) {
              return length;
            } else {
              return Failure(ErrnoError("Socket send failed"));
            }
          } else {
            CHECK(length > 0);
      
            return length;
          }
        }
      }
      

      If the last reference to the Socket goes away before the socket_send_data loop completes, then we will write to a stale fd!

      It turns out that we have avoided this issue because in libprocess we happen to keep a reference to the Socket around when sending:

      https://github.com/apache/mesos/blob/1.0.0/3rdparty/libprocess/src/process.cpp#L1678-L1707

      void send(Encoder* encoder, Socket socket)
      {
        switch (encoder->kind()) {
          case Encoder::DATA: {
            size_t size;
            const char* data = static_cast<DataEncoder*>(encoder)->next(&size);
            socket.send(data, size)
              .onAny(lambda::bind(
                  &internal::_send,
                  lambda::_1,
                  socket,
                  encoder,
                  size));
            break;
          }
          case Encoder::FILE: {
            off_t offset;
            size_t size;
            int fd = static_cast<FileEncoder*>(encoder)->next(&offset, &size);
            socket.sendfile(fd, offset, size)
              .onAny(lambda::bind(
                  &internal::_send,
                  lambda::_1,
                  socket,
                  encoder,
                  size));
            break;
          }
        }
      }
      

      However, this may not be true in all call-sites going forward. Currently, it appears that http::Connection can trigger this bug.

      Attachments

        Issue Links

        Activity

          This comment will be Viewable by All Users Viewable by All Users
          Cancel

          People

            greggomann Greg Mann
            bmahler Benjamin Mahler
            Benjamin Mahler Benjamin Mahler
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Agile

                Completed Sprints:
                Mesosphere Sprint 40 ended 09/Aug/16
                Mesosphere Sprint 41 ended 03/Sep/16
                View on Board

                Slack

                  Issue deployment