Details
-
Bug
-
Status: Resolved
-
Blocker
-
Resolution: Fixed
-
None
-
Mesosphere Sprint 40, Mesosphere Sprint 41
-
3
Description
When tracking down MESOS-5986 with greggomann and anandmazumdar. We were curious why PollSocketImpl avoids the same issue. It seems that PollSocketImpl has a similar race, however in the case of PollSocketImpl we will simply write to a stale file descriptor.
One example is PollSocketImpl::send(const char*, size_t):
https://github.com/apache/mesos/blob/1.0.0/3rdparty/libprocess/src/poll_socket.cpp#L241-L245
Future<size_t> PollSocketImpl::send(const char* data, size_t size) { return io::poll(get(), io::WRITE) .then(lambda::bind(&internal::socket_send_data, get(), data, size)); } Future<size_t> socket_send_data(int s, const char* data, size_t size) { CHECK(size > 0); while (true) { ssize_t length = send(s, data, size, MSG_NOSIGNAL); #ifdef __WINDOWS__ int error = WSAGetLastError(); #else int error = errno; #endif // __WINDOWS__ if (length < 0 && net::is_restartable_error(error)) { // Interrupted, try again now. continue; } else if (length < 0 && net::is_retryable_error(error)) { // Might block, try again later. return io::poll(s, io::WRITE) .then(lambda::bind(&internal::socket_send_data, s, data, size)); } else if (length <= 0) { // Socket error or closed. if (length < 0) { const string error = os::strerror(errno); VLOG(1) << "Socket error while sending: " << error; } else { VLOG(1) << "Socket closed while sending"; } if (length == 0) { return length; } else { return Failure(ErrnoError("Socket send failed")); } } else { CHECK(length > 0); return length; } } }
If the last reference to the Socket goes away before the socket_send_data loop completes, then we will write to a stale fd!
It turns out that we have avoided this issue because in libprocess we happen to keep a reference to the Socket around when sending:
https://github.com/apache/mesos/blob/1.0.0/3rdparty/libprocess/src/process.cpp#L1678-L1707
void send(Encoder* encoder, Socket socket) { switch (encoder->kind()) { case Encoder::DATA: { size_t size; const char* data = static_cast<DataEncoder*>(encoder)->next(&size); socket.send(data, size) .onAny(lambda::bind( &internal::_send, lambda::_1, socket, encoder, size)); break; } case Encoder::FILE: { off_t offset; size_t size; int fd = static_cast<FileEncoder*>(encoder)->next(&offset, &size); socket.sendfile(fd, offset, size) .onAny(lambda::bind( &internal::_send, lambda::_1, socket, encoder, size)); break; } } }
However, this may not be true in all call-sites going forward. Currently, it appears that http::Connection can trigger this bug.
Attachments
Issue Links
- relates to
-
MESOS-5986 SSL Socket CHECK can fail after socket receives EOF
- Resolved