[QPID-5773] Qpid Protocol Negotiation Sometime Fails with Python Qpid with SSL - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 0.24, 0.26
Fix Version/s: 0.29
Component/s: Python Client
Labels: None
Environment:

Python QPID 0.24 or 0.26 Client, C++ Broker (same version)
Python 2.6 (RHEL 6.5)
Eventlet Monkey Patched for all but OS
More than 15 concurrent connections writing messages
Using OpenStack Oslo QPID Driver to send Messages

Description

When running an OpenStack application (using the QPID driver for messaging) at large scale, we found that connections are aborted shortly after they are established. The broker closes the connection once the max negotiation time elapses because it does not believe protocol negotiation has completed, no matter how large the max negotiation time is set.

To reproduce this issue we created a test program that starts 50 threads sharing a connection pool of size 20, with each thread writing messages in a loop to simulate a large number of concurrent writes. This reproduced the issue every time once the number of connections rose above 13 or 14.
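
The shape of such a load generator can be sketched as below. This is not the reporter's test program: it makes no Qpid or eventlet calls, and `run_load`, `make_connection`, and `send` are hypothetical names introduced for illustration. The caller would supply the real connection factory and send function.

```python
import threading
try:
    import queue            # Python 3
except ImportError:
    import Queue as queue   # Python 2

def run_load(make_connection, send, threads=50, pool_size=20,
             messages_per_thread=100):
    """Drive `send` from many threads through a fixed-size connection pool.

    `make_connection` and `send` are hypothetical callables supplied by the
    caller; nothing in this sketch talks to a broker.
    """
    pool = queue.Queue()
    for _ in range(pool_size):
        pool.put(make_connection())

    errors = []
    sent = [0]
    lock = threading.Lock()

    def worker(worker_id):
        for i in range(messages_per_thread):
            conn = pool.get()      # blocks until a pooled connection is free
            try:
                send(conn, "worker %d message %d" % (worker_id, i))
                with lock:
                    sent[0] += 1
            except Exception as exc:
                with lock:
                    errors.append(exc)
            finally:
                pool.put(conn)     # always hand the connection back

    workers = [threading.Thread(target=worker, args=(n,))
               for n in range(threads)]
    for t in workers:
        t.start()
    for t in workers:
        t.join()
    return sent[0], errors
```

Returning the collected exceptions rather than raising on the first one makes it easier to see how many connections were aborted during a run.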

Going back to the 0.22 Python QPID client (with any version of the broker), the issue did not occur, but with the 0.24 or 0.26 Python client it occurred every time. Tracing this back through trial and error, it came down to the change in the recv method in the transports.py module.

We are still not sure of the underlying root cause, but we have isolated it to the use of the "recv_into" method on the SSL socket when eventlet monkey patching is applied (so that the eventlet code sits in the middle). With the monkey patching removed it works fine, and it also works fine if the read method is called instead.

The change made in QPID-4872 brings benefits for the retry on the write/send buffer, but the current recv implementation does not really help pass the buffer down to the OpenSSL code, since the SSL socket's recv_into just does a read anyway.

So irrespective of what the actual issue is with using recv_into with the eventlet code in the middle, it seems this can be changed back to a read to solve the problem.

The proposal here is either to change the transports.py recv back to what it was in the 0.22 version, or to do something like the following. The second option minimizes the risk of the change because it still retries with the same number of bytes as the previous failed call, as the 0.24 / 0.26 versions do. (Note: this change was written to minimize the number of lines changed and make the difference easy to see, so the variable names may no longer be the most fitting):

# Method excerpt from transports.py; SSLError comes from the ssl module.
def recv(self, n):
    # Remember the size of a failed read so the retry asks for the same number of bytes.
    if self.read_retry is None:
        self.read_retry = n
    self._clear_state()
    try:
        # Use read() rather than recv_into().
        r = self.tls.read(self.read_retry)
        self.read_retry = None
        return r
    except SSLError as e:
        if self._update_state(e.args[0]):
            # will retry on next invocation
            return None
        self.read_retry = None
        raise
    except:
        self.read_retry = None
        raise
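
To illustrate the retry behaviour of the proposed recv in isolation, here is a minimal, self-contained sketch. It is an assumption-laden stand-in, not the Qpid code: `FakeTLS` and `RetryingReader` are hypothetical classes, the `read_blocked` flag is invented for the example, and `_clear_state` / `_update_state` are simplified placeholders for the real helpers in transports.py, whose actual logic is not shown in this report.

```python
import ssl

class FakeTLS(object):
    """Hypothetical stand-in for an SSL socket.

    The first read() raises SSL_ERROR_WANT_READ; later reads return data.
    """
    def __init__(self, payload):
        self.payload = payload
        self.requested = []        # sizes passed to read(), for inspection

    def read(self, n):
        self.requested.append(n)
        if len(self.requested) == 1:
            raise ssl.SSLError(ssl.SSL_ERROR_WANT_READ, "want read")
        return self.payload[:n]


class RetryingReader(object):
    """Hypothetical holder for the proposed recv() logic."""
    def __init__(self, tls):
        self.tls = tls
        self.read_retry = None
        self.read_blocked = False  # invented flag, for illustration only

    def _clear_state(self):
        self.read_blocked = False

    def _update_state(self, code):
        # Simplified assumption: only WANT_READ counts as retryable here.
        if code == ssl.SSL_ERROR_WANT_READ:
            self.read_blocked = True
            return True
        return False

    def recv(self, n):
        if self.read_retry is None:
            self.read_retry = n
        self._clear_state()
        try:
            r = self.tls.read(self.read_retry)
            self.read_retry = None
            return r
        except ssl.SSLError as e:
            if self._update_state(e.args[0]):
                return None        # caller retries on the next invocation
            self.read_retry = None
            raise


reader = RetryingReader(FakeTLS(b"AMQP\x01\x01\x00\x0a"))  # 8-byte placeholder payload
first = reader.recv(8)       # simulated WANT_READ, so no data yet
second = reader.recv(4096)   # retried with the remembered size, not 4096
```

After both calls, `reader.tls.requested` records the sizes actually passed to read(), which shows the second call reusing the size from the failed first call.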

Attachments

transports.py
20/May/14 00:08
7 kB
Brent Tang

People

Assignee: Ken Giusti

Reporter: Brent Tang

Votes: 0

Watchers: 5

Dates

Created: 19/May/14 23:44

Updated: 26/Sep/14 15:43

Resolved: 29/May/14 18:17