Qpid / QPID-5773

Qpid Protocol Negotiation Sometimes Fails with Python Qpid with SSL


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.24, 0.26
    • Fix Version/s: 0.29
    • Component/s: Python Client
    • Labels: None
    • Environment: Python QPID 0.24 or 0.26 client, C++ broker (same version)
      Python 2.6 (RHEL 6.5)
      Eventlet monkey patched for everything except os
      More than 15 concurrent connections writing messages
      Using the OpenStack Oslo QPID driver to send messages

    Description

      When running an application using OpenStack (with the QPID driver for messaging) at large scale, we have found that connections are aborted shortly after they are established. The broker closes each connection once the maximum negotiation time expires because it does not believe protocol negotiation has completed, no matter how large the maximum negotiation time is set.

      To reproduce this issue we created a test program that starts 50 threads sharing a connection pool of 20 connections, with each thread writing messages over and over to simulate a large number of concurrent writes. It reproduced the issue every time once the number of connections rose above 13 or 14.
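      The load pattern described above (many threads sharing a smaller connection pool, each writing repeatedly) can be sketched roughly as follows. This is a minimal, hypothetical harness: `FakeConnection` and `run_load` are illustrative names, and a real reproduction would open SSL connections to the broker with the Python client (with eventlet monkey patching applied) in place of the in-memory stand-in.

```python
import queue
import threading


class FakeConnection(object):
    """Hypothetical stand-in for a broker connection. A real reproduction
    would open an SSL connection to the broker here instead."""

    def __init__(self, ident):
        self.ident = ident
        self.sent = 0

    def send(self, message):
        # Only one thread holds a pooled connection at a time, so this is safe.
        self.sent += 1


def run_load(num_threads=50, pool_size=20, messages_per_thread=100):
    """Start num_threads workers that share pool_size connections and
    write messages_per_thread messages each."""
    pool = queue.Queue()
    connections = [FakeConnection(i) for i in range(pool_size)]
    for conn in connections:
        pool.put(conn)

    def worker(worker_id):
        for seq in range(messages_per_thread):
            conn = pool.get()  # block until a pooled connection is free
            try:
                conn.send("worker %d message %d" % (worker_id, seq))
            finally:
                pool.put(conn)  # always hand the connection back to the pool

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return connections
```

The pool is a plain `queue.Queue`, so a worker that cannot get a connection simply blocks until another worker returns one, which keeps all 20 connections busy at once.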

      When we went back to the 0.22 Python QPID client (with any version of the broker), the issue did not occur, but with the 0.24 or 0.26 Python client it occurred every time. Tracing this back through trial and error, it came down to the change in the recv method in the transports.py module.

      We are still not sure of the underlying root cause, but we have isolated it to the use of the "recv_into" method on the SSL socket when eventlet monkey patching is in effect (so that the eventlet code sits in the middle). With the monkey patching removed, or with the read method called instead, it works fine.

      The change made in QPID-4872 has real benefits for retrying on the write/send buffer, but the current recv implementation does not actually help pass the buffer down to the OpenSSL code, since the SSL socket's recv_into just does a read anyway.
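      The point that recv_into gains nothing over read here can be illustrated with a simplified paraphrase of how an SSL socket's recv_into can be built on top of read. This is a sketch under an assumption, not the standard library's code: `recv_into_via_read` is a hypothetical helper, and `io.BytesIO.read` stands in for the SSL socket's read.

```python
import io


def recv_into_via_read(read, buffer, nbytes=None):
    """Hypothetical paraphrase of a recv_into implemented with read():
    the data arrives as a new bytes object and is then copied into the
    caller's buffer, so no allocation or copy is actually avoided."""
    if nbytes is None:
        nbytes = len(buffer)
    data = read(nbytes)          # a fresh bytes object is allocated here...
    buffer[:len(data)] = data    # ...and then copied into the caller's buffer
    return len(data)


buf = bytearray(8)
count = recv_into_via_read(io.BytesIO(b"AMQP").read, buf)
```

Because the copy still happens, switching recv back to read does not give up any zero-copy benefit.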

      So irrespective of what the actual issue is with using recv_into with the eventlet code in the middle, it seems this can be changed back to a read to solve the problem.

      The proposal is either to change the transports.py recv back to what it was in the 0.22 version, or to do something like the following. This minimizes the risk of the change because it still retries with the same number of bytes as the previous failed call, as the 0.24 / 0.26 versions do. (Note: this change was written to minimize the number of lines changed, to show the difference, so the variable names may no longer be the most relevant.)

      def recv(self, n):
          # Remember the size of the first attempt so that a retry after
          # WANT_READ/WANT_WRITE asks for the same number of bytes.
          if self.read_retry is None:
              self.read_retry = n
          self._clear_state()
          try:
              r = self.tls.read(self.read_retry)
              self.read_retry = None
              return r
          except SSLError as e:
              if self._update_state(e.args[0]):
                  # will retry on next invocation
                  return None
              self.read_retry = None
              raise
          except:
              # any other failure: forget the pending size and propagate
              self.read_retry = None
              raise
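      To see what the proposed read_retry bookkeeping does, the sketch below wraps the same logic in a self-contained class and drives it with a fake TLS object. `FakeTLS`, `RetryingReader`, and the simplified `_clear_state`/`_update_state` are hypothetical stand-ins, not the qpid transport's actual classes; the only behavior assumed from the original is that `_update_state` returns a true value for WANT_READ/WANT_WRITE so the caller retries later.

```python
import ssl


class FakeTLS(object):
    """Hypothetical stand-in for an SSL socket: the first read() raises
    SSL_ERROR_WANT_READ, later reads return up to n buffered bytes."""

    def __init__(self, payload):
        self.payload = payload
        self.calls = []        # byte count requested by every read()
        self.fail_next = True

    def read(self, n):
        self.calls.append(n)
        if self.fail_next:
            self.fail_next = False
            raise ssl.SSLError(ssl.SSL_ERROR_WANT_READ, "want read")
        chunk, self.payload = self.payload[:n], self.payload[n:]
        return chunk


class RetryingReader(object):
    """Sketch of the proposed recv(): remember the size of a read that
    failed with WANT_READ/WANT_WRITE and reuse it on the next call."""

    def __init__(self, tls):
        self.tls = tls
        self.read_retry = None
        self.want_read = self.want_write = False

    def _clear_state(self):
        self.want_read = self.want_write = False

    def _update_state(self, code):
        # Hypothetical simplification of the transport's state tracking.
        if code == ssl.SSL_ERROR_WANT_READ:
            self.want_read = True
        elif code == ssl.SSL_ERROR_WANT_WRITE:
            self.want_write = True
        return self.want_read or self.want_write

    def recv(self, n):
        if self.read_retry is None:
            self.read_retry = n
        self._clear_state()
        try:
            r = self.tls.read(self.read_retry)
            self.read_retry = None
            return r
        except ssl.SSLError as e:
            if self._update_state(e.args[0]):
                return None  # caller retries on the next invocation
            self.read_retry = None
            raise
        except Exception:
            self.read_retry = None
            raise


reader = RetryingReader(FakeTLS(b"AMQP\x01\x01\x00\x0a"))
first = reader.recv(4)    # None: the read hit WANT_READ
second = reader.recv(64)  # b"AMQP": retried with the original size of 4
```

Even though the second call asks for 64 bytes, the retry reads exactly the 4 bytes requested by the failed call, which is the same retry-size behavior the 0.24 / 0.26 versions have, just using read instead of recv_into.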

      Attachments

        1. transports.py (7 kB, Brent Tang)


          People

            Ken Giusti (kgiusti)
            Brent Tang (brenttang)
            Votes: 0
            Watchers: 5
