Traffic Server / TS-3085

Large POSTs over (relatively) slow connections failing in ATS 5


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 5.0.1
    • Fix Version/s: 5.2.0
    • Component/s: SSL

    Description

      We ran into a production issue where large POSTs (30 MB or larger) fail over slow connections after the ATS 5 rollout. The problem is easy to reproduce using Charles Proxy with throttling enabled.

      Further debugging isolated the problem to uploads over SSL connections, and the cause appears to be the following:

      ATS calls SSL_read() followed by SSL_get_error() to check whether the read failed, and repeats this until either all of the data has been read or an error occurs. However, the OpenSSL documentation recommends calling ERR_clear_error() before each SSL_read() + SSL_get_error() pair, so that the current thread's error queue holds no leftover errors from earlier calls. It is not clear what leaves errors in the queue during this tight loop (possibly some new feature in ATS 5). In any case, calling ERR_clear_error() first is good practice, and adding it appears to resolve the POST failures.
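      The failure mode can be illustrated with a small, self-contained toy model in C. This is not OpenSSL code: the queue, the toy_* functions and the TOY_* result codes are hypothetical stand-ins. The model assumes, as far as we can tell from OpenSSL's own SSL_get_error(), that the classifier checks the error queue before it checks for a want-read condition. Under that assumption, a single stale entry turns a harmless "no data yet, retry" result, which is common on slow links, into what looks like a fatal SSL error.

```c
/*
 * Toy model (NOT OpenSSL): a per-thread error queue plus a classifier
 * that uses the same order of checks as SSL_get_error(). All names are
 * hypothetical; this sketch is single-threaded.
 */

enum toy_result {
    TOY_ERROR_NONE,       /* the read succeeded */
    TOY_ERROR_SSL,        /* treated as fatal: the connection is dropped */
    TOY_ERROR_WANT_READ   /* retryable: no data has arrived yet */
};

#define TOY_QUEUE_MAX 16

/* Stand-in for OpenSSL's per-thread error queue. */
static unsigned long toy_queue[TOY_QUEUE_MAX];
static int toy_queue_len = 0;

/* Some earlier, unrelated call records an error on this thread. */
static void toy_err_push(unsigned long code) {
    if (toy_queue_len < TOY_QUEUE_MAX)
        toy_queue[toy_queue_len++] = code;
}

/* Like ERR_peek_error(): return the oldest entry WITHOUT removing it. */
static unsigned long toy_err_peek(void) {
    return toy_queue_len > 0 ? toy_queue[0] : 0;
}

/* Like ERR_clear_error(): drop every queued entry. */
static void toy_err_clear(void) {
    toy_queue_len = 0;
}

/*
 * Simulated non-blocking read on a slow connection: nothing has arrived,
 * so it returns -1 and sets *want_read. It adds nothing to the queue.
 */
static int toy_read(int *want_read) {
    *want_read = 1;
    return -1;
}

/*
 * Classifier: success first, then ANY queued error, and only then the
 * want-read state (the final fallback is simplified).
 */
static enum toy_result toy_get_error(int ret, int want_read) {
    if (ret > 0)
        return TOY_ERROR_NONE;
    if (toy_err_peek() != 0)   /* peeks only, so the entry stays queued */
        return TOY_ERROR_SSL;
    if (ret < 0 && want_read)
        return TOY_ERROR_WANT_READ;
    return TOY_ERROR_SSL;
}

/* One read attempt, optionally clearing the queue first (the fix). */
static enum toy_result toy_read_once(int clear_first) {
    int want_read = 0;
    if (clear_first)
        toy_err_clear();
    int ret = toy_read(&want_read);
    return toy_get_error(ret, want_read);
}
```

      With an empty queue, toy_read_once(0) reports TOY_ERROR_WANT_READ. After toy_err_push() stores any nonzero code, every toy_read_once(0) reports TOY_ERROR_SSL, because peeking never removes the stale entry, while toy_read_once(1) reports TOY_ERROR_WANT_READ again. In real code the fix has the same shape: call ERR_clear_error() immediately before SSL_read(), then pass its return value to SSL_get_error() with no other OpenSSL calls in between.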

      Documentation from OpenSSL and some related notes on Stack Overflow:

      https://www.openssl.org/docs/ssl/SSL_get_error.html

      http://stackoverflow.com/questions/18179128/how-to-manage-the-error-queue-in-openssl-ssl-get-error-and-err-get-error

      "SSL_get_error() returns a result code (suitable for the C ``switch''
      statement) for a preceding call to SSL_connect(), SSL_accept(),
      SSL_do_handshake(), SSL_read(), SSL_peek(), or SSL_write() on ssl. The value
      returned by that TLS/SSL I/O function must be passed to SSL_get_error() in
      parameter ret.
      
      In addition to ssl and ret, SSL_get_error() inspects the current thread's
      OpenSSL error queue. Thus, SSL_get_error() must be used in the same thread that
      performed the TLS/SSL I/O operation, and no other OpenSSL function calls should
      appear in between. The current thread's error queue must be empty before the
      TLS/SSL I/O operation is attempted, or SSL_get_error() will not work reliably."
      
      "SSL_get_error does not call ERR_get_error. So if you just call SSL_get_error,
      the error stays in the queue.
      
      You should be calling ERR_clear_error prior to ANY SSL-call(SSL_read, SSL_write
      etc) that is followed by SSL_get_error, otherwise you may be reading an old
      error that occurred previously in the current thread."
      

      Attachments

        1. TS-3085.diff (9 kB, Sudheer Vinukonda)


          People

            Assignee: Sudheer Vinukonda
            Reporter: Sudheer Vinukonda
            Votes: 0
            Watchers: 10
