Traffic Server / TS-4838

After TS-3612 restructuring, very slow SSL sessions and HttpSM::state_raw_http_server_open errors



    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 6.2.0, 7.0.0
    • Fix Version/s: 6.2.1, 7.0.0
    • Component/s: Core, SSL
    • Labels: None
    • Environment: CentOS/RHEL 7.2, x86_64


      We have been using TrafficServer 5.3.2 for quite some time now, for forward proxying of a number of different HTML5 applications, one of the most important ones being YouTube's TV interface, e.g. https://youtube.com/tv. This is all hosted on CentOS 7.2 x86_64 machines.

      We recently upgraded to 6.2.0, and then started having problems with the CONNECT requests for port 443 which are generated by the YouTube app. These connections appear to be "stalled" somehow, sometimes for more than 10 seconds. Meanwhile, diags.log is getting spammed with lots of the following:

      [Sep  9 16:45:47.683] Server {0x2b3e50c0b700} ERROR: [HttpSM::state_raw_http_server_open] event: EVENT_INTERVAL state: 0 server_entry: (nil)
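To get a rough idea of how often this error shows up, the diags.log lines could be counted per minute. The following is only a minimal sketch: the regular expression is derived from the single example line above, and the function name `errors_per_minute` is made up for illustration.

```python
import re
from collections import Counter

# Pattern derived from the one example line above (an assumption, not a
# documented format): "[Sep  9 16:45:47.683] Server {...} ERROR: [HttpSM::...] ..."
ERROR_RE = re.compile(
    r"^\[(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}):\d{2}\.\d{3}\]\s"
    r".*ERROR: \[HttpSM::state_raw_http_server_open\]"
)

def errors_per_minute(lines):
    """Count state_raw_http_server_open errors, keyed by 'Mon D HH:MM'."""
    counts = Counter()
    for line in lines:
        match = ERROR_RE.match(line.strip())
        if match:
            # Collapse the padding in day-of-month ("Sep  9" -> "Sep 9").
            minute = " ".join(match.group("stamp").split())
            counts[minute] += 1
    return counts
```

It could be fed an open file, e.g. `errors_per_minute(open("diags.log"))`, where the path depends on the local installation.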

      The requests that seem to stall appear to be all of the CONNECT kind, e.g.:

      1473432382.474 30405 TCP_MISS/200 4916 CONNECT ad.doubleclick.net:443/ - DIRECT/ad.doubleclick.net -
      1473432382.481 30411 TCP_MISS/200 54024 CONNECT i9.ytimg.com:443/ - DIRECT/i9.ytimg.com -
      1473432382.486 30417 TCP_MISS/200 5389 CONNECT pagead2.googlesyndication.com:443/ - DIRECT/pagead2.googlesyndication.com -
      1473432390.451 42772 TCP_MISS/200 5198 CONNECT csi.gstatic.com:443/ - DIRECT/csi.gstatic.com -
      1473432390.459 43833 TCP_MISS/200 11610 CONNECT www.youtube.com:443/ - DIRECT/www.youtube.com -
      1473432390.483 38414 TCP_MISS/200 2870983 CONNECT r17---sn-5hnednl7.googlevideo.com:443/ - DIRECT/r17---sn-5hnednl7.googlevideo.com -
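To pick such entries out of a larger access log, a small filter like the one below might help. It is a sketch under assumptions: it treats the second field as the elapsed time in milliseconds and the fourth as a byte count, as in the Squid-style lines above, and the names `Entry`, `parse_line` and `slow_connects` are hypothetical. Note that for CONNECT the elapsed time covers the whole tunnel, so a large value is a hint to look closer rather than proof of a stall.

```python
from typing import Iterable, List, NamedTuple, Optional

class Entry(NamedTuple):
    timestamp: float   # epoch seconds (field 1)
    elapsed_ms: int    # assumed: elapsed time in milliseconds (field 2)
    result: str        # e.g. "TCP_MISS/200" (field 3)
    size: int          # assumed: bytes transferred (field 4)
    method: str        # e.g. "CONNECT" (field 5)
    url: str           # e.g. "www.youtube.com:443/" (field 6)

def parse_line(line: str) -> Optional[Entry]:
    """Parse one whitespace-separated log line; return None if malformed."""
    fields = line.split()
    if len(fields) < 6:
        return None
    try:
        return Entry(float(fields[0]), int(fields[1]), fields[2],
                     int(fields[3]), fields[4], fields[5])
    except ValueError:
        return None

def slow_connects(lines: Iterable[str], threshold_ms: int = 10_000) -> List[Entry]:
    """Return CONNECT entries whose elapsed time exceeds threshold_ms."""
    entries = (parse_line(line) for line in lines)
    return [e for e in entries
            if e is not None and e.method == "CONNECT" and e.elapsed_ms > threshold_ms]
```

The 10-second default mirrors the stall duration mentioned earlier in this report; any other threshold can be passed instead.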

      As part of figuring out how to diagnose this, I tried a downgrade to TrafficServer 6.1.1, and this made all the stalling and problems disappear. Afterwards, I did a git bisect on master, from the branch point of 6.1 to the branch point of 6.2, and I ended up at commit af76977:

      Author: Susan Hinrichs <shinrich@draggingnagging.corp.ne1.yahoo.com>
      Date: Wed Apr 13 19:57:39 2016 +0000

      TS-3612: Restructure client session and transaction processing. This closes #570.

      Unfortunately, this is quite a big refactoring commit, so it is not possible to revert it individually to see whether that improves things.

      I read TS-3612 and #570, and I saw there were also a number of follow-up commits to fix various problems with it, but this particular problem of stalled SSL connections is still occurring with master as of today, 2016-09-09.

      I realize that this report is still missing reproduction details, since it is tricky to analyze what the YouTube app is doing, and simple curl https:// tests appear to go fast, and don't seem to trigger any stalling. But YouTube itself is pretty easy to try out, I think.





              Assignee: James Peach (jamespeach)
              Reporter: Dimitry Andric (dim)



                Time Tracking

                  Original Estimate - Not Specified
                  Not Specified
                  Remaining Estimate - 0h
                  Time Spent - 1.5h