  Traffic Server / TS-4717

Http2 stack explosion

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 6.2.1, 7.0.0
    • Component/s: HTTP/2
    • Labels:
      None

      Description

      We see this periodically under high traffic load: ATS crashes with more than 7000 frames on the stack. Most of the stack is the following frame sequence, repeated.

      #117 0x00000000005159c8 in Continuation::handleEvent (this=0x2b0bdd101b90, event=100, data=0x2b0bad0c7cf0)
          at ../iocore/eventsystem/I_Continuation.h:150
      #118 0x000000000064c05d in Http2ClientSession::state_start_frame_read (this=0x2b0bdd101b90, event=100, edata=0x2b0bad0c7cf0)
          at Http2ClientSession.cc:451
      #119 0x000000000064b0af in Http2ClientSession::main_event_handler (this=0x2b0bdd101b90, event=100, edata=0x2b0bad0c7cf0) at Http2ClientSession.cc:292
      #120 0x00000000005159c8 in Continuation::handleEvent (this=0x2b0bdd101b90, event=100, data=0x2b0bad0c7cf0)
          at ../iocore/eventsystem/I_Continuation.h:150
      #121 0x000000000064c386 in Http2ClientSession::state_complete_frame_read (this=0x2b0bdd101b90, event=100, edata=0x2b0bad0c7cf0)
          at Http2ClientSession.cc:483
      #122 0x000000000064b0af in Http2ClientSession::main_event_handler (this=0x2b0bdd101b90, event=100, edata=0x2b0bad0c7cf0) at Http2ClientSession.cc:292
      #123 0x00000000005159c8 in Continuation::handleEvent (this=0x2b0bdd101b90, event=100, data=0x2b0bad0c7cf0)
          at ../iocore/eventsystem/I_Continuation.h:150
      #124 0x000000000064c05d in Http2ClientSession::state_start_frame_read (this=0x2b0bdd101b90, event=100, edata=0x2b0bad0c7cf0)
          at Http2ClientSession.cc:451
      

      We had cherry-picked the fix for TS-4209 to correctly enforce the concurrent stream limit. But in the latest crash of this type, it looks like we are pulling small items from cache, so each stream lives and dies on the stack. The concurrent active connection count never reaches the limit, so the limit never stops the recursion.

      I am going to try changing the state_start_frame_read/state_complete_frame_read logic from handlers that recurse into each other to a loop.
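
      The difference between the two shapes can be sketched with a toy model. This is not the ATS code and not the eventual patch: ToySession and all of its members are hypothetical names, the "frames" are just a counter of complete frames already buffered, and the depth counters stand in for real stack frames. It only illustrates why handlers that call each other grow the stack with the number of buffered frames, while a loop does not.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a session; NOT Http2ClientSession.
struct ToySession {
  std::size_t frames_buffered;       // complete frames already readable
  std::size_t frames_processed = 0;  // frames handled so far
  std::size_t depth = 0;             // simulated handler nesting right now
  std::size_t max_depth = 0;         // deepest nesting observed

  explicit ToySession(std::size_t n) : frames_buffered(n) {}

  void enter() { ++depth; max_depth = std::max(max_depth, depth); }
  void leave() { --depth; }

  // Recursing shape: each handler invokes the next one directly, so
  // every buffered frame adds two more nested calls before any return.
  void start_frame_read_recursive() {
    enter();
    if (frames_buffered > 0) {
      complete_frame_read_recursive();
    }
    leave();
  }

  void complete_frame_read_recursive() {
    enter();
    assert(frames_buffered > 0);
    --frames_buffered;
    ++frames_processed;
    start_frame_read_recursive();  // immediately try the next frame
    leave();
  }

  // Loop shape: each step returns to the caller, which decides
  // whether to go around again.
  bool start_frame_read_once() {
    enter();
    bool ready = frames_buffered > 0;
    leave();
    return ready;
  }

  void complete_frame_read_once() {
    enter();
    assert(frames_buffered > 0);
    --frames_buffered;
    ++frames_processed;
    leave();
  }

  void read_frames_loop() {
    enter();
    while (start_frame_read_once()) {
      complete_frame_read_once();
    }
    leave();
  }
};
```

      In this model, a session constructed with 1000 buffered frames reaches a nesting depth of 2001 through start_frame_read_recursive(), but only 2 through read_frames_loop(), and both process all 1000 frames. The real backtrace shows three frames per step (handleEvent, main_event_handler, and the state handler), so the actual growth per frame is larger than in this sketch.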

              People

              • Assignee: Susan Hinrichs (shinrich)
              • Reporter: Susan Hinrichs (shinrich)
              • Votes: 0
              • Watchers: 6

                  Time Tracking

                  • Original Estimate: Not Specified
                  • Remaining Estimate: 0h
                  • Time Spent: 5h