Uploaded image for project: 'Traffic Server'
  1. Traffic Server
  2. TS-4916

Http2ConnectionState::restart_streams infinite loop causes deadlock

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Closed
    • Blocker
    • Resolution: Fixed
    • None
    • 6.2.1, 7.0.0
    • Core, HTTP/2
    • None

    Description

      Http2ConnectionState::restart_streams falls into an infinite loop while holding a lock, which leads to cache updates to start failing.

      The infinite loop is caused by traversing a list whose last element “next” points to the element itself and the traversal never finishes.

      Thread 51 (Thread 0x2aaab3d04700 (LWP 34270)):
      #0  0x00002aaaaacf3fee in Http2ConnectionState::restart_streams (this=0x2ae6ba5284c8) at Http2ConnectionState.cc:913
      #1  rcv_window_update_frame (cstate=..., frame=...) at Http2ConnectionState.cc:627
      #2  0x00002aaaaacf9738 in Http2ConnectionState::main_event_handler (this=0x2ae6ba5284c8, event=<optimized out>, edata=<optimized out>) at Http2ConnectionState.cc:823
      #3  0x00002aaaaacef1c3 in Continuation::handleEvent (data=0x2aaab3d039a0, event=2253, this=0x2ae6ba5284c8) at ../../iocore/eventsystem/I_Continuation.h:153
      #4  send_connection_event (cont=cont@entry=0x2ae6ba5284c8, event=event@entry=2253, edata=edata@entry=0x2aaab3d039a0) at Http2ClientSession.cc:58
      #5  0x00002aaaaacef462 in Http2ClientSession::state_complete_frame_read (this=0x2ae6ba528290, event=<optimized out>, edata=0x2aab7b237f18) at Http2ClientSession.cc:426
      #6  0x00002aaaaacf0982 in Continuation::handleEvent (data=0x2aab7b237f18, event=100, this=0x2ae6ba528290) at ../../iocore/eventsystem/I_Continuation.h:153
      #7  Http2ClientSession::state_start_frame_read (this=0x2ae6ba528290, event=<optimized out>, edata=0x2aab7b237f18) at Http2ClientSession.cc:399
      #8  0x00002aaaaacef5a3 in Continuation::handleEvent (data=0x2aab7b237f18, event=100, this=0x2ae6ba528290) at ../../iocore/eventsystem/I_Continuation.h:153
      #9  Http2ClientSession::state_complete_frame_read (this=0x2ae6ba528290, event=<optimized out>, edata=0x2aab7b237f18) at Http2ClientSession.cc:431
      #10 0x00002aaaaacf0982 in Continuation::handleEvent (data=0x2aab7b237f18, event=100, this=0x2ae6ba528290) at ../../iocore/eventsystem/I_Continuation.h:153
      #11 Http2ClientSession::state_start_frame_read (this=0x2ae6ba528290, event=<optimized out>, edata=0x2aab7b237f18) at Http2ClientSession.cc:399
      #12 0x00002aaaaae67e2b in Continuation::handleEvent (data=0x2aab7b237f18, event=100, this=<optimized out>) at ../../iocore/eventsystem/I_Continuation.h:153
      #13 read_signal_and_update (vc=0x2aab7b237e00, vc@entry=0x1, event=event@entry=100) at UnixNetVConnection.cc:153
      #14 UnixNetVConnection::readSignalAndUpdate (this=this@entry=0x2aab7b237e00, event=event@entry=100) at UnixNetVConnection.cc:1036
      #15 0x00002aaaaae47653 in SSLNetVConnection::net_read_io (this=0x2aab7b237e00, nh=0x2aaab2409cc0, lthread=0x2aaab2406000) at SSLNetVConnection.cc:595
      #16 0x00002aaaaae5558c in NetHandler::mainNetEvent (this=0x2aaab2409cc0, event=<optimized out>, e=<optimized out>) at UnixNet.cc:513
      #17 0x00002aaaaae8d2e6 in Continuation::handleEvent (data=0x2aaab0bfa700, event=5, this=<optimized out>) at I_Continuation.h:153
      #18 EThread::process_event (calling_code=5, e=0x2aaab0bfa700, this=0x2aaab2406000) at UnixEThread.cc:148
      #19 EThread::execute (this=0x2aaab2406000) at UnixEThread.cc:275
      #20 0x00002aaaaae8c0e6 in spawn_thread_internal (a=0x2aaab0b25bb0) at Thread.cc:86
      #21 0x00002aaaad6b3aa1 in start_thread (arg=0x2aaab3d04700) at pthread_create.c:301
      #22 0x00002aaaae8bc93d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
      

      Here is the stream_list trace.

      (gdb) thread 51
      [Switching to thread 51 (Thread 0x2aaab3d04700 (LWP 34270))]
      #0  0x00002aaaaacf3fee in Http2ConnectionState::restart_streams (this=0x2ae6ba5284c8) at Http2ConnectionState.cc:913
      
      (gdb) trace_list stream_list
      ------- count=0 -------
      id=29
      this=0x2ae673f0c840
      next=0x2aaac05d8900
      prev=(nil)
      ------- count=1 -------
      id=27
      this=0x2aaac05d8900
      next=0x2ae5b6bbec00
      prev=0x2ae673f0c840
      ------- count=2 -------
      id=19
      this=0x2ae5b6bbec00
      next=0x2ae5b6bbec00
      prev=0x2aaac05d8900
      ------- count=3 -------
      id=19
      this=0x2ae5b6bbec00
      next=0x2ae5b6bbec00
      prev=0x2aaac05d8900
      . . . 
      ------- count=5560 -------
      id=19
      this=0x2ae5b6bbec00
      next=0x2ae5b6bbec00
      prev=0x2aaac05d8900
      . . .
      

      Currently I am working on finding out why the list in question got into this “impossible” (broken) state and and eventually coming up with a fix.

      Attachments

        Issue Links

          Activity

            People

              gancho Gancho Tenev
              gancho Gancho Tenev
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated:
                  Original Estimate - Not Specified
                  Not Specified
                  Remaining:
                  Remaining Estimate - 0h
                  0h
                  Logged:
                  Time Spent - 8h
                  8h