Traffic Server / TS-857

Crash Report: HttpTunnel::chain_abort_all -> HttpServerSession::do_io_close -> UnixNetVConnection::do_io_close

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.1.0
    • Fix Version/s: 3.1.4, 3.0.5
    • Component/s: HTTP, Network
    • Labels:
      None
    • Environment:

      in my branch, which is essentially the same as 3.0.x

    • Backport to Version:

      Description

      Here is the backtrace from the crash; some information is missing because we did not enable the --enable-debug configure option.

      [New process 7532]
      #0  ink_stack_trace_get (stack=<value optimized out>, len=<value optimized out>, signalhandler_frame=<value optimized out>)
          at ink_stack_trace.cc:68
      68	    fp = (void **) (*fp);
      (gdb) bt
      #0  ink_stack_trace_get (stack=<value optimized out>, len=<value optimized out>, signalhandler_frame=<value optimized out>)
          at ink_stack_trace.cc:68
      #1  0x00002ba641dccef1 in ink_stack_trace_dump (sighandler_frame=<value optimized out>) at ink_stack_trace.cc:114
      #2  0x00000000004df020 in signal_handler (sig=<value optimized out>) at signals.cc:225
      #3  <signal handler called>
      #4  0x00000000006a1ea9 in UnixNetVConnection::do_io_close (this=0x1cc9bd20, alerrno=<value optimized out>)
          at ../../iocore/eventsystem/I_Lock.h:297
      #5  0x000000000051f1d0 in HttpServerSession::do_io_close (this=0x2aaab0042c80, alerrno=20600) at HttpServerSession.cc:127
      #6  0x000000000056d1e9 in HttpTunnel::chain_abort_all (this=0x2aabeeffdd70, p=0x2aabeeffdf68) at HttpTunnel.cc:1300
      #7  0x00000000005269ca in HttpSM::tunnel_handler_ua (this=0x2aabeeffc070, event=104, c=0x2aabeeffdda8) at HttpSM.cc:2987
      #8  0x0000000000571dfc in HttpTunnel::consumer_handler (this=0x2aabeeffdd70, event=104, c=0x2aabeeffdda8) at HttpTunnel.cc:1232
      #9  0x0000000000572032 in HttpTunnel::main_handler (this=0x2aabeeffdd70, event=1088608784, data=<value optimized out>)
          at HttpTunnel.cc:1456
      #10 0x00000000006a6307 in write_to_net_io (nh=0x2aaaab12d688, vc=0x1cc876e0, thread=<value optimized out>)
          at ../../iocore/eventsystem/I_Continuation.h:146
      #11 0x000000000069ce97 in NetHandler::mainNetEvent (this=0x2aaaab12d688, event=<value optimized out>, e=0x171c1ed0) at UnixNet.cc:405
      #12 0x00000000006cddaf in EThread::process_event (this=0x2aaaab12c010, e=0x171c1ed0, calling_code=5) at I_Continuation.h:146
      #13 0x00000000006ce6bc in EThread::execute (this=0x2aaaab12c010) at UnixEThread.cc:262
      #14 0x00000000006cd0ee in spawn_thread_internal (a=0x171b58f0) at Thread.cc:88
      #15 0x0000003c33c064a7 in start_thread () from /lib64/libpthread.so.0
      #16 0x0000003c330d3c2d in clone () from /lib64/libc.so.6
      (gdb) info f
      Stack level 0, frame at 0x40e2b790:
       rip = 0x2ba641dccdf3 in ink_stack_trace_get(void**, int, int) (ink_stack_trace.cc:68); saved rip 0x2ba641dccef1
       called by frame at 0x40e2bbe0
       source language c++.
       Arglist at 0x40e2b770, args: stack=<value optimized out>, len=<value optimized out>, signalhandler_frame=<value optimized out>
       Locals at 0x40e2b770, Previous frame's sp is 0x40e2b790
       Saved registers:
        rbx at 0x40e2b778, rbp at 0x40e2b780, rip at 0x40e2b788
      (gdb) 
      
      1. locking-jp1.patch
        0.8 kB
        John Plevyak
      2. ts-857.diff
        104 kB
        Alan M. Carroll
      3. ts-857.diff
        4 kB
        Alan M. Carroll
      4. ts-857.diff
        2 kB
        weijin
      5. TS-857-jp1.patch
        0.8 kB
        John Plevyak

        Activity

        AdunGaos added a comment - edited
        #0  ink_stack_trace_get (stack=<value optimized out>, len=<value optimized out>, signalhandler_frame=<value optimized out>) at ink_stack_trace.cc:68
        68          fp = (void **) (*fp);
        (gdb) bt
        #0  ink_stack_trace_get (stack=<value optimized out>, len=<value optimized out>, signalhandler_frame=<value optimized out>) at ink_stack_trace.cc:68
        #1  0x00002b9a63a34ef1 in ink_stack_trace_dump (sighandler_frame=<value optimized out>) at ink_stack_trace.cc:114
        #2  0x00000000004ddc10 in signal_handler (sig=<value optimized out>) at signals.cc:225
        #3  <signal handler called>
        #4  0x00000000006990d9 in UnixNetVConnection::do_io_close (this=0x2aaab0052f00, alerrno=<value optimized out>) at ../../iocore/eventsystem/I_Lock.h:297
        #5  0x0000000000513f5a in HttpClientSession::do_io_close (this=0x78046f0, alerrno=-1) at HttpClientSession.cc:310
        #6  0x000000000052588c in HttpSM::tunnel_handler_ua (this=0x2aaaafcd8c40, event=103, c=0x2aaaafcda968) at HttpSM.cc:3027
        #7  0x000000000056f9dc in HttpTunnel::consumer_handler (this=0x2aaaafcda930, event=103, c=0x2aaaafcda968) at HttpTunnel.cc:1232
        #8  0x000000000057095e in HttpTunnel::finish_all_internal (this=0x2aaaafcda930, p=0x2aaaafcdab28, chain=false) at HttpTunnel.cc:1359
        #9  0x0000000000526e07 in HttpSM::tunnel_handler_server (this=0x2aaaafcd8c40, event=104, p=0x2aaaafcdab28) at HttpTunnel.h:311
        #10 0x000000000056eb21 in HttpTunnel::producer_handler (this=0x2aaaafcda930, event=104, p=0x2aaaafcdab28) at HttpTunnel.cc:1136
        #11 0x000000000056fca0 in HttpTunnel::main_handler (this=0x2aaaafcda930, event=1106414096, data=<value optimized out>) at HttpTunnel.cc:1452
        #12 0x000000000069f097 in read_from_net (nh=0x2aaaab22e688, vc=0x2aaab04fef40, thread=<value optimized out>) at ../../iocore/eventsystem/I_Continuation.h:146
        #13 0x00000000006940e1 in NetHandler::mainNetEvent (this=0x2aaaab22e688, event=<value optimized out>, e=0x2aaaab9b601c) at UnixNet.cc:389
        #14 0x00000000006c4fdf in EThread::process_event (this=0x2aaaab22d010, e=0x693bac0, calling_code=5) at I_Continuation.h:146
        #15 0x00000000006c58ec in EThread::execute (this=0x2aaaab22d010) at UnixEThread.cc:262
        #16 0x00000000006c431e in spawn_thread_internal (a=0x692f5d0) at Thread.cc:88
        #17 0x00000031060064a7 in start_thread () from /lib64/libpthread.so.0
        #18 0x00000031054d3c2d in clone () from /lib64/libc.so.6
        (gdb) info f
        Stack level 0, frame at 0x41f22560:
         rip = 0x2b9a63a34df3 in ink_stack_trace_get(void**, int, int) (ink_stack_trace.cc:68); saved rip 0x2b9a63a34ef1
         called by frame at 0x41f229b0
         source language c++.
         Arglist at 0x41f22540, args: stack=<value optimized out>, len=<value optimized out>, signalhandler_frame=<value optimized out>
         Locals at 0x41f22540, Previous frame's sp is 0x41f22560
         Saved registers:
          rbx at 0x41f22548, rbp at 0x41f22550, rip at 0x41f22558
        
        mohan_zl added a comment - edited

        The following stack trace is similar to above, but the entry point is different.

        #0  ink_stack_trace_get (stack=<value optimized out>, len=<value optimized out>, signalhandler_frame=<value optimized out>) at ink_stack_trace.cc:68
        68          fp = (void **) (*fp);
        (gdb) bt
        #0  ink_stack_trace_get (stack=<value optimized out>, len=<value optimized out>, signalhandler_frame=<value optimized out>) at ink_stack_trace.cc:68
        #1  0x00000038ff016ef1 in ink_stack_trace_dump (sighandler_frame=<value optimized out>) at ink_stack_trace.cc:114
        #2  0x00000000004ddc10 in signal_handler (sig=<value optimized out>) at signals.cc:225
        #3  <signal handler called>
        #4  0x00000000006990d9 in UnixNetVConnection::do_io_close (this=0x20efd5e0, alerrno=<value optimized out>) at ../../iocore/eventsystem/I_Lock.h:297
        #5  0x0000000000513f5a in HttpClientSession::do_io_close (this=0x2aaab405c420, alerrno=-1) at HttpClientSession.cc:310
        #6  0x000000000052588c in HttpSM::tunnel_handler_ua (this=0x2aaab87fc4f0, event=103, c=0x2aaab87fe218) at HttpSM.cc:3027
        #7  0x000000000056f9dc in HttpTunnel::consumer_handler (this=0x2aaab87fe1e0, event=103, c=0x2aaab87fe218) at HttpTunnel.cc:1232
        #8  0x000000000057095e in HttpTunnel::finish_all_internal (this=0x2aaab87fe1e0, p=0x2aaab87fe3d8, chain=false) at HttpTunnel.cc:1359
        #9  0x0000000000526e07 in HttpSM::tunnel_handler_server (this=0x2aaab87fc4f0, event=104, p=0x2aaab87fe3d8) at HttpTunnel.h:311
        #10 0x000000000056eb21 in HttpTunnel::producer_handler (this=0x2aaab87fe1e0, event=104, p=0x2aaab87fe3d8) at HttpTunnel.cc:1136
        #11 0x000000000056fca0 in HttpTunnel::main_handler (this=0x2aaab87fe1e0, event=-1678707408, data=<value optimized out>) at HttpTunnel.cc:1452
        #12 0x000000000069f097 in read_from_net (nh=0x2aaaaadfe688, vc=0x20eb4c40, thread=<value optimized out>) at ../../iocore/eventsystem/I_Continuation.h:146
        #13 0x00000000006940e1 in NetHandler::mainNetEvent (this=0x2aaaaadfe688, event=<value optimized out>, e=0x2aaaab706028) at UnixNet.cc:389
        #14 0x00000000006c4fdf in EThread::process_event (this=0x2aaaaadfd010, e=0x1fc96f60, calling_code=5) at I_Continuation.h:146
        #15 0x00000000006c58ec in EThread::execute (this=0x2aaaaadfd010) at UnixEThread.cc:262
        #16 0x00000000004be112 in main (argc=<value optimized out>, argv=<value optimized out>) at Main.cc:1957
        
        Leif Hedstrom added a comment -

        Moving all unassigned bugs out to 3.1.2

        Zhao Yongming added a comment -

        weijin is working on the threading & net crashing issue. He has some leads on this issue.

        weijin added a comment -

        The HttpSM can be called back from different threads. When terminating the SM we call do_io_close to close the net_vc, but net_vc::do_io_close is not thread safe: InactivityCop and NetHandler can also close the net_vc when net_vc::closed is set, without locking the net_vc's mutex. I hope this can also explain TS-934.

        I have read amc's patch for TS-934 carefully recently; he contributed a lot to solving this problem. I have two questions: 1) should we lock the net_vc's mutex in InactivityCop and NetHandler, and 2) should one thread be able to close a net_vc owned by a different thread?

        I tend to add some code to net_vc::do_io_close and net_vc::mainEvent to make them thread safe:

        UnixNetVConnection::do_io_close()
        {
          if (thread != this_ethread()) {
            thread->schedule_imm(this, EVENT_VC_TRY_TO_CLOSE);
            return;
          }
          disable_read(this);
          disable_write(this);
          .....
          close_UnixNetVConnection(this, t);
        }

        UnixNetVConnection::mainEvent(int event, void *e)
        {
          if (event == EVENT_VC_TRY_TO_CLOSE) {
            do_io_close();
            return EVENT_DONE;
          }
          ....
          // check the active and inactivity timeouts
          ....
        }
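The pattern above can be illustrated with a self-contained toy (hedged: OwnerLoop and ToyVC are illustrative stand-ins, not actual TS classes like EThread or UnixNetVConnection): if the calling thread is not the owner, the close is enqueued as an event and performed later by the owning thread.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>

// Toy stand-in for an EThread and its event queue; not the real TS classes.
class OwnerLoop {
public:
  std::thread::id owner;                     // thread that owns the VC

  void schedule(std::function<void()> f) {   // safe to call from any thread
    std::lock_guard<std::mutex> g(mu_);
    q_.push(std::move(f));
  }

  void run_pending() {                       // called on the owner thread only
    for (;;) {
      std::function<void()> job;
      {
        std::lock_guard<std::mutex> g(mu_);
        if (q_.empty()) return;
        job = std::move(q_.front());
        q_.pop();
      }
      job();
    }
  }

private:
  std::mutex mu_;
  std::queue<std::function<void()>> q_;
};

struct ToyVC {
  OwnerLoop *loop;
  bool closed = false;

  // Mirrors the proposed do_io_close: bounce the close to the owner thread
  // instead of mutating VC state from a foreign thread.
  void do_io_close() {
    if (std::this_thread::get_id() != loop->owner) {
      loop->schedule([this] { do_io_close(); });
      return;
    }
    closed = true;                           // the real close work goes here
  }
};
```

A foreign thread calling do_io_close only enqueues work; the VC's state changes exclusively on the owner thread, which is what removes the race.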

        weijin added a comment -

        @Leif, @amc, what's your advice?

        weijin added a comment -

        This patch makes net_vc::do_io_close thread safe; it is simple and does not need to lock the vc's mutex. In my testing the crash did not happen again, but I think we need more testing.

        weijin added a comment -

        mohan_zl found a bug in my first patch; a second patch is provided.

        Alan M. Carroll added a comment -

        I have reviewed the patch and it looks basically correct to me. I had a few style issues, which I have corrected in my version of the patch.

        • I put the new event type in iocore/eventsystem/I_VConnection.h where the other VConnection events are.
        • Added some comments.
        • Fixed a broken comment in I_Event.h which misdirected me about schedule_imm in my earlier work. I found the correct comment in a very old copy of the file and put it back.

        I am currently doing some live testing, but I would like to commit this for 3.1.1.

        Alan M. Carroll added a comment -

        Committed patch.

        John Plevyak added a comment -

        So, this patch throws a VC_EVENT_DO_CLOSE event at a NetVC, but it doesn't record the fact that the event is outstanding? What prevents the NetVC from being deallocated before the event is processed?

        Alan M. Carroll added a comment -

        Where would it record the pending event? The presumption is that the only other thread that might close it is the one that will process the event.

        But the real issue is that sometimes the lock is dropped between the time the lock try is done and the event is scheduled, leading to scheduling on the null thread. If you look at TS-934 you can see how that issue is handled there, but even that's not sufficient, because normal VC processing from NetHandler::mainNetEvent does not lock the VCs, so there isn't any way to do anything safely in this case.

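The hazard described here can be sketched as a toy (hedged: Target and try_schedule_close are illustrative, not TS's MUTEX_TRY_LOCK machinery): the try-lock must still be held when the event is scheduled, otherwise the target's thread pointer can be cleared between the two steps.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Toy model of "lock try, then schedule": keep the lock held across the
// schedule so another thread cannot clear the target in between.
struct Target {
  std::mutex mu;
  std::thread::id *owner = nullptr;  // stand-in for the VC's thread pointer
  bool scheduled = false;
};

// Returns true if the close event was scheduled; false means "retry later",
// as a lock-try pattern typically does.
bool try_schedule_close(Target &t) {
  std::unique_lock<std::mutex> lk(t.mu, std::try_to_lock);
  if (!lk.owns_lock())
    return false;                    // lock contended: back off and retry
  if (t.owner == nullptr)
    return false;                    // would have been the null-thread crash
  t.scheduled = true;                // safe: the lock is still held here
  return true;
}
```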
        Leif Hedstrom added a comment -

        Moved to 3.1.4, please move bugs back to 3.1.3, which you will work on in the next 2 weeks.

        Hide
        Alan M. Carroll added a comment -

        My proposed patch. The support in IntrusivePtr.h is more extensive than actually required and stems from intermediate fixes that required that support. Since it might be useful in the future it was left in, although it is not currently used. The primary change is to use a reference counted list for the NetHandler open_list, which prevents the VCs from being cleaned up while shared with other threads. I still need to verify that they do get cleaned up at some point.

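A minimal sketch of the intrusive reference counting idea described above (the RefCounted and IntrusivePtr names here are illustrative; the actual IntrusivePtr.h API may differ): the object carries its own atomic count, and an open list holding such pointers keeps a VC alive until every thread has released it.

```cpp
#include <atomic>
#include <utility>

// Illustrative intrusive refcount; not the actual lib/ts IntrusivePtr.h API.
struct RefCounted {
  std::atomic<int> refcount{0};
  virtual ~RefCounted() = default;
};

template <typename T>
class IntrusivePtr {
public:
  IntrusivePtr() = default;
  explicit IntrusivePtr(T *p) : p_(p) { acquire(); }
  IntrusivePtr(const IntrusivePtr &o) : p_(o.p_) { acquire(); }
  IntrusivePtr &operator=(IntrusivePtr o) {   // copy-and-swap
    std::swap(p_, o.p_);
    return *this;
  }
  ~IntrusivePtr() { release(); }

  T *get() const { return p_; }
  T *operator->() const { return p_; }

private:
  void acquire() {
    if (p_) p_->refcount.fetch_add(1, std::memory_order_relaxed);
  }
  void release() {
    // Last owner deletes; an open_list built from these pointers cannot
    // free a VC while another thread still holds a reference to it.
    if (p_ && p_->refcount.fetch_sub(1, std::memory_order_acq_rel) == 1)
      delete p_;
  }
  T *p_ = nullptr;
};

// Small test type so object lifetime can be observed.
struct Node : RefCounted {
  static int live;
  Node() { ++live; }
  ~Node() override { --live; }
};
int Node::live = 0;
```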
        Zhao Yongming added a comment -

        Hi, I think this may be the same thing: a bogus event flying around the system:
        #9 0x0000000000572032 in HttpTunnel::main_handler (this=0x2aabeeffdd70, event=1088608784, data=<value optimized out>)
        Look at the event id; I don't think we have such an event.

        I see no such crashes in my environment without this patch. What do you think of the situation?

        Zhao Yongming added a comment -

        We have the code committed in our tree at: https://gitorious.org/trafficserver/taobao/commit/c0fdec56bacea139a026412038378d7a8f65a731
        John Plevyak added a comment -

        Could the non-locked access be coming from the InactivityCop? See attached lock patch which handles a race in that code.

        John Plevyak added a comment -

        This fixes a race in inactivity timeouts coming from the InactivityCop.

        Otto van der Schaaf added a comment -

        Another trace, put here per amc's request:

        Program received signal SIGSEGV, Segmentation fault.
        ink_atomiclist_push (l=0x100bb0, item=0x11a4f00) at ink_queue.cc:457
        457 volatile_void_p *adr_of_next = (volatile_void_p *) ADDRESS_OF_NEXT(item, l->offset);
        (gdb) bt
        #0 ink_atomiclist_push (l=0x100bb0, item=0x11a4f00) at ink_queue.cc:457
        #1 0x00000000006aab3b in ProtectedQueue::enqueue (this=0x100bb0, e=<optimized out>, fast_signal=false) at ProtectedQueue.cc:53
        #2 0x0000000000511ff6 in HttpClientSession::do_io_close (this=0x11b89b0, alerrno=-1) at HttpClientSession.cc:331
        #3 0x0000000000522fba in HttpSM::tunnel_handler_ua (this=0x7fffeb726650, event=103, c=0x7fffeb728270) at HttpSM.cc:3029
        #4 0x0000000000566006 in HttpTunnel::consumer_handler (this=0x7fffeb7281c8, event=103, c=0x7fffeb728270) at HttpTunnel.cc:1232
        #5 0x0000000000566800 in HttpTunnel::finish_all_internal (this=0x7fffeb7281c8, p=0x7fffeb7284b0, chain=false) at HttpTunnel.cc:1359
        #6 0x0000000000520680 in local_finish_all (p=0x7fffeb7284b0, this=<optimized out>) at HttpTunnel.h:311
        #7 HttpSM::tunnel_handler_transform_read (this=0x7fffeb726650, event=104, p=0x7fffeb7284b0) at HttpSM.cc:3572
        #8 0x0000000000565a6c in HttpTunnel::producer_handler (this=0x7fffeb7281c8, event=104, p=0x7fffeb7284b0) at HttpTunnel.cc:1136
        #9 0x0000000000565eb0 in HttpTunnel::main_handler (this=0x7fffeb7281c8, event=<optimized out>, data=<optimized out>) at HttpTunnel.cc:1452
        #10 0x00000000004e78c5 in TransformTerminus::handle_event (this=0x7fffe4036998, event=<optimized out>, edata=<optimized out>) at Transform.cc:241
        #11 0x00000000006ac450 in handleEvent (data=0x11a4fc0, event=1, this=<optimized out>) at I_Continuation.h:146
        #12 EThread::process_event (this=0x7ffff474f010, e=0x11a4fc0, calling_code=1) at UnixEThread.cc:142
        #13 0x00000000006ad01e in EThread::execute (this=0x7ffff474f010) at UnixEThread.cc:234
        #14 0x000000000047d5f2 in main (argc=<optimized out>, argv=<optimized out>) at Main.cc:1939

        Otto van der Schaaf added a comment -

        I retested with trunk and can no longer reproduce the stack trace I added, so I think this trace is resolved.

        John Plevyak added a comment -

        Eliminate a race in UnixNetVConnection::do_io_close().

        Brian Geffon added a comment -

        Backported to 3.0.x in commit 245d197e401f70936a47cdb66682a72b801596c3


          People

          • Assignee:
            Brian Geffon
            Reporter:
            Zhao Yongming
          • Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:
