Uploaded image for project: 'Traffic Server'
  1. Traffic Server
  2. TS-4970

Crash in INKVConnInternal when handle_event is called after destroy()

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Patch Available
    • Major
    • Resolution: Unresolved
    • None
    • 7.1.0
    • HTTP
    • None

    Description

      We've noticed a few crashes for requests using SPDY (on ATS 5.2.x and 6..x) where the downstream origin is down with a backtrace that looks something like:

      (gdb) bt
      #0  0x0000000000000000 in ?? ()
      #1  0x00000000004cfe54 in set_continuation (this=0x2afe63a93530, event=1, 
          edata=0x2afe6399fc40) at ../iocore/eventsystem/P_VIO.h:104
      #2  INKVConnInternal::handle_event (this=0x2afe63a93530, event=1, 
          edata=0x2afe6399fc40) at InkAPI.cc:1060
      #3  0x00000000006f8e65 in handleEvent (this=0x2afe3dd07000, e=0x2afe6399fc40, 
          calling_code=1) at I_Continuation.h:146
      #4  EThread::process_event (this=0x2afe3dd07000, e=0x2afe6399fc40, 
          calling_code=1) at UnixEThread.cc:144
      #5  0x00000000006f993b in EThread::execute (this=0x2afe3dd07000)
          at UnixEThread.cc:195
      #6  0x00000000006f832a in spawn_thread_internal (a=0x2afe3badf400)
          at Thread.cc:88
      #7  0x0000003861c079d1 in start_thread () from /lib64/libpthread.so.0
      #8  0x00000038614e8b5d in clone () from /lib64/libc.so.6
      

      Which looks a bit odd-- as frame 0 is missing. From digging into it a bit more (with the help of amc) we found that the VC we where calling was an INKContInternal (meaning an INKVConnInternal):

      (gdb) p (INKVConnInternal) *vc_server
      $5 = {<INKContInternal> = {<DummyVConnection> = {<VConnection> = {<Continuation> = {<force_VFPT_to_top> = {_vptr.force_VFPT_to_top = 0x2afe63a93170}, 
                handler = (int (Continuation::*)(Continuation *, int, 
          void *)) 0x4cfd90 <INKVConnInternal::handle_event(int, void*)>, mutex = {
                  m_ptr = 0x0}, link = {<SLink<Continuation>> = {next = 0x0}, 
                  prev = 0x0}}, lerrno = 20600}, <No data fields>}, 
          mdata = 0xdeaddead, m_event_func = 0x2afe43c18490
           <(anonymous namespace)::handleTransformationPluginEvents(TSCont, TSEvent, void*)>, m_event_count = 0, m_closed = -1, m_deletable = 1, m_deleted = 1, 
          m_free_magic = INKCONT_INTERN_MAGIC_ALIVE}, m_read_vio = {_cont = 0x0, 
          nbytes = 0, ndone = 0, op = 0, buffer = {mbuf = 0x0, entry = 0x0}, 
          vc_server = 0x0, mutex = {m_ptr = 0x0}}, m_write_vio = {_cont = 0x0, 
          nbytes = 122, ndone = 0, op = 0, buffer = {mbuf = 0x0, entry = 0x0}, 
          vc_server = 0x2afe63a93530, mutex = {m_ptr = 0x0}}, 
        m_output_vc = 0x2afe63091a88}
      

      From looking at the debug logs that lead up to the crash, I'm seeing that some events (namely timeout events) are being called after the VConn has been destroy()'d . This lead me to find that INKVConnInternal::handle_event is actually checking if that is the case-- and then re-destroying everything, which makes no sense.

      So although the ideal would be to not call handle_event on a closed VConn, crashing is definitely not acceptable. My solution is to continue to only call the event handler if the VConn hasn't been deleted-- but instead of attempting to re-destroy the connection, we'll leave it be (unless we are in debug mode-- where I'll throw in an assert).

      I did some looking at this on ATS7 and it looks like this is all fixed by the cleanup of the whole free-ing stuff for VConns (https://github.com/apache/trafficserver/pull/752/files).

      Attachments

        Activity

          People

            jacksontj Thomas Jackson
            jacksontj Thomas Jackson
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

              Created:
              Updated:

              Time Tracking

                Estimated:
                Original Estimate - Not Specified
                Not Specified
                Remaining:
                Remaining Estimate - 0h
                0h
                Logged:
                Time Spent - 5h 10m
                5h 10m