Traffic Server
  1. Traffic Server
  2. TS-937

EThread::execute still processing cancelled event

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 3.0.1, 2.1.9
    • Fix Version/s: 3.1.3
    • Component/s: Core
    • Labels:
      None
    • Environment:

      RHEL6

      Description

      The included GDB log will show that ATS is trying to process an event that has already been canceled, examining the code of UnixEThread.cc line 232 shows that EThread::process_event gets called without a check for the event being cancelled.

      Brian

      Program received signal SIGSEGV, Segmentation fault.
      [Switching to Thread 0x7ffff64fa700 (LWP 28518)]
      0x00000000006fc663 in EThread::process_event (this=0x7ffff68ff010, e=0x1db45c0, calling_code=1) at UnixEThread.cc:130
      130  MUTEX_TRY_LOCK_FOR(lock, e->mutex.m_ptr, this, e->continuation);
      Missing separate debuginfos, use: debuginfo-install expat-2.0.1-9.1.el6.x86_64 glibc-2.12-1.25.el6_1.3.x86_64 keyutils-libs-1.4-1.el6.x86_64 krb5-libs-1.9-9.el6_1.1.x86_64 libcom_err-1.41.12-7.el6.x86_64 libgcc-4.4.5-6.el6.x86_64 libselinux-2.0.94-5.el6.x86_64 libstdc++-4.4.5-6.el6.x86_64 openssl-1.0.0-10.el6_1.4.x86_64 pcre-7.8-3.1.el6.x86_64 tcl-8.5.7-6.el6.x86_64 zlib-1.2.3-25.el6.x86_64
      (gdb) bt
      #0  0x00000000006fc663 in EThread::process_event (this=0x7ffff68ff010, e=0x1db45c0, calling_code=1) at UnixEThread.cc:130
      #1  0x00000000006fcbaf in EThread::execute (this=0x7ffff68ff010) at UnixEThread.cc:232
      #2  0x00000000006fb844 in spawn_thread_internal (a=0xfb7e80) at Thread.cc:88
      #3  0x00000036204077e1 in start_thread () from /lib64/libpthread.so.0
      #4  0x000000361f8e577d in clone () from /lib64/libc.so.6
      (gdb) bt full
      #0  0x00000000006fc663 in EThread::process_event (this=0x7ffff68ff010, e=0x1db45c0, calling_code=1) at UnixEThread.cc:130
              lock = {m = {m_ptr = 0x7ffff64f9d20}, lock_acquired = 202}
      #1  0x00000000006fcbaf in EThread::execute (this=0x7ffff68ff010) at UnixEThread.cc:232
              done_one = false
              e = 0x1db45c0
              NegativeQueue = {<DLL<Event, Event::Link_link>> = {head = 0xfc75f0}, tail = 0xfc75f0}
              next_time = 1314647904419648000
      #2  0x00000000006fb844 in spawn_thread_internal (a=0xfb7e80) at Thread.cc:88
              p = 0xfb7e80
      #3  0x00000036204077e1 in start_thread () from /lib64/libpthread.so.0
      No symbol table info available.
      #4  0x000000361f8e577d in clone () from /lib64/libc.so.6
      No symbol table info available.
      (gdb) f 0
      #0  0x00000000006fc663 in EThread::process_event (this=0x7ffff68ff010, e=0x1db45c0, calling_code=1) at UnixEThread.cc:130
      130  MUTEX_TRY_LOCK_FOR(lock, e->mutex.m_ptr, this, e->continuation);
      (gdb) p *e
      $2 = {<Action> = {_vptr.Action = 0x775170, continuation = 0x1f2fc08, mutex = {m_ptr = 0x7fffd40fba40}, cancelled = 1}, ethread = 0x7ffff68ff010, in_the_prot_queue = 0, in_the_priority_queue = 0, 
        immediate = 1, globally_allocated = 1, in_heap = 0, callback_event = 1, timeout_at = 0, period = 0, cookie = 0x0, link = {<SLink<Event>> = {next = 0x0}, prev = 0x0}}
      
      1. UnixEThread.patch
        1 kB
        Brian Geffon
      2. ts937.full.patch
        11 kB
        Brian Geffon

        Issue Links

          Activity

          Brian Geffon created issue -
          Brian Geffon made changes -
          Field Original Value New Value
          Status Open [ 1 ] Patch Available [ 10002 ]
          Brian Geffon made changes -
          Attachment UnixEThread.patch [ 12492142 ]
          Brian Geffon made changes -
          Summary EThread::execute still processing cancelled thread EThread::execute still processing cancelled event
          Leif Hedstrom made changes -
          Fix Version/s 3.1.1 [ 12316496 ]
          Leif Hedstrom made changes -
          Fix Version/s 3.1.2 [ 12317605 ]
          Fix Version/s 3.1.1 [ 12316496 ]
          Brian Geffon made changes -
          Comment [ FYI, apparently this patch was included in 3.0.2 but there was no reference in the change log that could get confusing down the road. ]
          Leif Hedstrom made changes -
          Assignee Brian Geffon [ briang ]
          Alan M. Carroll made changes -
          Fix Version/s 3.1.3 [ 12317969 ]
          Fix Version/s 3.1.2 [ 12317605 ]
          Brian Geffon made changes -
          Comment [ So, I've began to look into this bug again. To try to determine where the action is being canceled I modified Action to add a const char * volatile cancelled_by; and then simply replaced any instance of ->cancel() to pass the name of the method doing the cancelling:
          ]
          Brian Geffon made changes -
          Status Patch Available [ 10002 ] Resolved [ 5 ]
          Fix Version/s 3.1.2 [ 12317605 ]
          Fix Version/s 3.1.3 [ 12317969 ]
          Resolution Fixed [ 1 ]
          Brian Geffon made changes -
          Resolution Fixed [ 1 ]
          Status Resolved [ 5 ] Reopened [ 4 ]
          Brian Geffon made changes -
          Comment [ nevermind, I lied. Reopening :/ ]
          Brian Geffon made changes -
          Comment [ So it turns out that this bug is fixed in TS-1074, I've verified this and am closing this bug. ]
          Brian Geffon made changes -
          Comment [ Ignore notifications about a comment talking about inclusion in 3.0.2, I was mistaken.
          ]
          Brian Geffon made changes -
          Fix Version/s 3.1.3 [ 12317969 ]
          Fix Version/s 3.1.2 [ 12317605 ]
          Brian Geffon made changes -
          Attachment ts937.full.patch [ 12515538 ]
          Igor Galić made changes -
          Description The included GDB log will show that ATS is trying to process an event that has already been canceled, examining the code of UnixEThread.cc line 232 shows that EThread::process_event gets called without a check for the event being cancelled.

          Brian



          Program received signal SIGSEGV, Segmentation fault.
          [Switching to Thread 0x7ffff64fa700 (LWP 28518)]
          0x00000000006fc663 in EThread::process_event (this=0x7ffff68ff010, e=0x1db45c0, calling_code=1) at UnixEThread.cc:130
          130 MUTEX_TRY_LOCK_FOR(lock, e->mutex.m_ptr, this, e->continuation);
          Missing separate debuginfos, use: debuginfo-install expat-2.0.1-9.1.el6.x86_64 glibc-2.12-1.25.el6_1.3.x86_64 keyutils-libs-1.4-1.el6.x86_64 krb5-libs-1.9-9.el6_1.1.x86_64 libcom_err-1.41.12-7.el6.x86_64 libgcc-4.4.5-6.el6.x86_64 libselinux-2.0.94-5.el6.x86_64 libstdc++-4.4.5-6.el6.x86_64 openssl-1.0.0-10.el6_1.4.x86_64 pcre-7.8-3.1.el6.x86_64 tcl-8.5.7-6.el6.x86_64 zlib-1.2.3-25.el6.x86_64
          (gdb) bt
          #0 0x00000000006fc663 in EThread::process_event (this=0x7ffff68ff010, e=0x1db45c0, calling_code=1) at UnixEThread.cc:130
          #1 0x00000000006fcbaf in EThread::execute (this=0x7ffff68ff010) at UnixEThread.cc:232
          #2 0x00000000006fb844 in spawn_thread_internal (a=0xfb7e80) at Thread.cc:88
          #3 0x00000036204077e1 in start_thread () from /lib64/libpthread.so.0
          #4 0x000000361f8e577d in clone () from /lib64/libc.so.6
          (gdb) bt full
          #0 0x00000000006fc663 in EThread::process_event (this=0x7ffff68ff010, e=0x1db45c0, calling_code=1) at UnixEThread.cc:130
                  lock = {m = {m_ptr = 0x7ffff64f9d20}, lock_acquired = 202}
          #1 0x00000000006fcbaf in EThread::execute (this=0x7ffff68ff010) at UnixEThread.cc:232
                  done_one = false
                  e = 0x1db45c0
                  NegativeQueue = {<DLL<Event, Event::Link_link>> = {head = 0xfc75f0}, tail = 0xfc75f0}
                  next_time = 1314647904419648000
          #2 0x00000000006fb844 in spawn_thread_internal (a=0xfb7e80) at Thread.cc:88
                  p = 0xfb7e80
          #3 0x00000036204077e1 in start_thread () from /lib64/libpthread.so.0
          No symbol table info available.
          #4 0x000000361f8e577d in clone () from /lib64/libc.so.6
          No symbol table info available.
          (gdb) f 0
          #0 0x00000000006fc663 in EThread::process_event (this=0x7ffff68ff010, e=0x1db45c0, calling_code=1) at UnixEThread.cc:130
          130 MUTEX_TRY_LOCK_FOR(lock, e->mutex.m_ptr, this, e->continuation);
          (gdb) p *e
          $2 = {<Action> = {_vptr.Action = 0x775170, continuation = 0x1f2fc08, mutex = {m_ptr = 0x7fffd40fba40}, cancelled = 1}, ethread = 0x7ffff68ff010, in_the_prot_queue = 0, in_the_priority_queue = 0,
            immediate = 1, globally_allocated = 1, in_heap = 0, callback_event = 1, timeout_at = 0, period = 0, cookie = 0x0, link = {<SLink<Event>> = {next = 0x0}, prev = 0x0}}

          The included GDB log will show that ATS is trying to process an event that has already been canceled, examining the code of UnixEThread.cc line 232 shows that EThread::process_event gets called without a check for the event being cancelled.

          Brian


          {noformat}
          Program received signal SIGSEGV, Segmentation fault.
          [Switching to Thread 0x7ffff64fa700 (LWP 28518)]
          0x00000000006fc663 in EThread::process_event (this=0x7ffff68ff010, e=0x1db45c0, calling_code=1) at UnixEThread.cc:130
          130 MUTEX_TRY_LOCK_FOR(lock, e->mutex.m_ptr, this, e->continuation);
          Missing separate debuginfos, use: debuginfo-install expat-2.0.1-9.1.el6.x86_64 glibc-2.12-1.25.el6_1.3.x86_64 keyutils-libs-1.4-1.el6.x86_64 krb5-libs-1.9-9.el6_1.1.x86_64 libcom_err-1.41.12-7.el6.x86_64 libgcc-4.4.5-6.el6.x86_64 libselinux-2.0.94-5.el6.x86_64 libstdc++-4.4.5-6.el6.x86_64 openssl-1.0.0-10.el6_1.4.x86_64 pcre-7.8-3.1.el6.x86_64 tcl-8.5.7-6.el6.x86_64 zlib-1.2.3-25.el6.x86_64
          (gdb) bt
          #0 0x00000000006fc663 in EThread::process_event (this=0x7ffff68ff010, e=0x1db45c0, calling_code=1) at UnixEThread.cc:130
          #1 0x00000000006fcbaf in EThread::execute (this=0x7ffff68ff010) at UnixEThread.cc:232
          #2 0x00000000006fb844 in spawn_thread_internal (a=0xfb7e80) at Thread.cc:88
          #3 0x00000036204077e1 in start_thread () from /lib64/libpthread.so.0
          #4 0x000000361f8e577d in clone () from /lib64/libc.so.6
          (gdb) bt full
          #0 0x00000000006fc663 in EThread::process_event (this=0x7ffff68ff010, e=0x1db45c0, calling_code=1) at UnixEThread.cc:130
                  lock = {m = {m_ptr = 0x7ffff64f9d20}, lock_acquired = 202}
          #1 0x00000000006fcbaf in EThread::execute (this=0x7ffff68ff010) at UnixEThread.cc:232
                  done_one = false
                  e = 0x1db45c0
                  NegativeQueue = {<DLL<Event, Event::Link_link>> = {head = 0xfc75f0}, tail = 0xfc75f0}
                  next_time = 1314647904419648000
          #2 0x00000000006fb844 in spawn_thread_internal (a=0xfb7e80) at Thread.cc:88
                  p = 0xfb7e80
          #3 0x00000036204077e1 in start_thread () from /lib64/libpthread.so.0
          No symbol table info available.
          #4 0x000000361f8e577d in clone () from /lib64/libc.so.6
          No symbol table info available.
          (gdb) f 0
          #0 0x00000000006fc663 in EThread::process_event (this=0x7ffff68ff010, e=0x1db45c0, calling_code=1) at UnixEThread.cc:130
          130 MUTEX_TRY_LOCK_FOR(lock, e->mutex.m_ptr, this, e->continuation);
          (gdb) p *e
          $2 = {<Action> = {_vptr.Action = 0x775170, continuation = 0x1f2fc08, mutex = {m_ptr = 0x7fffd40fba40}, cancelled = 1}, ethread = 0x7ffff68ff010, in_the_prot_queue = 0, in_the_priority_queue = 0,
            immediate = 1, globally_allocated = 1, in_heap = 0, callback_event = 1, timeout_at = 0, period = 0, cookie = 0x0, link = {<SLink<Event>> = {next = 0x0}, prev = 0x0}}
          {noformat}
          Brian Geffon made changes -
          Link This issue is part of TS-1117 [ TS-1117 ]
          Brian Geffon made changes -
          Status Reopened [ 4 ] Resolved [ 5 ]
          Resolution Fixed [ 1 ]
          Leif Hedstrom made changes -
          Status Resolved [ 5 ] Closed [ 6 ]

            People

            • Assignee:
              Brian Geffon
              Reporter:
              Brian Geffon
            • Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development