Uploaded image for project: 'Traffic Server'
  1. Traffic Server
  2. TS-4475

Crash in Log-Collation client after using inactivity-cop.

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • 6.1.1
    • 6.2.1, 7.1.0
    • Logging
    • None

    Description

      Background: We recently tried making use of inactivity-cop by setting it to 300s instead of the default one-day setting. This was to address an issue where, under heavy load, ATS would become un-responsive to client requests, and the condition would persist after traffic was stopped with the active queue saying 0 connections but 'netstat -na' showing a bunch of established connections (up to the throttle limit approximately).

      Inactivity cop seemed to help ATS handle this situation, but we have since experienced a couple of core dumps over the last four day period. It seems occasionally the Log Collation Client State Machine will have event value 105 or VC_EVENT_INACTIVITY_TIMEOUT, but when it reaches read_signal_and_update() it tries to call the continuation handler which down the line does not know about this event thus causing core dump !"unexpcted state" [sic].

      Here is the back-trace –

      (gdb) bt
      #0 0x00002b67cd5405f7 in raise () from /lib64/libc.so.6
      #1 0x00002b67cd541e28 in abort () from /lib64/libc.so.6
      #2 0x00002b67cb032921 in ink_die_die_die () at ink_error.cc:43
      #3 0x00002b67cb0329da in ink_fatal_va (fmt=0x2b67cb0442dc "%s:%d: failed assert `%s`", ap=0x7ffc690e7ba8) at ink_error.cc:65
      #4 0x00002b67cb032a79 in ink_fatal (message_format=0x2b67cb0442dc "%s:%d: failed assert `%s`") at ink_error.cc:73
      #5 0x00002b67cb0305a6 in _ink_assert (expression=0x7fb422 "!\"unexpcted state\"", file=0x7fb35b "LogCollationClientSM.cc",
      line=445) at ink_assert.cc:37
      #6 0x000000000069c86b in LogCollationClientSM::client_idle (this=0x2b681400bb00, event=105) at LogCollationClientSM.cc:445
      #7 0x000000000069b427 in LogCollationClientSM::client_handler (this=0x2b681400bb00, event=105, data=0x2b680c017020)
      at LogCollationClientSM.cc:119
      #8 0x0000000000502cc6 in Continuation::handleEvent (this=0x2b681400bb00, event=105, data=0x2b680c017020)
      at ../iocore/eventsystem/I_Continuation.h:153
      #9 0x0000000000783d40 in read_signal_and_update (event=105, vc=0x2b680c016f00) at UnixNetVConnection.cc:150
      #10 0x0000000000787a22 in UnixNetVConnection::mainEvent (this=0x2b680c016f00, event=1, e=0x127ad60) at UnixNetVConnection.cc:1188
      #11 0x0000000000502cc6 in Continuation::handleEvent (this=0x2b680c016f00, event=1, data=0x127ad60)
      at ../iocore/eventsystem/I_Continuation.h:153
      #12 0x000000000077d943 in InactivityCop::check_inactivity (this=0x1209a00, event=2, e=0x127ad60) at UnixNet.cc:102
      #13 0x0000000000502cc6 in Continuation::handleEvent (this=0x1209a00, event=2, data=0x127ad60)
      at ../iocore/eventsystem/I_Continuation.h:153
      #14 0x00000000007a5df6 in EThread::process_event (this=0x2b67cf7bb010, e=0x127ad60, calling_code=2) at UnixEThread.cc:128
      #15 0x00000000007a61f5 in EThread::execute (this=0x2b67cf7bb010) at UnixEThread.cc:207
      #16 0x0000000000534430 in main (argv=0x7ffc690e82e8) at Main.cc:1918

      I believe it takes a wrong turn here –

      #9 0x0000000000783d40 in read_signal_and_update (event=105, vc=0x2b680c016f00) at UnixNetVConnection.cc:150
      150 vc->read.vio._cont->handleEvent(event, &vc->read.vio);
      (gdb) list
      145 static inline int
      146 read_signal_and_update(int event, UnixNetVConnection *vc)
      147 {
      148 vc->recursion++;
      149 if (vc->read.vio._cont)

      { 150 vc->read.vio._cont->handleEvent(event, &vc->read.vio); 151 }

      else {
      152 switch (event)

      { 153 case VC_EVENT_EOS: 154 case VC_EVENT_ERROR: (gdb) list 155 case VC_EVENT_ACTIVE_TIMEOUT: 156 case VC_EVENT_INACTIVITY_TIMEOUT: 157 Debug("inactivity_cop", "event %d: null read.vio cont, closing vc %p", event, vc); 158 vc->closed = 1; 159 break; 160 default: 161 Error("Unexpected event %d for vc %p", event, vc); 162 ink_release_assert(0); 163 break; 164 }

      Note: I understand that there were several issues related to TS-3196 concerning inactivity_cop and this section of code.

      Attachments

        Activity

          People

            Unassigned Unassigned
            pbchou Peter Chou
            Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Time Tracking

                Estimated:
                Original Estimate - Not Specified
                Not Specified
                Remaining:
                Remaining Estimate - 0h
                0h
                Logged:
                Time Spent - 3h 50m
                3h 50m