Traffic Server
  1. Traffic Server
  2. TS-877

Segfault caused by HTTPInfo::object_key_get

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Duplicate
    • Affects Version/s: 3.0.0
    • Fix Version/s: None
    • Component/s: HTTP
    • Labels:
    • Environment:

      Ubuntu 10.04tls 64bit

      Description

      Segfault caused by HTTPInfo::object_key_get

      Reading symbols from /usr/lib/libtsutil.so.3...done.
      Loaded symbols for /usr/lib/libtsutil.so.3
      Reading symbols from /lib/libpthread.so.0...(no debugging symbols found)...done.
      Loaded symbols for /lib/libpthread.so.0
      Reading symbols from /lib/libnsl.so.1...(no debugging symbols found)...done.
      Loaded symbols for /lib/libnsl.so.1
      Reading symbols from /lib/libresolv.so.2...(no debugging symbols found)...done.
      Loaded symbols for /lib/libresolv.so.2
      Reading symbols from /lib/librt.so.1...(no debugging symbols found)...done.
      Loaded symbols for /lib/librt.so.1
      Reading symbols from /lib/libpcre.so.3...(no debugging symbols found)...done.
      Loaded symbols for /lib/libpcre.so.3
      Reading symbols from /lib/libssl.so.0.9.8...(no debugging symbols found)...done.
      Loaded symbols for /lib/libssl.so.0.9.8
      Reading symbols from /lib/libcrypto.so.0.9.8...(no debugging symbols found)...done.
      Loaded symbols for /lib/libcrypto.so.0.9.8
      Reading symbols from /usr/lib/libtcl8.4.so.0...(no debugging symbols found)...done.
      Loaded symbols for /usr/lib/libtcl8.4.so.0
      Reading symbols from /lib/libdl.so.2...(no debugging symbols found)...done.
      Loaded symbols for /lib/libdl.so.2
      Reading symbols from /lib/libexpat.so.1...(no debugging symbols found)...done.
      Loaded symbols for /lib/libexpat.so.1
      Reading symbols from /lib/libz.so.1...(no debugging symbols found)...done.
      Loaded symbols for /lib/libz.so.1
      Reading symbols from /usr/lib/libstdc++.so.6...(no debugging symbols found)...done.
      Loaded symbols for /usr/lib/libstdc++.so.6
      Reading symbols from /lib/libm.so.6...(no debugging symbols found)...done.
      Loaded symbols for /lib/libm.so.6
      Reading symbols from /lib/libgcc_s.so.1...(no debugging symbols found)...done.
      Loaded symbols for /lib/libgcc_s.so.1
      Reading symbols from /lib/libc.so.6...(no debugging symbols found)...done.
      Loaded symbols for /lib/libc.so.6
      Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done.
      Loaded symbols for /lib64/ld-linux-x86-64.so.2
      Reading symbols from /lib/libnss_files.so.2...(no debugging symbols found)...done.
      Loaded symbols for /lib/libnss_files.so.2
      Reading symbols from /lib/libnss_dns.so.2...(no debugging symbols found)...done.
      Loaded symbols for /lib/libnss_dns.so.2
      Reading symbols from /usr/libexec/trafficserver/header_filter.so...done.
      Loaded symbols for /usr/libexec/trafficserver/header_filter.so
      Core was generated by `/usr/bin/traffic_server -M -A,9:X'.
      Program terminated with signal 11, Segmentation fault.
      #0 0x00000000005975ef in HTTPInfo::object_key_get (this=0x2ae710002c58) at ../../proxy/hdrs/HTTP.h:1375
      1375 ((int32_t *) & val)[0] = m_alt->m_object_key[0];
      (gdb) bt
      #0 0x00000000005975ef in HTTPInfo::object_key_get (this=0x2ae710002c58) at ../../proxy/hdrs/HTTP.h:1375
      #1 0x0000000000592c72 in HttpTransactCache::SelectFromAlternates (cache_vector=0x1c96568, client_request=0x1c96528, http_config_params=0x2ae66c4b5228)
      at HttpTransactCache.cc:213
      #2 0x00000000006cb4f1 in CacheVC::openReadChooseWriter (this=0x1c96460, event=2, e=0x1e6d6d0) at CacheRead.cc:262
      #3 0x00000000006cbc82 in CacheVC::openReadFromWriter (this=0x1c96460, event=2, e=0x1e6d6d0) at CacheRead.cc:332
      #4 0x00000000004e63e8 in Continuation::handleEvent (this=0x1c96460, event=2, data=0x1e6d6d0) at ../iocore/eventsystem/I_Continuation.h:146
      #5 0x000000000072055d in EThread::process_event (this=0x2ae646b46010, e=0x1e6d6d0, calling_code=2) at UnixEThread.cc:140
      #6 0x000000000072090d in EThread::execute (this=0x2ae646b46010) at UnixEThread.cc:217
      #7 0x000000000071ec32 in spawn_thread_internal (a=0x1a14380) at Thread.cc:88
      #8 0x00002ae643f639ca in start_thread () from /lib/libpthread.so.0
      #9 0x00002ae64616970d in clone () from /lib/libc.so.6
      #10 0x0000000000000000 in ?? ()
      (gdb) print *this
      $1 =

      {m_alt = 0x0}

        Issue Links

          Activity

          Hide
          Leif Hedstrom added a comment -

          Presumably this was fixed in TS-1114, so closing for now.

          Show
          Leif Hedstrom added a comment - Presumably this was fixed in TS-1114 , so closing for now.
          Hide
          Leif Hedstrom added a comment -

          Moving to 3.3.2.

          Show
          Leif Hedstrom added a comment - Moving to 3.3.2.
          Hide
          Leif Hedstrom added a comment -

          Moving this out to 3.3.0, please move back to 3.1.4 if this will be worked on soon. Also, take a look at what can be closed here please!

          Show
          Leif Hedstrom added a comment - Moving this out to 3.3.0, please move back to 3.1.4 if this will be worked on soon . Also, take a look at what can be closed here please!
          Hide
          Zhao Yongming added a comment -

          the root cause of this issue is same as TS-1114, and we have a working patch in TS-1114.

          Show
          Zhao Yongming added a comment - the root cause of this issue is same as TS-1114 , and we have a working patch in TS-1114 .
          Hide
          Leif Hedstrom added a comment -

          Moved to 3.1.4, please move bugs back to 3.1.3, which you will work on in the next 2 weeks.

          Show
          Leif Hedstrom added a comment - Moved to 3.1.4, please move bugs back to 3.1.3, which you will work on in the next 2 weeks.
          Hide
          Leif Hedstrom added a comment -

          Moving all unassigned bugs out to 3.1.2

          Show
          Leif Hedstrom added a comment - Moving all unassigned bugs out to 3.1.2
          Hide
          Leif Hedstrom added a comment -

          Lets move this out to 3.1.1 or 3.1.2, we need better understanding on how this gets corrupted in the first place. And perhaps change the underlying code instead of trying to deal with the symptoms (as we discussed earlier).

          Show
          Leif Hedstrom added a comment - Lets move this out to 3.1.1 or 3.1.2, we need better understanding on how this gets corrupted in the first place. And perhaps change the underlying code instead of trying to deal with the symptoms (as we discussed earlier).
          Hide
          Kingsley Foreman added a comment -

          Running with the first patch over the weekend looks like it moved the bug

          (gdb) bt
          #0 0x00002b1971b7aa75 in raise () from /lib/libc.so.6
          #1 0x00002b1971b7e5c0 in abort () from /lib/libc.so.6
          #2 0x00002b196f7fe380 in ink_die_die_die (retval=1) at ink_error.cc:43
          #3 0x00002b196f7fe461 in ink_fatal_va (return_code=1,
          message_format=0x2b1973e1eeb0 "./hdrs/HTTP.h:1137: failed assert `m_http->m_polarity == HTTP_TYPE_RESPONSE`", ap=0x2b1973e1edb0) at ink_error.cc:65
          #4 0x00002b196f7fe540 in ink_fatal (return_code=1,
          message_format=0x2b1973e1eeb0 "./hdrs/HTTP.h:1137: failed assert `m_http->m_polarity == HTTP_TYPE_RESPONSE`") at ink_error.cc:73
          #5 0x00002b196f7fd025 in _ink_assert (a=0x729870 "m_http->m_polarity == HTTP_TYPE_RESPONSE", f=0x729803 "./hdrs/HTTP.h", l=1137) at ink_assert.cc:44
          #6 0x0000000000504b34 in HTTPHdr::status_get (this=0x2b1971ec5f08) at ./hdrs/HTTP.h:1137
          #7 0x0000000000593163 in HttpTransactCache::calculate_quality_of_match (http_config_param=0x2b1998480bc8, client_request=0x2b19790be698,
          obj_client_request=0x2b1971ec5ec8, obj_origin_server_response=0x2b1971ec5f08) at HttpTransactCache.cc:321
          #8 0x0000000000592d21 in HttpTransactCache::SelectFromAlternates (cache_vector=0x2b19790be6d8, client_request=0x2b19790be698,
          http_config_params=0x2b1998480bc8) at HttpTransactCache.cc:216
          #9 0x00000000006cb519 in CacheVC::openReadChooseWriter (this=0x2b19790be5d0, event=2, e=0x2b19790a8790) at CacheRead.cc:262
          #10 0x00000000006cbcaa in CacheVC::openReadFromWriter (this=0x2b19790be5d0, event=2, e=0x2b19790a8790) at CacheRead.cc:332
          #11 0x00000000004e63e8 in Continuation::handleEvent (this=0x2b19790be5d0, event=2, data=0x2b19790a8790) at ../iocore/eventsystem/I_Continuation.h:146
          #12 0x0000000000720585 in EThread::process_event (this=0x2b197260a010, e=0x2b19790a8790, calling_code=2) at UnixEThread.cc:140
          #13 0x0000000000720935 in EThread::execute (this=0x2b197260a010) at UnixEThread.cc:217
          #14 0x000000000071ec5a in spawn_thread_internal (a=0x2e72380) at Thread.cc:88
          #15 0x00002b196fa279ca in start_thread () from /lib/libpthread.so.0
          #16 0x00002b1971c2d70d in clone () from /lib/libc.so.6
          #17 0x0000000000000000 in ?? ()

          Show
          Kingsley Foreman added a comment - Running with the first patch over the weekend looks like it moved the bug (gdb) bt #0 0x00002b1971b7aa75 in raise () from /lib/libc.so.6 #1 0x00002b1971b7e5c0 in abort () from /lib/libc.so.6 #2 0x00002b196f7fe380 in ink_die_die_die (retval=1) at ink_error.cc:43 #3 0x00002b196f7fe461 in ink_fatal_va (return_code=1, message_format=0x2b1973e1eeb0 "./hdrs/HTTP.h:1137: failed assert `m_http->m_polarity == HTTP_TYPE_RESPONSE`", ap=0x2b1973e1edb0) at ink_error.cc:65 #4 0x00002b196f7fe540 in ink_fatal (return_code=1, message_format=0x2b1973e1eeb0 "./hdrs/HTTP.h:1137: failed assert `m_http->m_polarity == HTTP_TYPE_RESPONSE`") at ink_error.cc:73 #5 0x00002b196f7fd025 in _ink_assert (a=0x729870 "m_http->m_polarity == HTTP_TYPE_RESPONSE", f=0x729803 "./hdrs/HTTP.h", l=1137) at ink_assert.cc:44 #6 0x0000000000504b34 in HTTPHdr::status_get (this=0x2b1971ec5f08) at ./hdrs/HTTP.h:1137 #7 0x0000000000593163 in HttpTransactCache::calculate_quality_of_match (http_config_param=0x2b1998480bc8, client_request=0x2b19790be698, obj_client_request=0x2b1971ec5ec8, obj_origin_server_response=0x2b1971ec5f08) at HttpTransactCache.cc:321 #8 0x0000000000592d21 in HttpTransactCache::SelectFromAlternates (cache_vector=0x2b19790be6d8, client_request=0x2b19790be698, http_config_params=0x2b1998480bc8) at HttpTransactCache.cc:216 #9 0x00000000006cb519 in CacheVC::openReadChooseWriter (this=0x2b19790be5d0, event=2, e=0x2b19790a8790) at CacheRead.cc:262 #10 0x00000000006cbcaa in CacheVC::openReadFromWriter (this=0x2b19790be5d0, event=2, e=0x2b19790a8790) at CacheRead.cc:332 #11 0x00000000004e63e8 in Continuation::handleEvent (this=0x2b19790be5d0, event=2, data=0x2b19790a8790) at ../iocore/eventsystem/I_Continuation.h:146 #12 0x0000000000720585 in EThread::process_event (this=0x2b197260a010, e=0x2b19790a8790, calling_code=2) at UnixEThread.cc:140 #13 0x0000000000720935 in EThread::execute (this=0x2b197260a010) at UnixEThread.cc:217 #14 0x000000000071ec5a in spawn_thread_internal (a=0x2e72380) at Thread.cc:88 #15 0x00002b196fa279ca in start_thread () from /lib/libpthread.so.0 #16 0x00002b1971c2d70d in clone () from /lib/libc.so.6 #17 0x0000000000000000 in ?? ()
          Hide
          Alan M. Carroll added a comment -

          I don't think this has anything to do with the cache on disk. My reading of the code is that this involves reading from live cache fills. SelectFromAlternates is picking the particular writer (cache fill) for a reader and it's picked on that's invalid (m_alt == NULL). You can see that an effort is made to prevent the selection vector from containing an invalid entry but it's not clear to me from looking at code (primarily CacheVC::openReadChooseWriter) how entries that "time out" are purged. Note that current writers are scanned for validity and removed, but the vector itself is not cleared. Moreover, the vector is a class member so it can contain cruft from previous operations on the class. One thing to get more information would be to put a check at the start of that method that loops over the vector and checks for invalid entries and see if that goes off.

          Show
          Alan M. Carroll added a comment - I don't think this has anything to do with the cache on disk. My reading of the code is that this involves reading from live cache fills. SelectFromAlternates is picking the particular writer (cache fill) for a reader and it's picked on that's invalid (m_alt == NULL). You can see that an effort is made to prevent the selection vector from containing an invalid entry but it's not clear to me from looking at code (primarily CacheVC::openReadChooseWriter) how entries that "time out" are purged. Note that current writers are scanned for validity and removed, but the vector itself is not cleared. Moreover, the vector is a class member so it can contain cruft from previous operations on the class. One thing to get more information would be to put a check at the start of that method that loops over the vector and checks for invalid entries and see if that goes off.
          Hide
          Leif Hedstrom added a comment -

          And, we should probably figure out why m_alt is NULL still, and perhaps invalidate a cache object where that would occur? I mean, even though it's most likely a cache corruption, it'd be nice to be able to deal with it slightly better than a segfault.

          Show
          Leif Hedstrom added a comment - And, we should probably figure out why m_alt is NULL still, and perhaps invalidate a cache object where that would occur? I mean, even though it's most likely a cache corruption, it'd be nice to be able to deal with it slightly better than a segfault.
          Hide
          Leif Hedstrom added a comment -

          Looking at the code, at a minimum we should apply this "optimization":

          diff --git a/proxy/http/HttpTransactCache.cc b/proxy/http/HttpTransactCache.cc
          index e1b9a5f..4fab4f9 100644
          --- a/proxy/http/HttpTransactCache.cc
          +++ b/proxy/http/HttpTransactCache.cc
          @@ -169,7 +169,6 @@ int
           HttpTransactCache::SelectFromAlternates(CacheHTTPInfoVector * cache_vector,
                                                   HTTPHdr * client_request, CacheLookupHttpConfig * http_config_params)
           {
          -  CacheKey zero_key(0, 0);
             time_t current_age, best_age = NUM_SECONDS_IN_ONE_YEAR;
             time_t t_now = 0;
             int best_index = -1;
          
          Show
          Leif Hedstrom added a comment - Looking at the code, at a minimum we should apply this "optimization": diff --git a/proxy/http/HttpTransactCache.cc b/proxy/http/HttpTransactCache.cc index e1b9a5f..4fab4f9 100644 --- a/proxy/http/HttpTransactCache.cc +++ b/proxy/http/HttpTransactCache.cc @@ -169,7 +169,6 @@ int HttpTransactCache::SelectFromAlternates(CacheHTTPInfoVector * cache_vector, HTTPHdr * client_request, CacheLookupHttpConfig * http_config_params) { - CacheKey zero_key(0, 0); time_t current_age, best_age = NUM_SECONDS_IN_ONE_YEAR; time_t t_now = 0; int best_index = -1;
          Hide
          Leif Hedstrom added a comment -

          Another thought could be that the cache got corrupted. Asking if the OP can try to clear the cache.

          Show
          Leif Hedstrom added a comment - Another thought could be that the cache got corrupted. Asking if the OP can try to clear the cache.
          Hide
          Leif Hedstrom added a comment -

          Looking at how this API is used, I'm wondering (but, far, far from certain) if there's a missing test at the beginning, e.g.

          diff --git a/proxy/hdrs/HTTP.h b/proxy/hdrs/HTTP.h
          index c25230c..66a4520 100644
          --- a/proxy/hdrs/HTTP.h
          +++ b/proxy/hdrs/HTTP.h
          @@ -1372,6 +1372,9 @@ HTTPInfo::object_key_get()
           {
             INK_MD5 val;
           
          +  if (NULL == m_alt)
          +    return zero_key;
          +
             ((int32_t *) & val)[0] = m_alt->m_object_key[0];
             ((int32_t *) & val)[1] = m_alt->m_object_key[1];
             ((int32_t *) & val)[2] = m_alt->m_object_key[2];
          

          Now, the real question is, why is m_alt == NULL to begin with?

          Show
          Leif Hedstrom added a comment - Looking at how this API is used, I'm wondering (but, far, far from certain) if there's a missing test at the beginning, e.g. diff --git a/proxy/hdrs/HTTP.h b/proxy/hdrs/HTTP.h index c25230c..66a4520 100644 --- a/proxy/hdrs/HTTP.h +++ b/proxy/hdrs/HTTP.h @@ -1372,6 +1372,9 @@ HTTPInfo::object_key_get() { INK_MD5 val; + if (NULL == m_alt) + return zero_key; + ((int32_t *) & val)[0] = m_alt->m_object_key[0]; ((int32_t *) & val)[1] = m_alt->m_object_key[1]; ((int32_t *) & val)[2] = m_alt->m_object_key[2]; Now, the real question is, why is m_alt == NULL to begin with?

            People

            • Assignee:
              Unassigned
              Reporter:
              Kingsley Foreman
            • Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

              • Due:
                Created:
                Updated:
                Resolved:

                Development