Uploaded image for project: 'Traffic Server'
  1. Traffic Server
  2. TS-4160

MLocRelease for txn buffers called too late in CPP API resulting buffer corruption

Details

    • Bug
    • Status: Closed
    • Major
    • Resolution: Fixed
    • 6.1.0
    • 6.2.0
    • CPP API

    Description

      CPP API currently acquires the buffer and heap handles for various kinds of request/response objects (e.g cached_request_hdr_buf, cached_request_hdr_loc, client_response_hdr_buf, client_response_hdr_loc etc) at the corresponding hooks of a transaction, along with the url handles in the CPP API's request/response objects, but keeps them until txn_close, before calling MLocRelease on them finally.

      However, this is buggy, since the core doesn't seem to guarantee the life of these handles until the end of the transaction. In particular, the cached_request_hdr_buf/loc are released at the end of tunnel_handler_cache_read(), which causes it to be reassigned to a different txn and results in a crash when the original txn finally tries to release them triggering the asserts from the MLocRelease C API. Another case is internal redirect follow in the core, which destroys and recreates some of these buffers.

      Note that the C API plugins typically call MLocRelease immediately after acquiring/using them.

      Additionally, the order in which CPP API releases the handles is incorrect. Transaction object contains the Request objects and Transaction object's destructor first calls MLocRelease on the request heap handles, before which the Request object's d'tor is triggered which calls the url handle in that released request handle. This may not be a problem right now, since MLocRelease for HTTP_OBJ and URL_OBJ are basically no-ops, but, the order of release still breaks the implicit contract of the API, which requires the child to be released before releasing the parent (which means, if the API's internal implementation changes tomorrow, that can cause a problem).

      Here are the back traces for the issues mentioned above:

      (gdb) bt
      #0  0x0000003c1b232625 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
      #1  0x0000003c1b233e05 in abort () at abort.c:92
      #2  0x00002b149d517018 in ink_die_die_die (retval=1) at ink_error.cc:43
      #3  0x00002b149d5170e5 in ink_fatal_va(int, const char *, typedef __va_list_tag __va_list_tag *) (return_code=1, message_format=0x2b149d525d78 "%s:%d: failed assert `%s`", ap=0x2b14a4806570) at ink_error.cc:65
      #4  0x00002b149d5171ae in ink_fatal (return_code=1, message_format=0x2b149d525d78 "%s:%d: failed assert `%s`") at ink_error.cc:73
      #5  0x00002b149d515d20 in _ink_assert (expression=0x771185 "!\"invalid mloc\"", file=0x770f1d "InkAPI.cc", line=1886) at ink_assert.cc:37
      #6  0x000000000050f809 in TSHandleMLocRelease (bufp=0x2b15e82357f0, parent=0x2b15e40fca98, mloc=0x2b15e40fcd18) at InkAPI.cc:1886
      #7  0x00002b15759066fc in atscppapi::Request::~Request (this=0x2b15e0003318, __in_chrg=<value optimized out>) at Request.cc:179
      #8  0x00002b1575902131 in ~TransactionState (this=0x2b15e0001320, __in_chrg=<value optimized out>) at Transaction.cc:40
      #9  atscppapi::Transaction::~Transaction (this=0x2b15e0001320, __in_chrg=<value optimized out>) at Transaction.cc:111
      #10 0x00002b1575901031 in (anonymous namespace)::handleTransactionEvents (cont=<value optimized out>, event=<value optimized out>, edata=0x2b15e8235460) at utils_internal.cc:87
      #11 0x000000000050d7f0 in INKContInternal::handle_event (this=0x20399b0, event=60012, edata=0x2b15e8235460) at InkAPI.cc:1000
      #12 0x00000000004f8dee in Continuation::handleEvent (this=0x20399b0, event=60012, data=0x2b15e8235460) at ../iocore/eventsystem/I_Continuation.h:146
      #13 0x000000000050e037 in APIHook::invoke (this=0x202b170, event=60012, edata=0x2b15e8235460) at InkAPI.cc:1219
      #14 0x00000000005cfb5d in HttpSM::state_api_callout (this=0x2b15e8235460, event=0, data=0x0) at HttpSM.cc:1384
      #15 0x00000000005dba75 in HttpSM::do_api_callout_internal (this=0x2b15e8235460) at HttpSM.cc:4896
      #16 0x00000000005e8b2a in HttpSM::do_api_callout (this=0x2b15e8235460) at HttpSM.cc:437
      #17 0x00000000005e0ed0 in HttpSM::kill_this (this=0x2b15e8235460) at HttpSM.cc:6615
      #18 0x00000000005d358f in HttpSM::main_handler (this=0x2b15e8235460, event=2301, data=0x2b15e82360a0) at HttpSM.cc:2556
      #19 0x00000000004f8dee in Continuation::handleEvent (this=0x2b15e8235460, event=2301, data=0x2b15e82360a0) at ../iocore/eventsystem/I_Continuation.h:146
      #20 0x000000000061cebf in HttpTunnel::main_handler (this=0x2b15e82360a0, event=103, data=0x2b15fc01e2e8) at HttpTunnel.cc:1537
      #21 0x00000000004f8dee in Continuation::handleEvent (this=0x2b15e82360a0, event=103, data=0x2b15fc01e2e8) at ../iocore/eventsystem/I_Continuation.h:146
      #22 0x000000000070eafa in CacheVC::calluser (this=0x2b15fc01e0a0, event=103) at P_CacheInternal.h:642
      #23 0x00000000007174fb in CacheVC::openWriteMain (this=0x2b15fc01e0a0) at CacheWrite.cc:1367
      #24 0x00000000004f8dee in Continuation::handleEvent (this=0x2b15fc01e0a0, event=1, data=0x2b15e4041090) at ../iocore/eventsystem/I_Continuation.h:146
      #25 0x0000000000760bee in EThread::process_event (this=0x2b149ef4a010, e=0x2b15e4041090, calling_code=1) at UnixEThread.cc:145
      #26 0x0000000000760dbc in EThread::execute (this=0x2b149ef4a010) at UnixEThread.cc:196
      #27 0x0000000000760160 in spawn_thread_internal (a=0x1b4c790) at Thread.cc:88
      #28 0x00002b149dded9d1 in start_thread (arg=0x0) at pthread_create.c:301
      #29 0x0000003c1b2e88fd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
      
      
      (gdb) bt
      #0  0x0000003c1b232625 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
      #1  0x0000003c1b233e05 in abort () at abort.c:92
      #2  0x00002b0af272e018 in ink_die_die_die (retval=1) at ink_error.cc:43
      #3  0x00002b0af272e0e5 in ink_fatal_va(int, const char *, typedef __va_list_tag __va_list_tag *) (return_code=1, message_format=0x2b0af273cd78 "%s:%d: failed assert `%s`", ap=0x2b0af9d1b420) at ink_error.cc:65
      #4  0x00002b0af272e1ae in ink_fatal (return_code=1, message_format=0x2b0af273cd78 "%s:%d: failed assert `%s`") at ink_error.cc:73
      #5  0x00002b0af272cd20 in _ink_assert (expression=0x771158 "sdk_sanity_check_mbuffer(bufp) == TS_SUCCESS", file=0x770f1d "InkAPI.cc", line=1869) at ink_assert.cc:37
      #6  0x000000000050c948 in _TSReleaseAssert (text=0x771158 "sdk_sanity_check_mbuffer(bufp) == TS_SUCCESS", file=0x770f1d "InkAPI.cc", line=1869) at InkAPI.cc:405
      #7  0x000000000050f78e in TSHandleMLocRelease (bufp=0x2b0c4400d068, parent=0x0, mloc=0x2b0c942811c8) at InkAPI.cc:1869
      #8  0x00002b0bc98bf091 in atscppapi::Transaction::~Transaction (this=0x2b0c64059580, __in_chrg=<value optimized out>) at Transaction.cc:105
      #9  0x00002b0bc98be031 in (anonymous namespace)::handleTransactionEvents (cont=<value optimized out>, event=<value optimized out>, edata=0x2b0c401c7480) at utils_internal.cc:87
      #10 0x000000000050d7f0 in INKContInternal::handle_event (this=0x1ba23c0, event=60012, edata=0x2b0c401c7480) at InkAPI.cc:1000
      #11 0x00000000004f8dee in Continuation::handleEvent (this=0x1ba23c0, event=60012, data=0x2b0c401c7480) at ../iocore/eventsystem/I_Continuation.h:146
      #12 0x000000000050e037 in APIHook::invoke (this=0x1b93b80, event=60012, edata=0x2b0c401c7480) at InkAPI.cc:1219
      #13 0x00000000005cfb5d in HttpSM::state_api_callout (this=0x2b0c401c7480, event=0, data=0x0) at HttpSM.cc:1384
      #14 0x00000000005dba75 in HttpSM::do_api_callout_internal (this=0x2b0c401c7480) at HttpSM.cc:4896
      #15 0x00000000005e8b2a in HttpSM::do_api_callout (this=0x2b0c401c7480) at HttpSM.cc:437
      #16 0x00000000005e0ed0 in HttpSM::kill_this (this=0x2b0c401c7480) at HttpSM.cc:6615
      #17 0x00000000005d358f in HttpSM::main_handler (this=0x2b0c401c7480, event=2301, data=0x2b0c401c80c0) at HttpSM.cc:2556
      #18 0x00000000004f8dee in Continuation::handleEvent (this=0x2b0c401c7480, event=2301, data=0x2b0c401c80c0) at ../iocore/eventsystem/I_Continuation.h:146
      #19 0x000000000061cebf in HttpTunnel::main_handler (this=0x2b0c401c80c0, event=103, data=0x2b0c2800b898) at HttpTunnel.cc:1537
      #20 0x00000000004f8dee in Continuation::handleEvent (this=0x2b0c401c80c0, event=103, data=0x2b0c2800b898) at ../iocore/eventsystem/I_Continuation.h:146
      #21 0x000000000073f958 in write_signal_and_update (event=103, vc=0x2b0c2800b720) at UnixNetVConnection.cc:174
      #22 0x000000000073fb3e in write_signal_done (event=103, nh=0x2b0af8508cb0, vc=0x2b0c2800b720) at UnixNetVConnection.cc:216
      #23 0x0000000000740d12 in write_to_net_io (nh=0x2b0af8508cb0, vc=0x2b0c2800b720, thread=0x2b0af8505010) at UnixNetVConnection.cc:559
      #24 0x0000000000740579 in write_to_net (nh=0x2b0af8508cb0, vc=0x2b0c2800b720, thread=0x2b0af8505010) at UnixNetVConnection.cc:414
      #25 0x0000000000739536 in NetHandler::mainNetEvent (this=0x2b0af8508cb0, event=5, e=0x1701b70) at UnixNet.cc:534
      #26 0x00000000004f8dee in Continuation::handleEvent (this=0x2b0af8508cb0, event=5, data=0x1701b70) at ../iocore/eventsystem/I_Continuation.h:146
      #27 0x0000000000760bee in EThread::process_event (this=0x2b0af8505010, e=0x1701b70, calling_code=5) at UnixEThread.cc:145
      #28 0x00000000007610f8 in EThread::execute (this=0x2b0af8505010) at UnixEThread.cc:269
      #29 0x0000000000760160 in spawn_thread_internal (a=0x162e500) at Thread.cc:88
      #30 0x00002b0af30049d1 in start_thread (arg=0x2b0af9d1c700) at pthread_create.c:301
      #31 0x0000003c1b2e88fd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
      

      Below are debug logs from load tests that prove the problem:

      Step 1: Transaction 0x2b8430214d70 allocates cached_request_hdr_loc=0x2b846c026a58
      
      [Jan 28 18:28:19.652] Server {0x2b82ecc0b700} DIAG: (atscppapi.utils_internal.cc:176) [getTransaction()] existing transaction object at 0x2b8444014450 for ats pointer 0x2b8430214d70, event 60005
      [Jan 28 18:28:19.652] Server {0x2b82ecc0b700} DIAG: (atscppapi.utils_internal.cc:53) [handleTransactionEvents()] Got event 60005 on continuation 0x2d58950 for transaction (ats pointer 0x2b8430214d70, object 0x2b8444014450)
      [Jan 28 18:28:19.652] Server {0x2b82ecc0b700} DEBUG: (atscppapi) TSHttpTxnCachedReqGet txnp 0x2b8430214d70, new handle bufp 0x2b8434020878, heap 0x2b846c0269d0, obj 0x2b846c026a58, magic 0xabcdfeed
      [Jan 28 18:28:19.652] Server {0x2b82ecc0b700} DIAG: (atscppapi.Transaction.cc:494) [initCachedRequest()] Initializing cached request, txn=0x2b8430214d70, state_=0x2b8444002130, client_request_hdr_buf_=0x2b8430215080, client_request_hdr_loc_=0x2b8468263f18, client_response_hdr_buf_=(nil), client_response_hdr_loc_=(nil), server_request_hdr_buf_=(nil), server_request_hdr_loc_=(nil), server_response_hdr_buf_=(nil), server_response_hdr_loc_=(nil), cached_request_hdr_buf_=0x2b8434020878, cached_request_hdr_loc_=0x2b846c026a58
      
      
      Step 2: Transaction 0x2b84301f8a30 allocates the same as client_response_hdr_buf_=0x2b84301f8d80, while it's handle is still in the previous transaction 0x2b8430214d70 in CPP API 
      
      [Jan 28 18:28:19.667] Server {0x2b82e7fdd700} DIAG: (atscppapi.utils_internal.cc:53) [handleTransactionEvents()] Got event 60007 on continuation 0x2d58950 for transaction (ats pointer 0x2b84301f8a30, object 0x2b830c00a750)
      [Jan 28 18:28:19.667] Server {0x2b82e7fdd700} DIAG: (atscppapi.Transaction.cc:485) [initClientResponse()] Initializing client response, txn=0x2b84301f8a30, state_=0x2b830c0017d0, client_request_hdr_buf_=0x2b84301f8d40, client_request_hdr_loc_=0x2b84641a73e8, client_response_hdr_buf_=0x2b84301f8d80, client_response_hdr_loc_=0x2b846c026a58, server_request_hdr_buf_=(nil), server_request_hdr_loc_=(nil), server_response_hdr_buf_=(nil), server_response_hdr_loc_=(nil), cached_request_hdr_buf_=0x2b8434010478, cached_request_hdr_loc_=0x2b84181666e8
      
      Step 3: Transaction 0x2b8430214d70 now gets TXN_CLOSE event and tries to call MLocRelease on 0x2b846c026a58's url handle which results in an assert failure in MLocRelease API
      
      [Jan 28 18:28:19.678] Server {0x2b82ecc0b700} DIAG: (atscppapi.utils_internal.cc:53) [handleTransactionEvents()] Got event 60012 on continuation 0x2d58950 for transaction (ats pointer 0x2b8430214d70, object 0x2b8444014450)
      [Jan 28 18:28:19.678] Server {0x2b82ecc0b700} DIAG: (atscppapi.Transaction.cc:88) [~Transaction()] Transaction tshttptxn=0x2b8430214d70 destroying Transaction object 0x2b8444014450, state_=0x2b8444002130, state_->client_request_hdr_buf_=0x2b8430215080, state_->client_request_hdr_loc_=0x2b8468263f18, state_->server_request_hdr_buf_=(nil), state_->server_response_hdr_buf_=(nil), state_->client_response_hdr_buf_=0x2b84302150c0, state_->cached_request_hdr_buf_=0x2b8434020878, state_->cached_response_hdr_buf_=0x2b8434020880
      [Jan 28 18:28:19.678] Server {0x2b82ecc0b700} DEBUG: (atscppapi) TSHandleMLocRelease bufp=0x2b8430215080, parent=(nil), mloc=0x2b8468263f18, obj type 3
      [Jan 28 18:28:19.678] Server {0x2b82ecc0b700} DIAG: (atscppapi.Transaction.cc:100) [~Transaction()] Releasing client response
      [Jan 28 18:28:19.678] Server {0x2b82ecc0b700} DIAG: (atscppapi.Transaction.cc:104) [~Transaction()] Releasing cached request
      [Jan 28 18:28:19.678] Server {0x2b82ecc0b700} DIAG: (atscppapi.Transaction.cc:108) [~Transaction()] Releasing cached response
      [Jan 28 18:28:19.678] Server {0x2b82ecc0b700} DIAG: (atscppapi.Request.cc:181) [~Request()] Destroying request object with no destroy_buf 0x2b84440021d8 on hdr_buf=0x2b8434020878, hdr_loc=0x2b846c026a58, url_loc=0x2b846c026cd8
      [Jan 28 18:28:19.678] Server {0x2b82ecc0b700} DEBUG: (atscppapi) TSHandleMLocRelease bufp=0x2b8434020878, parent=0x2b846c026a58, mloc=0x2b846c026cd8, obj type 5
      

      Attachments

        Issue Links

          Activity

            People

              sudheerv Sudheer Vinukonda
              sudheerv Sudheer Vinukonda
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Slack

                  Issue deployment