KUDU-1328: TS crashes in RemoteBootstrapSession::Init()

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.7.0
    • Fix Version/s: 0.7.0
    • Component/s: recovery, tserver
    • Labels: None

    Description

      Three nodes on the YCSB cluster crashed within a minute of one another. The backtrace:

      #0  kudu::tserver::RemoteBootstrapSession::Init (this=0x4e633dc0)
          at /usr/src/debug/kudu-0.7.0-kudu0.7.0/src/kudu/tserver/remote_bootstrap_session.cc:94
      #1  0x00000000007871e8 in kudu::tserver::RemoteBootstrapServiceImpl::BeginRemoteBootstrapSession (this=0x33e4a20, 
          req=Unhandled dwarf expression opcode 0xf3) at /usr/src/debug/kudu-0.7.0-kudu0.7.0/src/kudu/tserver/remote_bootstrap_service.cc:130
      #2  0x00000000007f777a in kudu::tserver::RemoteBootstrapServiceIf::Handle (this=0x33e4a20, call=0xbc07e6c0)
          at /usr/src/debug/kudu-0.7.0-kudu0.7.0/build/release/src/kudu/tserver/remote_bootstrap.service.cc:59
      #3  0x00000000009d87b8 in kudu::rpc::ServicePool::RunThread (this=0x33d8dc0)
          at /usr/src/debug/kudu-0.7.0-kudu0.7.0/src/kudu/rpc/service_pool.cc:174
      #4  0x00000000017a1d1a in operator() (arg=0x3576f70)
          at /opt/toolchain/boost-pic-1.55.0/include/boost/function/function_template.hpp:767
      #5  kudu::Thread::SuperviseThread (arg=0x3576f70) at /usr/src/debug/kudu-0.7.0-kudu0.7.0/src/kudu/util/thread.cc:580
      #6  0x00000030234079d1 in start_thread () from /lib64/libpthread.so.0
      #7  0x00000030230e88fd in clone () from /lib64/libc.so.6
      

      The offending code:

        LOG(INFO) << "T " << tablet_peer_->tablet_id()
                  << " P " << tablet_peer_->consensus()->peer_uuid()
                  << ": Remote bootstrap: Opening " << data_blocks.size() << " blocks";
      

      Specifically, consensus() returns 0x0, so the peer_uuid() call in the LOG() statement dereferences a null pointer. From the logs it looks like we're trying to remote bootstrap a tablet that has just been shut down, but at a macro level I don't know how that would happen. This is a regression from commit b841512, which introduced this LOG() statement. Fixing it is easy enough, but I'm going to try to add an integration test that teases out the crash.
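
A minimal sketch of the failure mode and one possible guard, not the actual fix that was committed. FakeConsensus, FakeTabletPeer, and SafeLogPrefix are hypothetical stand-ins invented for illustration; the real classes have much larger interfaces, and the "(unknown)" placeholder is an assumption. The point is only that the accessor result is checked before peer_uuid() is called on it.

```cpp
#include <memory>
#include <sstream>
#include <string>

// Hypothetical stand-in for the consensus object; only peer_uuid() matters here.
struct FakeConsensus {
  std::string uuid;
  const std::string& peer_uuid() const { return uuid; }
};

// Hypothetical stand-in for the tablet peer. consensus() returns a raw pointer
// that is null when no consensus instance is available (for example, after
// the tablet has been shut down).
struct FakeTabletPeer {
  std::string id;
  std::shared_ptr<FakeConsensus> consensus_;
  const std::string& tablet_id() const { return id; }
  FakeConsensus* consensus() const { return consensus_.get(); }
};

// Builds the "T <tablet> P <peer>" log prefix. The pointer is read once and
// checked, so a null consensus yields a placeholder instead of a crash.
std::string SafeLogPrefix(const FakeTabletPeer& peer) {
  std::ostringstream out;
  const FakeConsensus* c = peer.consensus();
  out << "T " << peer.tablet_id()
      << " P " << (c != nullptr ? c->peer_uuid() : std::string("(unknown)"));
  return out.str();
}
```

With a live peer the prefix contains the peer UUID; with a null consensus it falls back to the placeholder. An alternative design would be to reject the remote bootstrap request up front when consensus() is null, rather than only making the log line safe.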

      I've filed this as a 0.7.0 blocker because I didn't know any better; feel free to kick it to 0.8.0 if you disagree.

          People

            Assignee: Adar Dembo
            Reporter: Adar Dembo