Kudu / KUDU-2952

TServers reporting replica stats may race with leadership change, hitting a DCHECK

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.11.0
    • Component/s: consensus, tserver
    • Labels: None

      Description

      I have a precommit that failed with:

      F0924 00:08:46.821594  9670 catalog_manager.cc:4239] Check failed: ts_desc->permanent_uuid() == report.consensus_state().leader_uuid() 
      *** Check failure stack trace: ***
          @     0x7f5e442ea62d  google::LogMessage::Fail() at ??:0
          @     0x7f5e442ec64c  google::LogMessage::SendToLog() at ??:0
          @     0x7f5e442ea189  google::LogMessage::Flush() at ??:0
          @     0x7f5e442ecfdf  google::LogMessageFatal::~LogMessageFatal() at ??:0
          @     0x7f5e45d89a01  kudu::master::CatalogManager::ProcessTabletReport() at ??:0
          @     0x7f5e45e29ae7  kudu::master::MasterServiceImpl::TSHeartbeat() at ??:0
          @     0x7f5e41f29cbc  _ZZN4kudu6master15MasterServiceIfC1ERK13scoped_refptrINS_12MetricEntityEERKS2_INS_3rpc13ResultTrackerEEENKUlPKN6google8protobuf7MessageEPSE_PNS7_10RpcContextEE0_clESG_SH_SJ_ at ??:0
          @     0x7f5e41f3068b  _ZNSt17_Function_handlerIFvPKN6google8protobuf7MessageEPS2_PN4kudu3rpc10RpcContextEEZNS6_6master15MasterServiceIfC1ERK13scoped_refptrINS6_12MetricEntityEERKSD_INS7_13ResultTrackerEEEUlS4_S5_S9_E0_E9_M_invokeERKSt9_Any_dataS4_S5_S9_ at ??:0
          @     0x7f5e3fea909e  std::function<>::operator()() at ??:0
          @     0x7f5e3fea88cf  kudu::rpc::GeneratedServiceIf::Handle() at ??:0
          @     0x7f5e3feab3b6  kudu::rpc::ServicePool::RunThread() at ??:0
          @     0x7f5e3feac785  boost::_mfi::mf0<>::operator()() at ??:0
          @     0x7f5e3feac5ac  boost::_bi::list1<>::operator()<>() at ??:0
          @     0x7f5e3feac493  boost::_bi::bind_t<>::operator()() at ??:0
          @     0x7f5e3feac3c2  boost::detail::function::void_function_obj_invoker0<>::invoke() at ??:0
          @     0x7f5e44db28d2  boost::function0<>::operator()() at ??:0
          @     0x7f5e44daf65b  kudu::Thread::SuperviseThread() at ??:0
          @     0x7f5e41429184  start_thread at ??:0
          @     0x7f5e438f4ffd  clone at ??:0 
      
      

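      Looking through the code, there appears to be a TOCTOU (time-of-check to time-of-use) race when a tablet server generates its tablet report. The tserver seems to decide whether it is the leader, and therefore whether to attach replica stats, at a different moment than it reads the consensus state that goes into the report. If leadership changes in between, the master receives stats alongside a consensus state whose leader_uuid is not the reporting tserver, which trips the check above.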
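To make the suspected race concrete, here is a minimal sketch. It is not Kudu's code and not necessarily the fix that went into 1.11.0. The names `Replica`, `TabletReport`, `BuildReportRacy`, `BuildReportFromSnapshot`, and `MasterInvariantHolds` are all hypothetical, and the leadership change is injected through a callback instead of a real second thread so the interleaving is deterministic. It assumes the master's invariant is the one in the failed check: stats may only arrive from the tserver that the report's consensus state names as leader.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>

// Hypothetical stand-ins for illustration only; not Kudu's actual types.
struct ConsensusState {
  std::string leader_uuid;
};

struct TabletReport {
  ConsensusState cstate;
  bool has_stats = false;  // stats are only expected from the leader
};

class Replica {
 public:
  Replica(std::string self_uuid, std::string leader_uuid)
      : self_uuid_(std::move(self_uuid)), cstate_{std::move(leader_uuid)} {}

  void SetLeader(const std::string& uuid) { cstate_.leader_uuid = uuid; }

  // Racy: the leadership check and the consensus state copied into the
  // report come from two separate reads. `between_reads` stands in for a
  // concurrent leadership change landing between them.
  TabletReport BuildReportRacy(const std::function<void()>& between_reads) {
    TabletReport report;
    const bool is_leader = (cstate_.leader_uuid == self_uuid_);  // time of check
    between_reads();
    report.cstate = cstate_;                                     // time of use
    report.has_stats = is_leader;
    return report;
  }

  // One possible remedy: read the consensus state once and derive both the
  // leadership decision and the reported state from that single snapshot.
  TabletReport BuildReportFromSnapshot(const std::function<void()>& between_reads) {
    const ConsensusState snapshot = cstate_;  // single read
    between_reads();                          // a later change no longer matters
    TabletReport report;
    report.cstate = snapshot;
    report.has_stats = (snapshot.leader_uuid == self_uuid_);
    return report;
  }

 private:
  std::string self_uuid_;
  ConsensusState cstate_;
};

// Mirrors the shape of the failed check: a report carrying stats must come
// from the tserver named as leader in the same report.
bool MasterInvariantHolds(const std::string& reporter_uuid,
                          const TabletReport& report) {
  return !report.has_stats || reporter_uuid == report.cstate.leader_uuid;
}

// Leadership moves from "ts-a" to "ts-b" in the middle of report generation.
TabletReport RacyReportDuringLeaderChange() {
  Replica replica("ts-a", "ts-a");
  return replica.BuildReportRacy([&replica] { replica.SetLeader("ts-b"); });
}

TabletReport SnapshotReportDuringLeaderChange() {
  Replica replica("ts-a", "ts-a");
  return replica.BuildReportFromSnapshot([&replica] { replica.SetLeader("ts-b"); });
}
```

With the same injected leadership change, the racy report pairs stats with a consensus state naming "ts-b" and so violates the invariant, while the snapshot-based report stays self-consistent. A snapshot-based report can still be stale by the time the master sees it, so the master would also need to tolerate reports from a tserver that has since lost leadership.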

        Attachments

        1. master_hms-itest.txt
          712 kB
          Andrew Wong

            People

            • Assignee: Andrew Wong (awong)
            • Reporter: Andrew Wong (awong)
