Kudu / KUDU-2952

TServers reporting replica stats may race with leadership change, hitting a DCHECK


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.11.0
    • Component/s: consensus, tserver
    • Labels: None

    Description

      I have a precommit that failed with:

      F0924 00:08:46.821594  9670 catalog_manager.cc:4239] Check failed: ts_desc->permanent_uuid() == report.consensus_state().leader_uuid() 
      *** Check failure stack trace: ***
          @     0x7f5e442ea62d  google::LogMessage::Fail() at ??:0
          @     0x7f5e442ec64c  google::LogMessage::SendToLog() at ??:0
          @     0x7f5e442ea189  google::LogMessage::Flush() at ??:0
          @     0x7f5e442ecfdf  google::LogMessageFatal::~LogMessageFatal() at ??:0
          @     0x7f5e45d89a01  kudu::master::CatalogManager::ProcessTabletReport() at ??:0
          @     0x7f5e45e29ae7  kudu::master::MasterServiceImpl::TSHeartbeat() at ??:0
          @     0x7f5e41f29cbc  _ZZN4kudu6master15MasterServiceIfC1ERK13scoped_refptrINS_12MetricEntityEERKS2_INS_3rpc13ResultTrackerEEENKUlPKN6google8protobuf7MessageEPSE_PNS7_10RpcContextEE0_clESG_SH_SJ_ at ??:0
          @     0x7f5e41f3068b  _ZNSt17_Function_handlerIFvPKN6google8protobuf7MessageEPS2_PN4kudu3rpc10RpcContextEEZNS6_6master15MasterServiceIfC1ERK13scoped_refptrINS6_12MetricEntityEERKSD_INS7_13ResultTrackerEEEUlS4_S5_S9_E0_E9_M_invokeERKSt9_Any_dataS4_S5_S9_ at ??:0
          @     0x7f5e3fea909e  std::function<>::operator()() at ??:0
          @     0x7f5e3fea88cf  kudu::rpc::GeneratedServiceIf::Handle() at ??:0
          @     0x7f5e3feab3b6  kudu::rpc::ServicePool::RunThread() at ??:0
          @     0x7f5e3feac785  boost::_mfi::mf0<>::operator()() at ??:0
          @     0x7f5e3feac5ac  boost::_bi::list1<>::operator()<>() at ??:0
          @     0x7f5e3feac493  boost::_bi::bind_t<>::operator()() at ??:0
          @     0x7f5e3feac3c2  boost::detail::function::void_function_obj_invoker0<>::invoke() at ??:0
          @     0x7f5e44db28d2  boost::function0<>::operator()() at ??:0
          @     0x7f5e44daf65b  kudu::Thread::SuperviseThread() at ??:0
          @     0x7f5e41429184  start_thread at ??:0
          @     0x7f5e438f4ffd  clone at ??:0 
      
      

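      Looking through the code, there appears to be a TOCTOU (time-of-check to time-of-use) race when tablet servers generate tablet reports: leadership can change while a report that includes replica stats is being built.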
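
The kind of race described above can be illustrated with a minimal, self-contained sketch. This is not Kudu's code: `Replica`, `TabletReport`, `BuildReportRacy`, `BuildReportFromSnapshot`, and `ReportIsConsistent` are hypothetical names, and the sketch assumes the master's check only applies to reports that carry replica stats. It shows how checking leadership and reading the consensus state in two separate steps can yield a report whose leader UUID no longer matches the reporter, and how deriving both from a single snapshot avoids that. Whether the actual fix took this shape is not stated in this issue.

```cpp
#include <mutex>
#include <string>
#include <utility>

// Hypothetical stand-ins for illustration only; not Kudu's actual types.
struct ConsensusState {
  std::string leader_uuid;
};

struct TabletReport {
  ConsensusState cstate;
  bool has_stats = false;  // assumption: only leaders attach replica stats
};

class Replica {
 public:
  Replica(std::string self_uuid, std::string leader_uuid)
      : self_uuid_(std::move(self_uuid)), cstate_{std::move(leader_uuid)} {}

  bool IsLeader() const {
    std::lock_guard<std::mutex> l(lock_);
    return cstate_.leader_uuid == self_uuid_;
  }
  ConsensusState GetConsensusState() const {
    std::lock_guard<std::mutex> l(lock_);
    return cstate_;
  }
  void SetLeader(std::string uuid) {
    std::lock_guard<std::mutex> l(lock_);
    cstate_.leader_uuid = std::move(uuid);
  }
  const std::string& uuid() const { return self_uuid_; }

 private:
  mutable std::mutex lock_;
  const std::string self_uuid_;
  ConsensusState cstate_;
};

// Racy: leadership is checked and the consensus state is read under two
// separate lock acquisitions. The hook simulates whatever may happen in
// between, such as a leadership change.
template <typename Hook>
TabletReport BuildReportRacy(const Replica& r, Hook between) {
  TabletReport report;
  report.has_stats = r.IsLeader();        // time of check
  between();                              // race window
  report.cstate = r.GetConsensusState();  // time of use
  return report;
}

// Safer: take one snapshot and derive the leadership decision from it.
TabletReport BuildReportFromSnapshot(const Replica& r) {
  TabletReport report;
  report.cstate = r.GetConsensusState();
  report.has_stats = (report.cstate.leader_uuid == r.uuid());
  return report;
}

// Receiver-side invariant modeled on the failed check: a report carrying
// stats must name the reporting server as leader.
bool ReportIsConsistent(const TabletReport& report,
                        const std::string& reporter_uuid) {
  return !report.has_stats || report.cstate.leader_uuid == reporter_uuid;
}
```

In this sketch, calling `BuildReportRacy` with a hook that changes the leader produces a report that fails `ReportIsConsistent`, while `BuildReportFromSnapshot` always produces a self-consistent report, because the stats flag and the leader UUID come from the same read.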

          People

            Assignee: Andrew Wong (awong)
            Reporter: Andrew Wong (awong)