KUDU-1362

Ensure master behaves correctly after a sys_catalog write failure

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Duplicate
    • Affects Version/s: 0.7.0
    • Fix Version/s: None
    • Component/s: master
    • Labels: None

    Description

      For multi-master usage to truly be safe, we must ensure that a failure to write to the system catalog table is handled correctly. When there's only one master this can only happen in the event of a disk failure or equivalent, but with multiple masters, failures can happen all the time (e.g. failed replicas, network partitions).

      So far I've only found one case where this is truly broken, in catalog_manager.cc:L2444, where the CHECK_OK crashes the master if the write fails:

         2433 void CatalogManager::DeleteTabletsAndSendRequests(const scoped_refptr<TableInfo>& table) {
         2434   vector<scoped_refptr<TabletInfo> > tablets;
         2435   table->GetAllTablets(&tablets);
         2436 
         2437   string deletion_msg = "Table deleted at " + LocalTimeAsString();
         2438 
         2439   for (const scoped_refptr<TabletInfo>& tablet : tablets) {
         2440     DeleteTabletReplicas(tablet.get(), deletion_msg);
         2441 
         2442     TabletMetadataLock tablet_lock(tablet.get(), TabletMetadataLock::WRITE);
         2443     tablet_lock.mutable_data()->set_state(SysTabletsEntryPB::DELETED, deletion_msg);
        >2444     CHECK_OK(sys_catalog_->UpdateTablets({ tablet.get() }));
         2445     tablet_lock.Commit();
         2446   }
         2447 }
      

      In this case we should batch up all of the tablet deletions into one UpdateTablets() call, and pass the status up to the DeleteTable caller too.
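
      As a rough illustration of the batching idea, here is a minimal, self-contained sketch. It does not use the real Kudu classes: Status, Tablet, FakeSysCatalog and DeleteTablets are hypothetical stand-ins invented for this example, and the staged/committed split only loosely mimics what TabletMetadataLock does. Sending the delete requests to the tablet servers is left out entirely.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for kudu::Status: either OK or an error message.
class Status {
 public:
  static Status OK() { return Status(true, ""); }
  static Status IOError(const std::string& msg) { return Status(false, msg); }
  bool ok() const { return ok_; }
  const std::string& message() const { return msg_; }

 private:
  Status(bool ok, std::string msg) : ok_(ok), msg_(std::move(msg)) {}
  bool ok_;
  std::string msg_;
};

enum class TabletState { RUNNING, DELETED };

// Hypothetical stand-in for TabletInfo plus its metadata lock: a change is
// staged in `pending` and only becomes visible after Commit().
struct Tablet {
  explicit Tablet(std::string tablet_id) : id(std::move(tablet_id)) {}
  void StageDeleted() { pending = TabletState::DELETED; }
  void Commit() { committed = pending; }
  void Abort() { pending = committed; }

  std::string id;
  TabletState committed = TabletState::RUNNING;
  TabletState pending = TabletState::RUNNING;
};

// Hypothetical stand-in for the sys catalog. One call persists a whole batch;
// fail_next_write injects a single write failure.
struct FakeSysCatalog {
  Status UpdateTablets(const std::vector<Tablet*>& tablets) {
    (void)tablets;  // A real implementation would persist these entries.
    if (fail_next_write) {
      fail_next_write = false;
      return Status::IOError("injected sys_catalog write failure");
    }
    ++num_writes;
    return Status::OK();
  }

  bool fail_next_write = false;
  int num_writes = 0;
};

// Stages DELETED on every tablet, persists the batch with one write, and
// commits the in-memory state only if that write succeeded. On failure the
// staged changes are dropped and the error is returned to the caller.
Status DeleteTablets(FakeSysCatalog* sys_catalog, std::vector<Tablet>* tablets) {
  assert(sys_catalog != nullptr && tablets != nullptr);
  std::vector<Tablet*> batch;
  for (Tablet& tablet : *tablets) {
    tablet.StageDeleted();
    batch.push_back(&tablet);
  }
  Status s = sys_catalog->UpdateTablets(batch);
  if (!s.ok()) {
    for (Tablet* tablet : batch) tablet->Abort();
    return s;
  }
  for (Tablet* tablet : batch) tablet->Commit();
  return Status::OK();
}
```

      A caller standing in for DeleteTable would check the returned Status and report the error instead of crashing. Setting fail_next_write before the call exercises the failure path: the tablets should stay RUNNING and no write should be counted, and a retry should then mark them all DELETED with a single write.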

      Part of the work here is an integration test that provides good coverage for the various failure paths.

            People

              Assignee: Adar Dembo
              Reporter: Adar Dembo