Kudu / KUDU-1362

Ensure master behaves correctly after a sys_catalog write failure

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Duplicate
    • Affects Version/s: 0.7.0
    • Fix Version/s: None
    • Component/s: master
    • Labels:
      None
    • Target Version/s:

      Description

      For multi-master usage to truly be safe, we must ensure that a failure to write to the system catalog table is handled correctly. When there's only one master, this can only happen in the event of a disk failure or equivalent, but with multiple masters, failures can happen all the time (e.g. failed replicas or network partitions).

      So far I've only found one case where this is truly broken, in catalog_manager.cc:L2444, where CHECK_OK crashes the master if the write fails:

         2433 void CatalogManager::DeleteTabletsAndSendRequests(const scoped_refptr<TableInfo>& table) {
         2434   vector<scoped_refptr<TabletInfo> > tablets;
         2435   table->GetAllTablets(&tablets);
         2436 
         2437   string deletion_msg = "Table deleted at " + LocalTimeAsString();
         2438 
         2439   for (const scoped_refptr<TabletInfo>& tablet : tablets) {
         2440     DeleteTabletReplicas(tablet.get(), deletion_msg);
         2441 
         2442     TabletMetadataLock tablet_lock(tablet.get(), TabletMetadataLock::WRITE);
         2443     tablet_lock.mutable_data()->set_state(SysTabletsEntryPB::DELETED, deletion_msg);
        >2444     CHECK_OK(sys_catalog_->UpdateTablets({ tablet.get() }));
         2445     tablet_lock.Commit();
         2446   }
         2447 }
      

      In this case we should batch up all of the tablet deletions into one UpdateTablets() call, and pass the status up to the DeleteTable caller too.
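
A minimal, self-contained sketch of that batching idea follows. It is not the actual Kudu change: `Status`, `Tablet`, `FakeSysCatalog`, and `DeleteTablets` are hypothetical stand-ins invented for illustration, and the sketch leaves out the replica deletion requests and the metadata locking done by the real code. It only shows the shape of the fix: one write for all tablets, in-memory state committed only after that write succeeds, and the status returned to the caller instead of being passed to CHECK_OK.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for kudu::Status; models only success/failure.
struct Status {
  bool ok_flag = true;
  std::string msg;
  static Status OK() { return Status(); }
  static Status IOError(std::string m) {
    Status s;
    s.ok_flag = false;
    s.msg = std::move(m);
    return s;
  }
  bool ok() const { return ok_flag; }
};

enum class TabletState { RUNNING, DELETED };

// Hypothetical stand-in for TabletInfo: holds only the committed in-memory state.
struct Tablet {
  TabletState state = TabletState::RUNNING;
  std::string state_msg;
};

// Hypothetical stand-in for the system catalog table; it can simulate a
// failed write so that the error path can be exercised.
struct FakeSysCatalog {
  bool fail_writes = false;
  int num_writes = 0;
  std::size_t last_batch_size = 0;

  Status UpdateTablets(const std::vector<Tablet*>& tablets) {
    ++num_writes;
    last_batch_size = tablets.size();
    if (fail_writes) {
      return Status::IOError("simulated sys_catalog write failure");
    }
    return Status::OK();
  }
};

// Marks every tablet as deleted using a single catalog write. On failure,
// no in-memory state changes and the error is returned to the caller.
Status DeleteTablets(FakeSysCatalog* sys_catalog,
                     std::vector<Tablet>* tablets,
                     const std::string& deletion_msg) {
  std::vector<Tablet*> batch;
  batch.reserve(tablets->size());
  for (Tablet& t : *tablets) {
    batch.push_back(&t);
  }

  // One write for the whole batch instead of one write per tablet.
  Status s = sys_catalog->UpdateTablets(batch);
  if (!s.ok()) {
    return s;  // Propagate instead of crashing; nothing was committed.
  }

  // The write succeeded, so it is now safe to commit the in-memory state.
  for (Tablet* t : batch) {
    t->state = TabletState::DELETED;
    t->state_msg = deletion_msg;
  }
  return Status::OK();
}
```

In this sketch, a caller playing the role of DeleteTable would check the returned status and report the failure to the client, leaving the tablets untouched so the operation can be retried.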

      Part of the work here is an integration test that provides good coverage for the various failure paths.

              People

              • Assignee: Adar Dembo (adar)
              • Reporter: Adar Dembo (adar)