  1. HBase
  2. HBASE-10628 Fix semantic inconsistency among methods which are exposed to client
  3. HBASE-10636

HBaseAdmin.deleteTable isn't 'really' synchronous: HMaster is still doing cleanup after the client thinks deleteTable() has succeeded

Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Abandoned
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Client, master
    • Labels: None

    Description

      In HBaseAdmin.deleteTable():

      public void deleteTable(final TableName tableName) throws IOException {
        ....
        // Wait until all regions deleted
        for (int tries = 0; tries < (this.numRetries * this.retryLongerMultiplier); tries++) {
          ....
          // let us wait until hbase:meta table is updated and
          // HMaster removes the table from its HTableDescriptors
          if (values == null || values.length == 0) {
            tableExists = false;
            GetTableDescriptorsResponse htds;
            MasterKeepAliveConnection master = connection.getKeepAliveMasterService();
            try {
              GetTableDescriptorsRequest req =
                RequestConverter.buildGetTableDescriptorsRequest(tableName);
              htds = master.getTableDescriptors(null, req);
            } catch (ServiceException se) {
              throw ProtobufUtil.getRemoteException(se);
            } finally {
              master.close();
            }
            // The client's only completion signal: the master no longer
            // returns a descriptor for this table.
            tableExists = !htds.getTableSchemaList().isEmpty();
            if (!tableExists) {
              break;
            }
          }
          ....
        }
        ....
      }
      

      The client considers deleteTable() to have succeeded as soon as it can no longer retrieve the table descriptor from the master.

      But in HMaster, the DeleteTableHandler that actually deletes the table does the following:

        protected void handleTableOperation(List<HRegionInfo> regions)
        throws IOException, KeeperException {
          // 1. Wait because of region in transition
          ....
          // 2. Remove regions from META
          LOG.debug("Deleting regions from META");
          MetaEditor.deleteRegions(this.server.getCatalogTracker(), regions);
      
          // 3. Move the table in /hbase/.tmp
          MasterFileSystem mfs = this.masterServices.getMasterFileSystem();
          Path tempTableDir = mfs.moveTableToTemp(tableName);
      
          try {
            // 4. Delete regions from FS (temp directory)
            FileSystem fs = mfs.getFileSystem();
            for (HRegionInfo hri: regions) {
              LOG.debug("Archiving region " + hri.getRegionNameAsString() + " from FS");
              HFileArchiver.archiveRegion(fs, mfs.getRootDir(),
                  tempTableDir, new Path(tempTableDir, hri.getEncodedName()));
            }
      
            // 5. Delete table from FS (temp directory)
            if (!fs.delete(tempTableDir, true)) {
              LOG.error("Couldn't delete " + tempTableDir);
            }
      
            LOG.debug("Table '" + tableName + "' archived!");
          } finally {
            // 6. Update table descriptor cache
            LOG.debug("Removing '" + tableName + "' descriptor.");
            this.masterServices.getTableDescriptors().remove(tableName);
      
            // 7. Clean up regions of the table in RegionStates.
            LOG.debug("Removing '" + tableName + "' from region states.");
            states.tableDeleted(tableName);
      
            // 8. If entry for this table in zk, and up in AssignmentManager, remove it.
            LOG.debug("Marking '" + tableName + "' as deleted.");
            am.getZKTable().setDeletedTable(tableName);
          }
      
          if (cpHost != null) {
            cpHost.postDeleteTableHandler(this.tableName);
          }
        }
      

      Removing the regions from RegionStates, marking the table as deleted in ZK, and calling the coprocessor's postDeleteTableHandler all happen after the table is removed from the TableDescriptors cache.

      So client code that relies on RegionStates/ZKTable/CP having been cleaned up once deleteTable() returns can fail if its requests hit HMaster before those three cleanup steps are done.
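
      The ordering problem can be reproduced in isolation with a toy model. The sketch below is not HBase code: the two sets are hypothetical stand-ins for the master's descriptor cache and RegionStates, and the class and method names are made up for illustration. A latch pins the interleaving so that the "client" looks at the region state in the window between steps 6 and 7, which is the window a slow master would widen.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;

// Toy model only: the sets stand in for the master's descriptor cache
// and RegionStates. None of this is HBase code.
class DeleteTableRaceSketch {

    /** Returns true if the client observes leftover region state after the descriptor is gone. */
    static boolean clientSeesStaleState() {
        final String table = "t1";
        final Set<String> descriptors = ConcurrentHashMap.newKeySet();
        final Set<String> regionStates = ConcurrentHashMap.newKeySet();
        descriptors.add(table);
        regionStates.add(table);

        final CountDownLatch descriptorRemoved = new CountDownLatch(1);
        final CountDownLatch clientChecked = new CountDownLatch(1);

        // Models the handler's finally block: step 6 first, step 7 later.
        Thread handler = new Thread(() -> {
            descriptors.remove(table);          // step 6: descriptor cache updated
            descriptorRemoved.countDown();
            try {
                clientChecked.await();          // models a slow-running master
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            regionStates.remove(table);         // step 7: region states cleaned up
        });
        handler.start();

        try {
            // Client side: the descriptor lookup comes back empty,
            // so deleteTable() would return here.
            descriptorRemoved.await();
            boolean deleteReturned = !descriptors.contains(table);
            // A follow-up request still finds the table's regions.
            boolean stale = deleteReturned && regionStates.contains(table);
            clientChecked.countDown();
            handler.join();
            return stale;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("client saw stale region state: " + clientSeesStaleState());
    }
}
```

      In this model the client always sees the stale state, because the only signal it waits on is produced before the remaining cleanup runs.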

      To reproduce this, I added a short sleep (e.g. 200ms) after the line below to simulate a slow-running HMaster:

      this.masterServices.getTableDescriptors().remove(tableName);
      

      Some unit tests (such as moveRegion, or verifying the postDeleteTable CP hook immediately after deleteTable) then no longer pass.
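
      Until the ordering is changed, one way for a test to tolerate the window is to poll for the post-delete condition instead of asserting on it immediately. The helper below is a minimal sketch under that assumption. WaitForSketch and waitFor are hypothetical names, not an existing HBase API, and the condition passed in would be whatever the test checks (for example, that the CP hook has been invoked).

```java
import java.util.function.BooleanSupplier;

// Hypothetical test helper, not part of HBase: polls a condition until it
// holds or a timeout expires.
class WaitForSketch {

    /**
     * Returns true as soon as the condition holds, or false if it still
     * does not hold once timeoutMs has elapsed (or the thread is interrupted).
     */
    static boolean waitFor(long timeoutMs, long intervalMs, BooleanSupplier condition) {
        final long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (!condition.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                return false;                   // timed out
            }
            try {
                Thread.sleep(intervalMs);       // back off before re-checking
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        final long start = System.nanoTime();
        // Stand-in condition: becomes true roughly 50ms after start, the way
        // a late master-side cleanup step would.
        boolean ok = waitFor(2000, 10, () -> System.nanoTime() - start > 50_000_000L);
        System.out.println("cleanup observed: " + ok);
    }
}
```

      This only hides the race in tests. Client code that needs the guarantee would still depend on the master completing all cleanup before the descriptor disappears.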

          People

            Assignee: Unassigned
            Reporter: Honghua Feng (fenghh)
            Votes: 0
            Watchers: 3