KUDU-2635

Tserver crashes because some orphaned blocks are still listed when deleting metadata

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.7.0
    • Fix Version/s: 1.11.0
    • Component/s: fs, tablet, tserver
    • Labels:
      None

      Description

      In some cases, when deleting a tablet, a tablet server may fail to delete some blocks and then fail to delete the tablet metadata. Because a failure to delete metadata is a fatal error, this leads to a crash. That is what happened in the logs below, but it is unclear why the blocks failed to be deleted, and why the server stayed up for a couple of minutes afterward before receiving a delete tablet request and ultimately crashing. Following the crash, the server was able to start up successfully.
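      The failure sequence described above can be illustrated with a toy model. This is only a sketch of the ordering problem, not Kudu's code: the class `ToyTabletMetadata`, its methods, and the `simulate` helper are hypothetical names made up for illustration, and the reason a block deletion might fail is deliberately left as an injected parameter because the report does not identify it.

```python
class ToyTabletMetadata:
    """Hypothetical stand-in for a tablet's metadata (not a Kudu class)."""

    def __init__(self, blocks):
        self.live_blocks = set(blocks)
        self.orphaned_blocks = set()
        self.deleted = False

    def delete_tablet_data(self, try_delete_block):
        # Every block becomes orphaned, then each one is deleted if possible.
        self.orphaned_blocks |= self.live_blocks
        self.live_blocks.clear()
        for block in sorted(self.orphaned_blocks):
            if try_delete_block(block):
                self.orphaned_blocks.discard(block)
        # Blocks whose deletion failed remain listed as orphaned.

    def delete_metadata(self):
        # Metadata removal is refused while orphaned blocks are still listed.
        if self.orphaned_blocks:
            raise ValueError("metadata still references orphaned blocks")
        self.deleted = True


def simulate(blocks, failing):
    """Run the two-step deletion; return (leftover orphans, error or None)."""
    meta = ToyTabletMetadata(blocks)
    meta.delete_tablet_data(lambda block: block not in failing)
    try:
        meta.delete_metadata()
        return meta.orphaned_blocks, None
    except ValueError as err:
        # In the report, the equivalent failure is treated as fatal.
        return meta.orphaned_blocks, err


if __name__ == "__main__":
    leftover, error = simulate(["b1", "b2", "b3"], failing={"b2"})
    print("leftover orphaned blocks:", leftover)
    print("metadata deletion error:", error)
```

      In this model, a single block that cannot be deleted is enough to make the later metadata deletion fail, which matches the shape of the fatal message in the logs.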

       

      I1130 00:00:07.565915 29721 tablet_service.cc:795] Processing DeleteTablet for tablet 1db7aa7e81474907ace3d493c24cdc94 with delete_type TABLET_DATA_DELETED (Partition dropped at 2018-11-30 00:00:07 PST) from {username='kudu'} at 10.93.87.15:47194
      I1130 00:00:07.565929 29721 tablet_replica.cc:262] T 1db7aa7e81474907ace3d493c24cdc94 P 97235196a93b41c29954ed8534aa2ddc: stopping tablet replica
      I1130 00:00:07.565954 29721 maintenance_manager.cc:235] P 97235196a93b41c29954ed8534aa2ddc: Unregistered op CompactRowSetsOp(1db7aa7e81474907ace3d493c24cdc94)
      I1130 00:00:07.565997 29721 maintenance_manager.cc:235] P 97235196a93b41c29954ed8534aa2ddc: Unregistered op MinorDeltaCompactionOp(1db7aa7e81474907ace3d493c24cdc94)
      I1130 00:00:07.566010 29721 maintenance_manager.cc:235] P 97235196a93b41c29954ed8534aa2ddc: Unregistered op MajorDeltaCompactionOp(1db7aa7e81474907ace3d493c24cdc94)
      I1130 00:00:07.566020 29721 maintenance_manager.cc:235] P 97235196a93b41c29954ed8534aa2ddc: Unregistered op UndoDeltaBlockGCOp(1db7aa7e81474907ace3d493c24cdc94)
      I1130 00:00:07.566032 29721 maintenance_manager.cc:235] P 97235196a93b41c29954ed8534aa2ddc: Unregistered op FlushMRSOp(1db7aa7e81474907ace3d493c24cdc94)
      I1130 00:00:07.566040 29721 maintenance_manager.cc:235] P 97235196a93b41c29954ed8534aa2ddc: Unregistered op FlushDeltaMemStoresOp(1db7aa7e81474907ace3d493c24cdc94)
      I1130 00:00:07.566048 29721 maintenance_manager.cc:235] P 97235196a93b41c29954ed8534aa2ddc: Unregistered op LogGCOp(1db7aa7e81474907ace3d493c24cdc94)
      I1130 00:00:07.566056 29721 raft_consensus.cc:2012] T 1db7aa7e81474907ace3d493c24cdc94 P 97235196a93b41c29954ed8534aa2ddc [term 3 FOLLOWER]: Raft consensus shutting down.
      I1130 00:00:07.566074 29721 raft_consensus.cc:2039] T 1db7aa7e81474907ace3d493c24cdc94 P 97235196a93b41c29954ed8534aa2ddc [term 3 FOLLOWER]: Raft consensus is shut down!
      I1130 00:00:07.666061 29721 ts_tablet_manager.cc:1277] T 1db7aa7e81474907ace3d493c24cdc94 P 97235196a93b41c29954ed8534aa2ddc: Deleting tablet data with delete state TABLET_DATA_DELETED
      I1130 00:00:08.102607 29721 ts_tablet_manager.cc:1290] T 1db7aa7e81474907ace3d493c24cdc94 P 97235196a93b41c29954ed8534aa2ddc: tablet deleted with delete type TABLET_DATA_DELETED: last-logged OpId 3.1166195
      I1130 00:00:08.102629 29721 log.cc:981] T 1db7aa7e81474907ace3d493c24cdc94 P 97235196a93b41c29954ed8534aa2ddc: Deleting WAL directory at /home/kudu/tablet/wal/wals/1db7aa7e81474907ace3d493c24cdc94
      I1130 00:00:08.103217 29721 ts_tablet_manager.cc:1310] T 1db7aa7e81474907ace3d493c24cdc94 P 97235196a93b41c29954ed8534aa2ddc: Deleting consensus metadata
      F1130 00:00:08.155643 29721 ts_tablet_manager.cc:848] Failed to delete tablet data for 1db7aa7e81474907ace3d493c24cdc94: Invalid argument: Unable to delete on-disk data from tablet 1db7aa7e81474907ace3d493c24cdc94: The metadata for tablet 1db7aa7e81474907ace3d493c24cdc94 still references orphaned blocks. Call DeleteTabletData() first
      I1130 00:02:09.460352 29725 tablet_service.cc:795] Processing DeleteTablet for tablet 1db7aa7e81474907ace3d493c24cdc94 with delete_type TABLET_DATA_DELETED (Partition dropped at 2018-11-30 00:00:07 PST) from {username='kudu'} at 10.93.87.15:47194
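
      To quantify the "couple of minutes" between the fatal message and the next DeleteTablet request, the glog line prefixes can be parsed. The following is a minimal sketch: the helper names `parse_glog_line` and `gap_seconds` are illustrative, the sample messages are shortened copies of the lines above, and the year is an assumption taken from the "Partition dropped at 2018-11-30" text, since glog prefixes omit the year.

```python
import re
from datetime import datetime

# glog prefix: <level><mmdd> <hh:mm:ss.uuuuuu> <thread id> <file>:<line>] <message>
GLOG_PREFIX = re.compile(
    r"^(?P<level>[IWEF])(?P<month>\d{2})(?P<day>\d{2}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d{6}) +(?P<thread>\d+) "
    r"(?P<file>[^:\s]+):(?P<line>\d+)\] (?P<msg>.*)$"
)


def parse_glog_line(line, year=2018):
    """Parse one glog line into a dict, or return None if it does not match."""
    m = GLOG_PREFIX.match(line.strip())
    if m is None:
        return None
    stamp = f"{year}{m['month']}{m['day']} {m['time']}"
    return {
        "level": m["level"],
        "timestamp": datetime.strptime(stamp, "%Y%m%d %H:%M:%S.%f"),
        "thread": int(m["thread"]),
        "file": m["file"],
        "line": int(m["line"]),
        "msg": m["msg"],
    }


def gap_seconds(first, second):
    """Seconds elapsed between two glog lines."""
    a, b = parse_glog_line(first), parse_glog_line(second)
    return (b["timestamp"] - a["timestamp"]).total_seconds()


# Messages shortened for brevity; prefixes are copied from the logs above.
FATAL_LINE = (
    "F1130 00:00:08.155643 29721 ts_tablet_manager.cc:848] "
    "Failed to delete tablet data for 1db7aa7e81474907ace3d493c24cdc94"
)
NEXT_REQUEST = (
    "I1130 00:02:09.460352 29725 tablet_service.cc:795] "
    "Processing DeleteTablet for tablet 1db7aa7e81474907ace3d493c24cdc94"
)

if __name__ == "__main__":
    print(f"gap: {gap_seconds(FATAL_LINE, NEXT_REQUEST):.3f} s")  # about 121.3 s
```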

            People

            • Assignee: Andrew Wong (andrew.wong)
            • Reporter: Andrew Wong (andrew.wong)
            • Votes: 0
            • Watchers: 4
