Details
Type: Improvement
Status: Resolved
Priority: Minor
Resolution: Fixed
Description
We came across a scenario where the HMS database was lost and then restored from an old backup, which reset the latest event ID (a monotonically increasing integer) in HMS to a lower value than it should have been.
Since the Kudu master's last seen event ID was greater than the latest event ID currently in HMS, it could not process any newly generated events. For example, Kudu table deletion did not happen: the Kudu master only processes events with an ID higher than the one it last saw, but the event ID that HMS assigned to the table deletion was lower than that.
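To illustrate why the events get skipped, here is a minimal sketch of the filtering behavior described above. It is not Kudu's actual code: `Event` and `FilterNewEvents` are hypothetical names, and the only assumption is that events are processed only when their ID is strictly greater than the last seen ID.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for an HMS notification event.
struct Event {
  int64_t id;
};

// Returns the events a listener would still process: those whose ID is
// strictly greater than the last event ID it has seen.
std::vector<Event> FilterNewEvents(const std::vector<Event>& events,
                                   int64_t last_seen_event_id) {
  std::vector<Event> fresh;
  for (const Event& e : events) {
    if (e.id > last_seen_event_id) {
      fresh.push_back(e);
    }
  }
  return fresh;
}
```

With a last seen ID of 500 and a restored HMS that hands out IDs starting around 100, every new event (including a table deletion) is filtered out until HMS catches up past 500.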
This also causes a discrepancy between the metadata in HMS and in the Kudu masters. It would be better if the Kudu master, upon startup, compared its last seen event ID with the latest event ID in HMS and crashed if the one in HMS is lower, with a helpful message/clarifying question like:
Found the last seen event ID in the local Kudu master to be greater than the latest event ID in HMS. Was there any backup or restore done on HMS recently?
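The proposed startup check could look roughly like the sketch below. This is only an illustration under stated assumptions, not the fix that was committed: `CheckHmsEventIdConsistency` is a hypothetical helper, and both IDs are assumed to be already available as 64-bit integers. It returns an empty string when the IDs are consistent and an error message otherwise, leaving it to the caller to abort startup.

```cpp
#include <cstdint>
#include <string>

// Hypothetical startup check. Returns an empty string if the IDs are
// consistent, or an error message if the master's last seen event ID is
// ahead of the latest event ID reported by HMS.
std::string CheckHmsEventIdConsistency(int64_t last_seen_event_id,
                                       int64_t hms_latest_event_id) {
  if (last_seen_event_id > hms_latest_event_id) {
    return "Found the last seen event ID (" +
           std::to_string(last_seen_event_id) +
           ") in the local Kudu master to be greater than the latest "
           "event ID (" + std::to_string(hms_latest_event_id) +
           ") in HMS. Was there any backup or restore done on HMS recently?";
  }
  return "";
}
```

Including both IDs in the message is a small addition over the wording suggested above; it should make it easier for an operator to judge how far HMS was rolled back.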