Description
As shown in HIVE-16886, notification IDs generated by Hive may be non-unique, so different events can share the same ID. This causes various problems for the Sentry/Hive interaction, and we should find a short-term solution until it is fixed in Hive.
The issue was addressed in SENTRY-1803 by removing the primary-key constraint on the notification ID, which allows multiple rows with the same ID. But this creates other problems:
1. We are using the primary-key constraint to prevent multiple Sentry instances from processing the same notification more than once.
2. We are using max(notificationId) to find the last processed event. When the field is a primary key, this lookup is an index scan; when it isn't, it becomes a full table scan, which is more expensive.
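Both uses of the primary key can be illustrated with a small in-memory model. This is a minimal sketch under stated assumptions, not Sentry's schema or code: the `ProcessedNotifications` class, the `event_hash` helper and keying rows on `(event_id, hash)` are all hypothetical. A key on the ID alone would reject `other` below even though it is a different event; the composite key accepts it while still rejecting an exact redelivery. A sorted list stands in for the index, so the largest ID is read from its end rather than found by scanning every row.

```python
import bisect
import hashlib
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Notification:
    event_id: int
    payload: str  # hypothetical stand-in for the serialized HMS event


def event_hash(n: Notification) -> str:
    """Hypothetical content hash that tells apart events sharing one ID."""
    return hashlib.sha256(f"{n.event_id}:{n.payload}".encode()).hexdigest()


class ProcessedNotifications:
    """Hypothetical model of a processed-notifications table.

    Rows stay sorted by (event_id, hash), mimicking an index whose leading
    column is the notification ID.
    """

    def __init__(self) -> None:
        self._rows: List[Tuple[int, str]] = []

    def try_record(self, n: Notification) -> bool:
        """Insert the row unless its composite key already exists.

        Returning False mirrors a primary-key violation: a second Sentry
        instance trying to record the same event loses the race.
        """
        key = (n.event_id, event_hash(n))
        i = bisect.bisect_left(self._rows, key)
        if i < len(self._rows) and self._rows[i] == key:
            return False
        self._rows.insert(i, key)
        return True

    def last_processed_id(self) -> Optional[int]:
        """Largest ID, read from the end of the sorted rows (no full scan)."""
        return self._rows[-1][0] if self._rows else None


store = ProcessedNotifications()
first = Notification(42, "CREATE TABLE db.t1")
other = Notification(42, "DROP TABLE db.t2")  # different event, same ID
print(store.try_record(first))    # True
print(store.try_record(other))    # True: composite key differs
print(store.try_record(first))    # False: exact redelivery is rejected
print(store.last_processed_id())  # 42
```

The hash only serves to make the key unique again; whether a content hash is a sound choice for real HMS events is itself an assumption of this sketch.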
Duplicate IDs also cause a few other problems that are unrelated to SENTRY-1803 and not addressed by it:
1. There is a synchronization mechanism between HMS and Sentry that ensures a given event has been processed. This mechanism doesn't work in the presence of duplicate IDs.
2. Some events may be missed due to the way they are processed.
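One plausible way events can be missed (compare SENTRY-1888 in the issue links) is a follower that asks only for notifications with an ID strictly greater than the last one it processed. The sketch below is an in-memory illustration, not the HMS API: `fetch_after`, `fetch_from` and `run_two_polls` are hypothetical helpers over a plain list. A duplicate committed after the follower has already moved past its ID is never returned by a strict `>` filter; re-reading from the last processed ID inclusively and skipping already-seen events picks it up.

```python
from typing import List, Set, Tuple

Event = Tuple[int, str]  # (notification ID, payload); hypothetical shape


def fetch_after(log: List[Event], last_id: int) -> List[Event]:
    """Strict filter: only events whose ID is greater than last_id."""
    return [e for e in log if e[0] > last_id]


def fetch_from(log: List[Event], last_id: int, seen: Set[Event]) -> List[Event]:
    """Inclusive filter: re-reads ID == last_id and drops events already seen."""
    return [e for e in log if e[0] >= last_id and e not in seen]


def run_two_polls(strict: bool) -> List[Event]:
    """Simulate two polls by one follower and return every event it processed."""
    log: List[Event] = [(1, "create db"), (2, "create t1"), (3, "alter t1")]
    processed: List[Event] = []
    seen: Set[Event] = set()
    last_id = 0
    for poll in range(2):
        batch = fetch_after(log, last_id) if strict else fetch_from(log, last_id, seen)
        for e in batch:
            processed.append(e)
            seen.add(e)
            last_id = max(last_id, e[0])
        if poll == 0:
            # Another HMS instance commits a late event reusing ID 3, then ID 4.
            log += [(3, "drop t2"), (4, "create t3")]
    return processed


print(len(run_two_polls(strict=True)))   # 4: (3, "drop t2") is never seen
print(len(run_two_polls(strict=False)))  # 5
```

The inclusive variant trades a little re-reading for completeness: only events whose ID equals `last_id` are fetched twice, so a real implementation could keep just those in `seen`. It only catches a late duplicate of the last processed ID; a late event reusing an older ID would still be skipped.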
Issue Links
- relates to
  - SENTRY-1803 HMSFollower should handle the case of multiple notifications with the same ID (Resolved)
  - SENTRY-1888 Sentry might not fetch all HMS duplicated events IDs when requested (Resolved)
  - HIVE-16738 Notification ID generation in DBNotification might not be unique across HS2 instances. (Closed)
  - HIVE-16886 HMS log notifications may have duplicated event IDs if multiple HMS are running concurrently (Closed)