As shown in HIVE-16886, notification IDs generated by Hive may be non-unique, so different events can share the same ID. This causes various problems for the Sentry/Hive interaction, and we should find a short-term solution until it is fixed in Hive.
The issue was addressed in SENTRY-1803 by removing the primary-key constraint on the notification ID, which allows duplicate IDs to be stored. However, this creates other problems:
1. We use the primary-key constraint to prevent multiple Sentry instances from processing the same notification more than once.
2. We use max(notificationId) to find the last processed event. When the field is a primary key, this query is answered from the index; without the key it requires a full table scan, which is much more expensive.
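A minimal sketch of the trade-off, using an in-memory SQLite table as a stand-in for the Sentry store. The table `processed`, its columns and the helper functions are hypothetical and do not reflect Sentry's actual schema or code. With the primary key, a second instance re-recording the same notification is rejected, but so is a different event that Hive gave the same ID; without the key, both the duplicate event and the double processing are accepted.

```python
import sqlite3


def make_store(unique_ids: bool) -> sqlite3.Connection:
    """Create a hypothetical in-memory table of processed notification IDs."""
    conn = sqlite3.connect(":memory:")
    key = "PRIMARY KEY" if unique_ids else ""
    conn.execute(
        f"CREATE TABLE processed (notification_id INTEGER {key}, event TEXT)"
    )
    return conn


def try_record(conn: sqlite3.Connection, notification_id: int, event: str) -> bool:
    """Record an event as processed; return False if the key constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO processed VALUES (?, ?)", (notification_id, event)
            )
        return True
    except sqlite3.IntegrityError:
        return False


def last_processed_id(conn: sqlite3.Connection):
    """Equivalent of max(notificationId): index lookup with the key, scan without it."""
    return conn.execute("SELECT max(notification_id) FROM processed").fetchone()[0]


# With the primary key: instance B cannot reprocess what instance A recorded...
pk = make_store(unique_ids=True)
try_record(pk, 7, "CREATE TABLE a")    # instance A: accepted
try_record(pk, 7, "CREATE TABLE a")    # instance B: rejected (intended)
try_record(pk, 7, "DROP TABLE b")      # different event, same Hive ID: rejected too

# ...without it, duplicate IDs are stored, but so is double processing.
no_pk = make_store(unique_ids=False)
try_record(no_pk, 7, "CREATE TABLE a")  # instance A: accepted
try_record(no_pk, 7, "CREATE TABLE a")  # instance B: also accepted
```

The sketch only illustrates the constraint semantics; the cost difference of `max()` depends on the database and its indexes.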
We also have a few other problems caused by duplicate IDs that are unrelated to the above and are not addressed by
1. There is a synchronization mechanism between HMS and Sentry that ensures a given event has been processed. It does not work in the presence of duplicate IDs.
2. Some events may be missed due to the way they are processed.
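A hypothetical sketch of how both problems can arise. It assumes the follower fetches only events whose ID is greater than the last one it processed, and that the sync check treats an ID as processed once the follower has reached it; both are illustrative assumptions, not a description of Sentry's actual processing loop or HMS sync code. An event that is committed late with an ID that was already seen is never fetched, yet the sync check still reports that ID as processed.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Event:
    notification_id: int
    payload: str


class Follower:
    """Hypothetical follower that fetches only events with ID > last_seen."""

    def __init__(self) -> None:
        self.last_seen = 0
        self.applied: List[str] = []

    def poll(self, log: List[Event]) -> None:
        # Assumed fetch rule: "give me everything after the last ID I processed".
        for e in (e for e in log if e.notification_id > self.last_seen):
            self.applied.append(e.payload)
            self.last_seen = max(self.last_seen, e.notification_id)

    def is_synced(self, notification_id: int) -> bool:
        # Assumed sync rule: an ID counts as processed once last_seen reaches it.
        return self.last_seen >= notification_id


log = [Event(1, "CREATE DATABASE db"), Event(2, "CREATE TABLE db.t1")]
f = Follower()
f.poll(log)

# A concurrent transaction commits late and Hive reuses ID 2.
log.append(Event(2, "CREATE TABLE db.t2"))
f.poll(log)  # nothing fetched: 2 is not > last_seen

print(f.applied)       # the t2 event was missed
print(f.is_synced(2))  # yet the waiter for ID 2 is released
```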