Hive / HIVE-16886

HMS log notifications may have duplicated event IDs if multiple HMS are running concurrently


Details

    • Reviewed

    Description

      When running multiple Hive Metastore (HMS) servers with DB notifications enabled, I found that notifications can be persisted with duplicate event IDs.

      This does not happen when multiple threads run inside a single HMS node, because they are serialized by the lock acquired on the DbNotificationsLog class. Separate HMS instances do not share that lock, however, so they can conflict.
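      To illustrate why an in-process lock is enough for threads but not for separate servers, here is a minimal sketch, assuming a hypothetical static counter (storedNextId) standing in for the datastore value and a hypothetical addNotification() standing in for the real method. The class-level lock serializes every thread in the JVM, so no ID is handed out twice; a second HMS process would have its own copy of that lock and gain no protection from it.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class InJvmLockSketch {
    // Hypothetical stand-in for the next event ID kept in the datastore.
    private static long storedNextId = 1;

    // Hypothetical stand-in for addNotificationEvent(): read, increment, write back.
    // The class-level lock serializes callers, but only threads in this JVM see it;
    // another HMS process has its own lock and can still interleave with us.
    static long addNotification() {
        synchronized (InJvmLockSketch.class) {
            long id = storedNextId;   // read
            storedNextId = id + 1;    // increment and write back
            return id;
        }
    }

    // Runs `calls` additions on `threads` threads and counts the distinct IDs handed out.
    static int countDistinctIds(int threads, int calls) {
        Set<Long> ids = ConcurrentHashMap.newKeySet();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < calls; i++) {
            pool.execute(() -> ids.add(addNotification()));
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("timed out");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return ids.size();
    }

    public static void main(String[] args) {
        System.out.println(countDistinctIds(8, 1000)); // 1000: no duplicates within one JVM
    }
}
```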

      The issue is in the ObjectStore#addNotificationEvent() method. It reads the next event ID from the datastore, assigns it to the new notification, increments it in the server's memory, and then persists the updated value back to the datastore. This read-modify-write is not atomic across servers: if two servers read the same ID before either writes it back, both persist a new notification with that same ID.
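      The race above is a classic lost update. The sketch below reproduces it deterministically with two threads playing the two servers; the shared AtomicLong (nextEventId) stands in for the datastore row and is an assumption for illustration. A CyclicBarrier forces both "servers" to read before either writes, so both get ID 1. For contrast, casAdd() shows one common fix, a compare-and-set retry loop (analogous to a conditional UPDATE ... WHERE NEXT_EVENT_ID = expected). That is a swapped-in technique for illustration, not necessarily what the attached patches do.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicLong;

public class LostUpdateSketch {
    // Hypothetical shared "datastore" value holding the next event ID.
    // The unsafe path still does a separate get() and set(), like a SELECT then an UPDATE.
    static final AtomicLong nextEventId = new AtomicLong(1);

    // Unsafe: read, then write back later. The barrier makes every "server"
    // finish its read before any of them writes, reproducing the race.
    static long unsafeAdd(CyclicBarrier afterRead) throws Exception {
        long id = nextEventId.get();   // both servers read the same value
        afterRead.await();             // wait until every server has read
        nextEventId.set(id + 1);       // both write id + 1: one update is lost
        return id;
    }

    // Safe alternative (swapped-in technique): only write back if the stored
    // value is still the one we read; otherwise re-read and retry.
    static long casAdd() {
        while (true) {
            long id = nextEventId.get();
            if (nextEventId.compareAndSet(id, id + 1)) {
                return id;
            }
        }
    }

    // Runs two concurrent "servers" and returns the IDs they assigned, in submission order.
    static List<Long> runTwoServers(boolean safe) {
        nextEventId.set(1);
        CyclicBarrier barrier = new CyclicBarrier(2);
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            List<Future<Long>> futures = new ArrayList<>();
            for (int i = 0; i < 2; i++) {
                futures.add(pool.submit(() -> safe ? casAdd() : unsafeAdd(barrier)));
            }
            List<Long> ids = new ArrayList<>();
            for (Future<Long> f : futures) {
                ids.add(f.get());
            }
            return ids;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("unsafe: " + runTwoServers(false)); // [1, 1]
        System.out.println("safe:   " + runTwoServers(true));  // 1 and 2, in either order
    }
}
```

      Note that in the real system the two servers are separate processes, so an in-memory primitive like AtomicLong cannot be shared between them; the equivalent guarantee has to come from the datastore itself (a row lock or a conditional update).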

      The event ID column is neither unique nor a primary key, so the datastore accepts the duplicate row instead of rejecting the second insert.
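      The effect of the missing constraint can be sketched with a hypothetical in-memory table (NotificationTable, not a Hive class). Without a uniqueness check, a second row with event ID 1 is accepted silently; with one, the second insert fails, so the losing server at least gets an error it could detect and retry on. This models the database behavior only loosely and is an illustration, not the schema change proposed in the patches.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class UniqueEventIdSketch {
    // Hypothetical in-memory stand-in for the notification log table.
    static final class NotificationTable {
        private final boolean uniqueEventId;             // models a UNIQUE / PRIMARY KEY on EVENT_ID
        private final List<Long> rows = new ArrayList<>();
        private final Set<Long> seen = new HashSet<>();

        NotificationTable(boolean uniqueEventId) {
            this.uniqueEventId = uniqueEventId;
        }

        void insert(long eventId) {
            // Reject a repeated ID only when the constraint is enabled.
            if (!seen.add(eventId) && uniqueEventId) {
                throw new IllegalStateException("duplicate EVENT_ID " + eventId);
            }
            rows.add(eventId);
        }
    }

    // Two "servers" that both read next ID 1 try to persist it; true if the second insert fails.
    static boolean duplicateRejected(boolean uniqueEventId) {
        NotificationTable table = new NotificationTable(uniqueEventId);
        table.insert(1);
        try {
            table.insert(1);
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(duplicateRejected(false)); // false: duplicate stored silently
        System.out.println(duplicateRejected(true));  // true: duplicate rejected
    }
}
```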

      Here's a test case using the TestObjectStore class that confirms this issue:

      @Test
      public void testConcurrentAddNotifications() throws ExecutionException, InterruptedException {
        final int NUM_THREADS = 2;
        // countIn: released once every thread has built its ObjectStore and event.
        // countOut: released by the main thread so all threads call addNotificationEvent() together.
        CountDownLatch countIn = new CountDownLatch(NUM_THREADS);
        CountDownLatch countOut = new CountDownLatch(1);

        HiveConf conf = new HiveConf();
        conf.setVar(HiveConf.ConfVars.METASTORE_EXPRESSION_PROXY_CLASS, MockPartitionExpressionProxy.class.getName());

        ExecutorService executorService = Executors.newFixedThreadPool(NUM_THREADS);
        @SuppressWarnings("unchecked")
        FutureTask<Void>[] tasks = new FutureTask[NUM_THREADS];
        for (int i = 0; i < NUM_THREADS; i++) {
          final int n = i;

          tasks[i] = new FutureTask<Void>(new Callable<Void>() {
            @Override
            public Void call() throws Exception {
              // Each thread calls its own ObjectStore directly, bypassing the in-process
              // notification lock, so the threads behave like separate HMS instances
              // writing to the same datastore.
              ObjectStore store = new ObjectStore();
              store.setConf(conf);

              NotificationEvent dbEvent =
                  new NotificationEvent(0, 0, EventMessage.EventType.CREATE_DATABASE.toString(), "CREATE DATABASE DB" + n);

              System.out.println("ADDING NOTIFICATION");
              countIn.countDown();
              countOut.await();   // start together so both threads race on the same next event ID
              store.addNotificationEvent(dbEvent);
              System.out.println("FINISH NOTIFICATION");

              return null;
            }
          });

          executorService.execute(tasks[i]);
        }

        // Wait until all threads are ready, then release them at the same moment.
        countIn.await();
        countOut.countDown();

        for (int i = 0; i < NUM_THREADS; ++i) {
          tasks[i].get();   // propagates any exception thrown inside a task
        }
        executorService.shutdown();

        // objectStore is the test class's own ObjectStore instance.
        NotificationEventResponse eventResponse = objectStore.getNextNotification(new NotificationEventRequest());
        Assert.assertEquals(2, eventResponse.getEventsSize());
        Assert.assertEquals(1, eventResponse.getEvents().get(0).getEventId());

        // This fails because the second notification also has event ID 1
        Assert.assertEquals(2, eventResponse.getEvents().get(1).getEventId());
      }
      

      The last assertion fails: it expects event ID 2, but the second notification also has event ID 1.

      Attachments

        1. datastore-identity-holes.diff (7 kB, Sergio Peña)
        2. HIVE-16886.1.patch (40 kB, Anishek Agarwal)
        3. HIVE-16886.2.patch (54 kB, Anishek Agarwal)
        4. HIVE-16886.3.patch (58 kB, Anishek Agarwal)
        5. HIVE-16886.4.patch (58 kB, Anishek Agarwal)
        6. HIVE-16886.5.patch (60 kB, Anishek Agarwal)
        7. HIVE-16886.6.patch (60 kB, Anishek Agarwal)
        8. HIVE-16886.7.patch (60 kB, Anishek Agarwal)
        9. HIVE-16886.8.patch (60 kB, Anishek Agarwal)


            People

              Anishek Agarwal (anishek)
              Sergio Peña (spena)
              Votes: 0
              Watchers: 12
