Hadoop Common / HADOOP-12189

Improve CallQueueManager#swapQueue to make queue elements drop nearly impossible.

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.7.1
    • Fix Version/s: 2.8.0, 3.0.0-alpha1
    • Component/s: ipc, test
    • Labels: None

    Description

      Improve CallQueueManager#swapQueue so that dropping queue elements becomes nearly impossible. This is a trade-off between performance and functionality: in a very rare situation one element may still be dropped, but that is not fatal, since the client can still recover via timeout.
      CallQueueManager may occasionally drop elements from the queue when swapQueue is called.
      The following test failure from TestCallQueueManager shows that some elements in the queue were dropped:
      https://builds.apache.org/job/PreCommit-HADOOP-Build/7150/testReport/org.apache.hadoop.ipc/TestCallQueueManager/testSwapUnderContention/

      java.lang.AssertionError: expected:<27241> but was:<27245>
      	at org.junit.Assert.fail(Assert.java:88)
      	at org.junit.Assert.failNotEquals(Assert.java:743)
      	at org.junit.Assert.assertEquals(Assert.java:118)
      	at org.junit.Assert.assertEquals(Assert.java:555)
      	at org.junit.Assert.assertEquals(Assert.java:542)
      	at org.apache.hadoop.ipc.TestCallQueueManager.testSwapUnderContention(TestCallQueueManager.java:220)
      

      The elements appear to have been dropped by CallQueueManager#swapQueue.
      Looking at the implementation of CallQueueManager#swapQueue, it is possible for elements in the queue to be dropped. If the queue is full, a thread calling CallQueueManager#put can be blocked for a long time. It may then put its element into the old queue after swapQueue has already changed the queue in takeRef. That element stays in the old queue, which consumers no longer read, so it is dropped.
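
The race described above, and one way to narrow it, can be sketched in Java. This is a simplified model and not the Hadoop class or the committed patch: the class name SwappableQueue, the putRef field, the staysEmpty helper, and the CHECKS and CHECK_INTERVAL_MS constants are all assumptions made for illustration (only takeRef is named in the description). The idea is to redirect producers first, then wait until the old queue has been observed empty several times in a row before redirecting consumers, so that a producer that was blocked on the full old queue has time to finish its put and have the element consumed.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

/** Simplified model of a swappable call queue (hypothetical, for illustration only). */
class SwappableQueue<E> {
    // Producers read putRef and consumers read takeRef (putRef is an assumed name).
    private final AtomicReference<BlockingQueue<E>> putRef;
    private final AtomicReference<BlockingQueue<E>> takeRef;

    // Hypothetical knobs: how many consecutive empty observations are required.
    private static final int CHECKS = 5;
    private static final long CHECK_INTERVAL_MS = 10;

    SwappableQueue(int capacity) {
        BlockingQueue<E> q = new LinkedBlockingQueue<>(capacity);
        putRef = new AtomicReference<>(q);
        takeRef = new AtomicReference<>(q);
    }

    void put(E e) throws InterruptedException {
        // A producer blocked here on a full queue still holds the OLD queue reference.
        putRef.get().put(e);
    }

    E take() throws InterruptedException {
        E e = null;
        while (e == null) {
            // Poll with a timeout so that a change of takeRef is noticed.
            e = takeRef.get().poll(100, TimeUnit.MILLISECONDS);
        }
        return e;
    }

    synchronized void swapQueue(int newCapacity) throws InterruptedException {
        BlockingQueue<E> newQ = new LinkedBlockingQueue<>(newCapacity);
        BlockingQueue<E> oldQ = putRef.get();
        putRef.set(newQ);               // new puts go to the new queue
        while (!staysEmpty(oldQ)) {     // consumers keep draining the old queue,
            // retry                    // including late puts from blocked producers
        }
        takeRef.set(newQ);              // only now redirect consumers
    }

    // Returns true only if the queue was empty at every one of CHECKS spaced checks.
    private boolean staysEmpty(BlockingQueue<E> q) throws InterruptedException {
        for (int i = 0; i < CHECKS; i++) {
            Thread.sleep(CHECK_INTERVAL_MS);
            if (!q.isEmpty()) {
                return false;
            }
        }
        return true;
    }
}
```

This does not give a hard guarantee: a producer that stays blocked longer than the whole check window can still put into the old queue after takeRef has moved. It only makes the drop much less likely, at the cost of a slower swapQueue, which matches the trade-off stated in the description. Note that in this sketch swapQueue does not return until consumers have drained the old queue.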

      Attachments

        1. HADOOP-12189.000.patch
          6 kB
          Zhihai Xu
        2. HADOOP-12189.001.patch
          7 kB
          Zhihai Xu
        3. HADOOP-12189.none_guarantee.000.patch
          7 kB
          Zhihai Xu
        4. HADOOP-12189.none_guarantee.001.patch
          3 kB
          Zhihai Xu
        5. HADOOP-12189.none_guarantee.002.patch
          3 kB
          Zhihai Xu

          People

            Assignee: Zhihai Xu
            Reporter: Zhihai Xu