Hadoop Common
  1. Hadoop Common
  2. HADOOP-9036

TestSinkQueue.testConcurrentConsumers fails intermittently (Backports HADOOP-7292)

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 1.0.0
    • Fix Version/s: 1.2.0, 1-win
    • Component/s: None
    • Labels:
      None
    • Target Version/s:
    • Hadoop Flags:
      Reviewed

      Description

      org.apache.hadoop.metrics2.impl.TestSinkQueue.testConcurrentConsumers

      Error Message

      should've thrown
      Stacktrace

      junit.framework.AssertionFailedError: should've thrown
      at org.apache.hadoop.metrics2.impl.TestSinkQueue.shouldThrowCME(TestSinkQueue.java:229)
      at org.apache.hadoop.metrics2.impl.TestSinkQueue.testConcurrentConsumers(TestSinkQueue.java:195)
      Standard Output

      2012-10-03 16:51:31,694 INFO impl.TestSinkQueue (TestSinkQueue.java:consume(243)) - sleeping

      1. HADOOP-9036.patch
        2 kB
        Suresh Srinivas
      2. HADOOP-9036.patch
        2 kB
        Suresh Srinivas
      3. HADOOP-9036.patch
        3 kB
        Suresh Srinivas
      4. HADOOP-9036.branch-1-win.testsinkqueue.patch
        0.7 kB
        Ivan Mitic

        Issue Links

          Activity

          Hide
          Ivan Mitic added a comment -

          There is a race condition in TestSinkQueue#testConcurrentConsumers between a new thread created in newSleepingConsumerQueue and the first assert shouldThrowCME, hence the test fails intermittently. Each time shouldThrowCME runs before the new thread receives a consume callback the test will fail.

          The easiest way to consistently repro the problem is to add a short Thread.sleep in the newSleepingConsumerQueue thread.

          Show
          Ivan Mitic added a comment - There is a race condition in TestSinkQueue#testConcurrentConsumers between a new thread created in newSleepingConsumerQueue and the first assert shouldThrowCME, hence the test fails intermittently. Each time shouldThrowCME runs before the new thread receives a consume callback the test will fail. The easiest way to consistently repro the problem is to add a short Thread.sleep in the newSleepingConsumerQueue thread.
          Hide
          Ivan Mitic added a comment -

          Attaching the patch.

          Show
          Ivan Mitic added a comment - Attaching the patch.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12553425/HADOOP-9036.branch-1-win.testsinkqueue.patch
          against trunk revision .

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/1737//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12553425/HADOOP-9036.branch-1-win.testsinkqueue.patch against trunk revision . -1 patch . The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/1737//console This message is automatically generated.
          Hide
          Chuan Liu added a comment -

          +1

          Is 0.1 second enough?

          Show
          Chuan Liu added a comment - +1 Is 0.1 second enough?
          Hide
          Ivan Mitic added a comment -

          Is 0.1 second enough?

          Thanks for reviewing Chuan. It should be enough.

          Show
          Ivan Mitic added a comment - Is 0.1 second enough? Thanks for reviewing Chuan. It should be enough.
          Hide
          Suresh Srinivas added a comment -

          I feel this kind of sleep based approach in tests causes them to be brittle and prone to intermittent failures. I will post a comment back when I get a chance to see if this issues can be solved without relying on sleep.

          Show
          Suresh Srinivas added a comment - I feel this kind of sleep based approach in tests causes them to be brittle and prone to intermittent failures. I will post a comment back when I get a chance to see if this issues can be solved without relying on sleep.
          Hide
          Ivan Mitic added a comment -

          I feel this kind of sleep based approach in tests causes them to be brittle and prone to intermittent failures. I will post a comment back when I get a chance to see if this issues can be solved without relying on sleep.

          Thanks Suresh for the comment. I went with a sleep because it was the simplest fix. I'll also take a look for a possible correct solution.

          Show
          Ivan Mitic added a comment - I feel this kind of sleep based approach in tests causes them to be brittle and prone to intermittent failures. I will post a comment back when I get a chance to see if this issues can be solved without relying on sleep. Thanks Suresh for the comment. I went with a sleep because it was the simplest fix. I'll also take a look for a possible correct solution.
          Hide
          Ivan Mitic added a comment -

          It is not obvious how to solve this differently given the test scenario. Just to clarify, this test failed for me approx 1 in 5 times without the patch. With the patch it did not fail after many runs. I understand your concerns, just wanted to clarify on the fix (mitigation).

          Show
          Ivan Mitic added a comment - It is not obvious how to solve this differently given the test scenario. Just to clarify, this test failed for me approx 1 in 5 times without the patch. With the patch it did not fail after many runs. I understand your concerns, just wanted to clarify on the fix (mitigation).
          Hide
          Suresh Srinivas added a comment -

          Ivan, how about a change like this?

          Show
          Suresh Srinivas added a comment - Ivan, how about a change like this?
          Hide
          Suresh Srinivas added a comment -

          Also we should make this change in branch-1 and bring it into branch-1-win.

          Show
          Suresh Srinivas added a comment - Also we should make this change in branch-1 and bring it into branch-1-win.
          Hide
          Ivan Mitic added a comment -

          Thanks Suresh. We still have the race condition though, right? Between Thread.Sleep and shouldThrowCME? I personally like the previous mitigation better since it's a simpler fix.

          Show
          Ivan Mitic added a comment - Thanks Suresh. We still have the race condition though, right? Between Thread.Sleep and shouldThrowCME ? I personally like the previous mitigation better since it's a simpler fix.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12553669/HADOOP-9036.patch
          against trunk revision .

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/1752//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12553669/HADOOP-9036.patch against trunk revision . -1 patch . The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/1752//console This message is automatically generated.
          Hide
          Suresh Srinivas added a comment -

          We still have the race condition though, right? Between Thread.Sleep and shouldThrowCME?

          May be I do not understand the problem correctly. I thought if the thread where Consumer is running starts before the rest of the shouldThrowCME() calls, things work fine. That is what the patch I attached is doing.

          Show
          Suresh Srinivas added a comment - We still have the race condition though, right? Between Thread.Sleep and shouldThrowCME? May be I do not understand the problem correctly. I thought if the thread where Consumer is running starts before the rest of the shouldThrowCME() calls, things work fine. That is what the patch I attached is doing.
          Hide
          Ivan Mitic added a comment -

          You're right, didn't look into the patch deeply enough. Thanks. Patch looks good, +1

          Show
          Ivan Mitic added a comment - You're right, didn't look into the patch deeply enough. Thanks. Patch looks good, +1
          Hide
          Tsz Wo Nicholas Sze added a comment -

          I think this is the same as HADOOP-7292. How about simply merging it?

          Show
          Tsz Wo Nicholas Sze added a comment - I think this is the same as HADOOP-7292 . How about simply merging it?
          Hide
          Suresh Srinivas added a comment -

          agreed. Will fix this as duplicate of HADOOP-7292.

          Show
          Suresh Srinivas added a comment - agreed. Will fix this as duplicate of HADOOP-7292 .
          Hide
          Suresh Srinivas added a comment -

          Backported HADOOP-7292 patch.

          Show
          Suresh Srinivas added a comment - Backported HADOOP-7292 patch.
          Hide
          Suresh Srinivas added a comment -

          Now the real updated patch.

          Show
          Suresh Srinivas added a comment - Now the real updated patch.
          Hide
          Tsz Wo Nicholas Sze added a comment -

          +1 patch looks good.

          Show
          Tsz Wo Nicholas Sze added a comment - +1 patch looks good.
          Hide
          Suresh Srinivas added a comment -

          Committed the patch to branch-1 and branch-1-win. Thank you Nicholas for the review.

          Show
          Suresh Srinivas added a comment - Committed the patch to branch-1 and branch-1-win. Thank you Nicholas for the review.
          Hide
          Matt Foley added a comment -

          Closed upon release of Hadoop 1.2.0.

          Show
          Matt Foley added a comment - Closed upon release of Hadoop 1.2.0.

            People

            • Assignee:
              Suresh Srinivas
              Reporter:
              Ivan Mitic
            • Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development