Flume / FLUME-2159

Sporadic failures in TestNettyAvroRpcClient.spinThreadsCrazily()

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: v1.4.0
    • Fix Version/s: v1.5.0
    • Component/s: Test
    • Environment: centos 5

      Description

      TestNettyAvroRpcClient.spinThreadsCrazily() checks whether the active thread count at the end of the test equals the count at the start. Once in a while the count is off by 1 because a thread takes a bit longer to wind down.
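
For illustration, a count-based check of the kind described above might look roughly like the following. This is a minimal sketch, not the actual test code: the class name `ThreadCountCheck`, the helpers `liveThreads` and `countDelta`, and the use of `ThreadMXBean.getThreadCount()` to read the count are all assumptions.

```java
import java.lang.management.ManagementFactory;

// Hypothetical sketch of a count-based thread-leak check (not the Flume test itself).
class ThreadCountCheck {

  // Live thread count for the whole JVM, daemon and non-daemon threads included.
  static int liveThreads() {
    return ManagementFactory.getThreadMXBean().getThreadCount();
  }

  // Runs the workload and returns (count after) - (count before).
  static int countDelta(Runnable workload) {
    int before = liveThreads();
    workload.run();
    int after = liveThreads();
    return after - before;
  }

  public static void main(String[] args) {
    int delta = countDelta(() -> {
      Thread t = new Thread(() -> { });
      t.start();
      try {
        t.join(); // wait for the worker to finish before the count is re-read
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    // A strict check expects delta == 0. Any unrelated JVM thread that starts
    // or exits between the two reads changes the result.
    System.out.println("thread count delta = " + delta);
  }
}
```

Because the count covers every thread in the JVM, the delta reflects more than the threads the workload created, which is consistent with the sporadic off-by-one reported here.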

      1. FLUME-2159.patch
        1 kB
        Roshan Naik
      2. FLUME-2159.v2.patch
        1.0 kB
        Roshan Naik
      3. FLUME-2159.v3.patch
        2 kB
        Roshan Naik

        Activity

        roshan_naik Roshan Naik added a comment -

        Adding a small sleep before the assertion fixes the sporadic failures.

        roshan_naik Roshan Naik added a comment -

        Seeing the sporadic failures again. This patch does not address the issue.

        roshan_naik Roshan Naik added a comment -

        The failure occurred because, in some runs, one extra thread was running at the start and was no longer running at the end (perhaps a GC thread?). To detect thread leaks, the test asserts that the number of threads at the start equals the number at the end.

        If #start < #end, this can indeed be considered a thread leak. However, in some cases #start > #end, so I relaxed the assertion accordingly.
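
The relaxed condition can be sketched as a small predicate. The patch itself is not shown in this issue, so the class name `RelaxedLeakCheck` and the method `isLeak` are hypothetical and only illustrate the rule stated above.

```java
// Hypothetical helper illustrating the relaxed check (names are not from the patch).
class RelaxedLeakCheck {

  // Only growth in the thread count is treated as a leak. A lower final
  // count (#start > #end) is tolerated.
  static boolean isLeak(int startCount, int endCount) {
    return endCount > startCount;
  }

  public static void main(String[] args) {
    System.out.println(isLeak(5, 5)); // false: counts match
    System.out.println(isLeak(6, 5)); // false: a thread went away, tolerated
    System.out.println(isLeak(5, 6)); // true: the count grew
  }
}
```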

        roshan_naik Roshan Naik added a comment -

        I am thinking we should perhaps just drop this test. It's really flaky: it does not account for the fact that threads are being spun up and terminated by other parts of the system. I do not see a straightforward way to fix this either; it seems tricky to exclude other threads from the count.

        Here are some observations:
        Sometimes finalThreadCount > initialThreadCount, sometimes finalThreadCount < initialThreadCount, and at other times they are equal.

        Here is a thread dump of a case where initialThreadCount = 6 and finalThreadCount = 5:

        Thread Dump Before
        Thread: Reference Handler ,Thd Id: 2
        Thread: Finalizer ,Thd Id: 3
        Thread: Signal Dispatcher ,Thd Id: 4
        Thread: main ,Thd Id: 1
        Thread: pool-38-thread-1 ,Thd Id: 103
        Thread: Flume Avro RPC Client Call Invoker 1,Thd Id: 105

        Thread Dump After
        Thread: Reference Handler ,Thd Id: 2
        Thread: Finalizer ,Thd Id: 3
        Thread: Signal Dispatcher ,Thd Id: 4
        Thread: main ,Thd Id: 1
        Thread: Avro NettyTransceiver I/O Worker 1 ,Thd Id: 2106

        Here is another case where finalThreadCount = 5 and initialThreadCount = 5. Notice that the threads are not the same before and after.

        Thread Dump Before
        Thread: Reference Handler ,Thd Id: 2
        Thread: Finalizer ,Thd Id: 3
        Thread: Signal Dispatcher ,Thd Id: 4
        Thread: main ,Thd Id: 1
        Thread: pool-38-thread-1 ,Thd Id: 103

        Thread Dump After
        Thread: Reference Handler ,Thd Id: 2
        Thread: Finalizer ,Thd Id: 3
        Thread: Signal Dispatcher ,Thd Id: 4
        Thread: main ,Thd Id: 1
        Thread: Avro NettyTransceiver I/O Worker 1 ,Thd Id: 2105
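
The second pair of dumps shows why equal counts can hide a change in the set of threads. One way to make that visible is to compare thread names rather than counts, as in the sketch below. This is not something the issue adopted (the test was removed instead). The class `ThreadSetDiff` and its methods are hypothetical, and the names in `main` are copied from the dump above with surrounding whitespace trimmed. Thread names are not guaranteed to be unique, so a name-based diff is only an approximation.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: compare thread names instead of thread counts.
class ThreadSetDiff {

  // Names of all live threads currently visible to the JVM.
  static Set<String> liveThreadNames() {
    Set<String> names = new TreeSet<>();
    for (Thread t : Thread.getAllStackTraces().keySet()) {
      names.add(t.getName());
    }
    return names;
  }

  // Names present in 'after' but not in 'before'.
  static Set<String> added(Set<String> before, Set<String> after) {
    Set<String> result = new TreeSet<>(after);
    result.removeAll(before);
    return result;
  }

  // Names present in 'before' but not in 'after'.
  static Set<String> removed(Set<String> before, Set<String> after) {
    return added(after, before);
  }

  public static void main(String[] args) {
    Set<String> before = new TreeSet<>(Arrays.asList(
        "Reference Handler", "Finalizer", "Signal Dispatcher", "main",
        "pool-38-thread-1"));
    Set<String> after = new TreeSet<>(Arrays.asList(
        "Reference Handler", "Finalizer", "Signal Dispatcher", "main",
        "Avro NettyTransceiver I/O Worker 1"));

    // Both sets have 5 entries, yet one thread appeared and one disappeared.
    System.out.println("added   = " + added(before, after));
    System.out.println("removed = " + removed(before, after));
  }
}
```

In a real test, `liveThreadNames()` would be called before and after the workload. Deciding which of the added names belong to the code under test is still the hard part noted in the comment above.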

        hshreedharan Hari Shreedharan added a comment -

        Agreed. I will commit this patch after running tests.

        hshreedharan Hari Shreedharan added a comment -

        Looks like the patch does not remove the test. Do you want to submit a patch removing the test?

        roshan_naik Roshan Naik added a comment -

        Removing the flaky test TestNettyAvroRpcClient.spinThreadsCrazily().

        hshreedharan Hari Shreedharan added a comment -

        +1.

        hshreedharan Hari Shreedharan added a comment -

        Committed, rev: 68fe4d4. Thanks Roshan

        hudson Hudson added a comment -

        FAILURE: Integrated in flume-trunk #513 (See https://builds.apache.org/job/flume-trunk/513/)
        FLUME-2159. Remove TestNettyAvroRpcClient.spinThreadsCrazily. (hshreedharan: http://git-wip-us.apache.org/repos/asf/flume/repo?p=flume.git&a=commit&h=68fe4d45123473adbef1077c5de20b4dd48d3a1d)

        • flume-ng-sdk/src/test/java/org/apache/flume/api/TestNettyAvroRpcClient.java

          People

          • Assignee: Roshan Naik (roshan_naik)
          • Reporter: Roshan Naik (roshan_naik)
          • Votes: 0
          • Watchers: 3
