Flume / FLUME-592

Fix intermittent / flaky tests for v0.9.4

    Details

    • Type: Epic
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: v0.9.4
    • Component/s: Technical Debt, Test
    • Labels:
      None

      Description

      This is an umbrella JIRA that covers a slew of fixes and refactors for intermittent tests and race conditions.

        Issue Links

          Issues in Epic

          There are no issues in this epic.

            Activity

            Jonathan Hsieh added a comment - edited

            Here is the list of flaky tests.

            These should now be fixed:

            • TestDiskFailoverSource.java
            • TestNaiveFileWALSource.java
            • TestDiskFailoverBehavior.java
            • TestDiskFailoverAgent.java
            • TestAgentSink.java
            • TestWriteAheadLogDecorator.java
            • TestContextThreading.java
            • TestDiskFailoverBenchmarking.java
            • TestDiskFailoverThenRoll.java
            • TestAgentFailChainSink.java
            • TestNaiveFileWALManagerConcurrently.java

            These are still flaky:

            • TestChokeDecos.java
            • TestCollectorSink.java

            Roughly the best order to review these patches is:

            • FLUME-586 (This fixed TestDiskFailoverSource and TestNaiveFileWALSource)
            • FLUME-569, FLUME-593, FLUME-596, FLUME-589, FLUME-595 (Fixes the bulk of the fixed flakies, breaks some other tests along the way)
            • FLUME-597 (TestDiskFailoverSource and TestNaiveFileWALSource fixed after semantics changes)
            • FLUME-598 (Restores behavior of newly broken tests)

            The bulk of these patches have been running and passing the flaky tests every 15 minutes for on the order of 24-48 hours (some longer). There are a handful of new, mostly deterministically broken tests that were fixed more recently. There was also a test run of the end-to-end tests (scripting through the shell), which seems to have worked.

            Jonathan Hsieh added a comment -

            Hm.. TestAgentSink flaked out.

            Jonathan Hsieh added a comment -

            TestNaiveFileWALManagerConcurrently still fails intermittently. The failure is usually in the testSharedDecoHuge test, due to an "unable to allocate thread" error. This test consumes many threads, and the failure is more likely when the machine is under heavy load.
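
            As an illustration of why thread-hungry tests are sensitive to machine load, here is a minimal sketch of running many small tasks on a bounded thread pool instead of one thread per task. It is not the Flume test code and not the fix applied in this issue; the class name BoundedWorkers, the method runTasks, and the pool size are hypothetical and chosen only for the example.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical example class; not part of Flume.
public class BoundedWorkers {

  // Runs taskCount small tasks on at most poolSize threads and returns
  // how many tasks completed. The number of threads stays fixed no
  // matter how many tasks are queued.
  static int runTasks(int taskCount, int poolSize) throws InterruptedException {
    ExecutorService pool = Executors.newFixedThreadPool(poolSize);
    AtomicInteger completed = new AtomicInteger();

    for (int i = 0; i < taskCount; i++) {
      // Each task only bumps a counter; a real test would do its work here.
      pool.execute(() -> completed.incrementAndGet());
    }

    // Stop accepting new tasks and wait for the queued ones to finish.
    pool.shutdown();
    if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
      pool.shutdownNow();
      throw new IllegalStateException("tasks did not finish in time");
    }
    return completed.get();
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println(runTasks(1000, 8));
  }
}
```

            With a fixed pool, the thread count is capped at the pool size, so the run depends less on how many native threads the operating system can still hand out at that moment. Whether this approach suits testSharedDecoHuge depends on what the test is meant to exercise.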

            Jonathan Hsieh added a comment -

            TestAgentFailChainSink testConfirmBEChain still fails intermittently.

            Jonathan Hsieh added a comment -

            I'm going to defer complete resolution of this to the 0.9.5 release. At the time of this comment, we have resolved a significant majority of the intermittent tests.

            Jonathan Hsieh added a comment -

            Scoped this out for v0.9.4. Will create a new umbrella issue for v0.9.5.


              People

              • Assignee: Jonathan Hsieh
              • Reporter: Jonathan Hsieh
