Uploaded image for project: 'Flume'
  1. Flume
  2. FLUME-2482

Race condition in File Channels' Log.removeOldLogs

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.5.0.1
    • Fix Version/s: 1.6.0
    • Component/s: File Channel
    • Labels:

      Description

      TestFileChannelRestart.testToggleCheckpointCompressionFromFalseToTrue sometimes produces a race condition at Log.removeOldLogs.

      https://travis-ci.org/Stratio/flume/jobs/36782318#L6193

      testToggleCheckpointCompressionFromFalseToTrue(org.apache.flume.channel.file.TestFileChannelRestart) Time elapsed: 144362 sec <<< ERROR!

      java.util.ConcurrentModificationException

      at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:859)

      at java.util.ArrayList$Itr.next(ArrayList.java:831)

      at org.apache.flume.channel.file.Log.removeOldLogs(Log.java:1070)

      at org.apache.flume.channel.file.Log.writeCheckpoint(Log.java:1055)

      at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)

      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

      at java.lang.reflect.Method.invoke(Method.java:606)

      at org.fest.reflect.method.Invoker.invoke(Invoker.java:110)

      at org.apache.flume.channel.file.TestUtils.forceCheckpoint(TestUtils.java:134)

      at org.apache.flume.channel.file.TestFileChannelRestart.restartToggleCompression(TestFileChannelRestart.java:930)

      at org.apache.flume.channel.file.TestFileChannelRestart.testToggleCheckpointCompressionFromFalseToTrue(TestFileChannelRestart.java:896)

      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)

      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

      at java.lang.reflect.Method.invoke(Method.java:606)

      at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45)

      at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)

      at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42)

      at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)

      at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)

      at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:30)

      at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:263)

      at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:68)

      at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:47)

      at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231)

      at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60)

      at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:229)

      at org.junit.runners.ParentRunner.access$000(ParentRunner.java:50)

      at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:222)

      at org.junit.runners.ParentRunner.run(ParentRunner.java:300)

      at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:252)

      at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:141)

      at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:112)

      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)

      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

      at java.lang.reflect.Method.invoke(Method.java:606)

      at org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:189)

      at org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:165)

      at org.apache.maven.surefire.booter.ProviderFactory.invokeProvider(ProviderFactory.java:85)

      at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:115)

      at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:75)

        Activity

        Hide
        hshreedharan Hari Shreedharan added a comment -

        Did you happen to investigate this? It looks like the test or the test util that forces the checkpoint is likely buggy.

        Show
        hshreedharan Hari Shreedharan added a comment - Did you happen to investigate this? It looks like the test or the test util that forces the checkpoint is likely buggy.
        Hide
        smolav Santiago M. Mola added a comment -

        I've got no patch yet. But yes, it seems a bug in the test: it is forcing a checkpoint while Log is already checkpointing in a different thread.

        Show
        smolav Santiago M. Mola added a comment - I've got no patch yet. But yes, it seems a bug in the test: it is forcing a checkpoint while Log is already checkpointing in a different thread.
        Hide
        hshreedharan Hari Shreedharan added a comment -

        Yep, seems like it. We specifically made the removeOldLogs method not under the lock to ensure that the deletes happen outside, so the channel can continue operating at the same time. Perhaps have a separate lock for the removeOldLogs method so that this does not happen?

        Show
        hshreedharan Hari Shreedharan added a comment - Yep, seems like it. We specifically made the removeOldLogs method not under the lock to ensure that the deletes happen outside, so the channel can continue operating at the same time. Perhaps have a separate lock for the removeOldLogs method so that this does not happen?
        Hide
        jrufus Johny Rufus added a comment -

        Since the error can only happen when test cases force a checkpoint and sometimes the scheduled checkpoint could also be happening simultaneously, am attaching a patch that sets the checkpoint interval to a high value for all the tests in TestFileChannelRestart, so that this scenario doesn't occur anymore.

        Show
        jrufus Johny Rufus added a comment - Since the error can only happen when test cases force a checkpoint and sometimes the scheduled checkpoint could also be happening simultaneously, am attaching a patch that sets the checkpoint interval to a high value for all the tests in TestFileChannelRestart, so that this scenario doesn't occur anymore.
        Hide
        smolav Santiago M. Mola added a comment -

        Johny Rufus Sounds good.

        Show
        smolav Santiago M. Mola added a comment - Johny Rufus Sounds good.
        Hide
        hshreedharan Hari Shreedharan added a comment -

        +1. I am committing this one.

        Show
        hshreedharan Hari Shreedharan added a comment - +1. I am committing this one.
        Hide
        jira-bot ASF subversion and git services added a comment -

        Commit a582c100f5f0b368a6dcc77c2b29138ef4b28840 in flume's branch refs/heads/trunk from Hari Shreedharan
        [ https://git-wip-us.apache.org/repos/asf?p=flume.git;h=a582c10 ]

        FLUME-2482. File Channel tests must disable scheduled checkpoint to avoid a race condition with forced checkpoint.

        (Johny Rufus via Hari)

        Show
        jira-bot ASF subversion and git services added a comment - Commit a582c100f5f0b368a6dcc77c2b29138ef4b28840 in flume's branch refs/heads/trunk from Hari Shreedharan [ https://git-wip-us.apache.org/repos/asf?p=flume.git;h=a582c10 ] FLUME-2482 . File Channel tests must disable scheduled checkpoint to avoid a race condition with forced checkpoint. (Johny Rufus via Hari)
        Hide
        jira-bot ASF subversion and git services added a comment -

        Commit 150dbc519d0552ea94e1d6de2d1b455eae022155 in flume's branch refs/heads/flume-1.6 from Hari Shreedharan
        [ https://git-wip-us.apache.org/repos/asf?p=flume.git;h=150dbc5 ]

        FLUME-2482. File Channel tests must disable scheduled checkpoint to avoid a race condition with forced checkpoint.

        (Johny Rufus via Hari)

        Show
        jira-bot ASF subversion and git services added a comment - Commit 150dbc519d0552ea94e1d6de2d1b455eae022155 in flume's branch refs/heads/flume-1.6 from Hari Shreedharan [ https://git-wip-us.apache.org/repos/asf?p=flume.git;h=150dbc5 ] FLUME-2482 . File Channel tests must disable scheduled checkpoint to avoid a race condition with forced checkpoint. (Johny Rufus via Hari)
        Hide
        hshreedharan Hari Shreedharan added a comment -

        Committed! Thanks Johny!

        Show
        hshreedharan Hari Shreedharan added a comment - Committed! Thanks Johny!
        Hide
        hudson Hudson added a comment -

        UNSTABLE: Integrated in Flume-trunk-hbase-98 #35 (See https://builds.apache.org/job/Flume-trunk-hbase-98/35/)
        FLUME-2482. File Channel tests must disable scheduled checkpoint to avoid a race condition with forced checkpoint. (hshreedharan: http://git-wip-us.apache.org/repos/asf/flume/repo?p=flume.git&a=commit&h=a582c100f5f0b368a6dcc77c2b29138ef4b28840)

        • flume-ng-channels/flume-file-channel/src/test/java/org/apache/flume/channel/file/TestFileChannelRestart.java
        Show
        hudson Hudson added a comment - UNSTABLE: Integrated in Flume-trunk-hbase-98 #35 (See https://builds.apache.org/job/Flume-trunk-hbase-98/35/ ) FLUME-2482 . File Channel tests must disable scheduled checkpoint to avoid a race condition with forced checkpoint. (hshreedharan: http://git-wip-us.apache.org/repos/asf/flume/repo?p=flume.git&a=commit&h=a582c100f5f0b368a6dcc77c2b29138ef4b28840 ) flume-ng-channels/flume-file-channel/src/test/java/org/apache/flume/channel/file/TestFileChannelRestart.java
        Hide
        hudson Hudson added a comment -

        FAILURE: Integrated in flume-trunk #676 (See https://builds.apache.org/job/flume-trunk/676/)
        FLUME-2482. File Channel tests must disable scheduled checkpoint to avoid a race condition with forced checkpoint. (hshreedharan: http://git-wip-us.apache.org/repos/asf/flume/repo?p=flume.git&a=commit&h=a582c100f5f0b368a6dcc77c2b29138ef4b28840)

        • flume-ng-channels/flume-file-channel/src/test/java/org/apache/flume/channel/file/TestFileChannelRestart.java
        Show
        hudson Hudson added a comment - FAILURE: Integrated in flume-trunk #676 (See https://builds.apache.org/job/flume-trunk/676/ ) FLUME-2482 . File Channel tests must disable scheduled checkpoint to avoid a race condition with forced checkpoint. (hshreedharan: http://git-wip-us.apache.org/repos/asf/flume/repo?p=flume.git&a=commit&h=a582c100f5f0b368a6dcc77c2b29138ef4b28840 ) flume-ng-channels/flume-file-channel/src/test/java/org/apache/flume/channel/file/TestFileChannelRestart.java

          People

          • Assignee:
            jrufus Johny Rufus
            Reporter:
            smolav Santiago M. Mola
          • Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development