Flume / FLUME-798

Blocked append interrupted by rotation event

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: v0.9.5
    • Fix Version/s: v0.9.5
    • Component/s: Node
    • Labels:
      None

      Description

      Our Flume collector seems to work for a short period of time and then fails with the following exception. When this happens, the collector does not reconnect, and the system becomes inactive even though the processes are still running.

      2011-10-14 01:49:47,386 [logicalNode collector0_log_dir-115] ERROR com.cloudera.flume.core.connector.DirectDriver - Closing down due to exception during append calls
      2011-10-14 01:49:47,387 [logicalNode collector0_log_dir-115] INFO com.cloudera.flume.core.connector.DirectDriver - Connector logicalNode collector0_log_dir-115 exited with error: Blocked append interrupted by rotation event
      java.lang.InterruptedException: Blocked append interrupted by rotation event
      at com.cloudera.flume.handlers.rolling.RollSink.append(RollSink.java:209)
      at com.cloudera.flume.core.EventSinkDecorator.append(EventSinkDecorator.java:60)
      at com.cloudera.flume.core.MaskDecorator.append(MaskDecorator.java:43)
      at com.cloudera.flume.core.EventSinkDecorator.append(EventSinkDecorator.java:60)
      at com.cloudera.flume.handlers.debug.InsistentOpenDecorator.append(InsistentOpenDecorator.java:169)
      at com.cloudera.flume.core.EventSinkDecorator.append(EventSinkDecorator.java:60)
      at com.cloudera.flume.handlers.debug.StubbornAppendSink.append(StubbornAppendSink.java:71)
      at com.cloudera.flume.core.EventSinkDecorator.append(EventSinkDecorator.java:60)
      at com.cloudera.flume.handlers.debug.InsistentAppendDecorator.append(InsistentAppendDecorator.java:110)
      at com.cloudera.flume.core.EventSinkDecorator.append(EventSinkDecorator.java:60)
      at com.cloudera.flume.handlers.endtoend.AckChecksumChecker.append(AckChecksumChecker.java:113)
      at com.cloudera.flume.core.EventSinkDecorator.append(EventSinkDecorator.java:60)
      at com.cloudera.flume.handlers.batch.UnbatchingDecorator.append(UnbatchingDecorator.java:62)
      at com.cloudera.flume.core.EventSinkDecorator.append(EventSinkDecorator.java:60)
      at com.cloudera.flume.handlers.batch.GunzipDecorator.append(GunzipDecorator.java:81)
      at com.cloudera.flume.collector.CollectorSink.append(CollectorSink.java:222)
      at com.cloudera.flume.core.EventSinkDecorator.append(EventSinkDecorator.java:60)
      at com.cloudera.flume.core.extractors.DateExtractor.append(DateExtractor.java:129)
      at com.cloudera.flume.core.EventSinkDecorator.append(EventSinkDecorator.java:60)
      at com.cloudera.flume.core.extractors.RegexExtractor.append(RegexExtractor.java:88)
      at com.cloudera.flume.core.connector.DirectDriver$PumperThread.run(DirectDriver.java:133)
      2011-10-14 01:49:47,388 [logicalNode collector0_log_dir-115] INFO com.cloudera.flume.collector.CollectorSource - closed
      2011-10-14 01:49:48,391 [logicalNode collector0_log_dir-115] INFO com.cloudera.flume.handlers.thrift.ThriftEventSource - Closed server on port 36892...
      2011-10-14 01:49:48,391 [logicalNode collector0_log_dir-115] INFO com.cloudera.flume.handlers.thrift.ThriftEventSource - Queue still has 1000 elements ...
      2011-10-14 01:49:58,399 [logicalNode collector0_log_dir-115] WARN com.cloudera.flume.handlers.thrift.ThriftEventSource - Close timed out due to no progress. Closing despite having 1000 values still enqueued
      2011-10-14 01:49:58,399 [logicalNode collector0_log_dir-115] INFO com.cloudera.flume.handlers.rolling.RollSink - closing RollSink 'escapedCustomDfs("hdfs://van-mang-perf-hadoop-namenode1.net:8020/rawLogs/%{dateyear}%{datemonth}%{dateday}/%{datehr}00","raw-%{rolltag}" )'
      2011-10-14 01:49:58,400 [logicalNode collector0_log_dir-115] INFO com.cloudera.flume.handlers.rolling.RollSink - double close 'escapedCustomDfs("hdfs://van-mang-perf-hadoop-namenode1.net:8020/rawLogs/%{dateyear}-%{datemonth}-%{dateday}/%{datehr}00","raw-%{rolltag}" )'
      2011-10-14 01:49:58,400 [logicalNode collector0_log_dir-115] ERROR com.cloudera.flume.core.connector.DirectDriver - Exiting driver logicalNode collector0_log_dir-115 in error state CollectorSource | RegexExtractor because Blocked append interrupted by rotation event
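A minimal Java sketch of the race described in this report, for readers trying to reproduce it: an append that is still blocked when the roll fires gets interrupted and fails with an InterruptedException. The read/write-lock scheme is an assumption (suggested by RollSink.close calling WriteLock.tryLock in a later trace in this thread), and RollInterruptDemo and its methods are hypothetical names, not Flume's RollSink code.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/**
 * Hypothetical model of the failure above, NOT Flume's RollSink source.
 * Assumption: an append holds a read lock while a slow downstream write runs,
 * and the roll trigger tries to take the write lock, interrupting the
 * appending thread if the lock is not free within the timeout.
 */
public class RollInterruptDemo {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

  // Slow append: holds the read lock for writeMillis (stand-in for a stalled HDFS write).
  void append(long writeMillis, CountDownLatch lockHeld) throws InterruptedException {
    lock.readLock().lock();
    try {
      lockHeld.countDown();      // signal that an append is now in flight
      Thread.sleep(writeMillis); // throws InterruptedException if this thread is interrupted
    } finally {
      lock.readLock().unlock();
    }
  }

  // Roll trigger: wait up to timeoutMillis for in-flight appends, else interrupt the appender.
  boolean roll(Thread appender, long timeoutMillis) throws InterruptedException {
    if (lock.writeLock().tryLock(timeoutMillis, TimeUnit.MILLISECONDS)) {
      lock.writeLock().unlock(); // a real roll would close and reopen the output here
      return true;
    }
    appender.interrupt();
    return false;
  }

  /** Runs one append/roll race and returns whatever the appending thread caught (or null). */
  static Throwable runScenario(long writeMillis, long rollTimeoutMillis) {
    RollInterruptDemo sink = new RollInterruptDemo();
    CountDownLatch lockHeld = new CountDownLatch(1);
    Throwable[] caught = new Throwable[1];
    Thread pumper = new Thread(() -> {
      try {
        sink.append(writeMillis, lockHeld);
      } catch (InterruptedException e) {
        caught[0] = e;
      }
    });
    pumper.start();
    try {
      lockHeld.await(); // the append is guaranteed to be in flight before we roll
      sink.roll(pumper, rollTimeoutMillis);
      pumper.join();    // join() makes caught[0] visible to this thread
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return caught[0];
  }

  public static void main(String[] args) {
    // A 2 s write against a 100 ms roll timeout: the append is interrupted.
    System.out.println(runScenario(2000, 100));
    // A 10 ms write against a 1 s roll timeout: the roll simply waits.
    System.out.println(runScenario(10, 1000));
  }
}
```

A latch, rather than a sleep, guarantees the append is in flight before the roll starts, so the outcome depends only on the two durations.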

      Attachments

      1. Flume-798.patch.final (16 kB, Prasad Mujumdar)
      2. Flume-798.patch (12 kB, Prasad Mujumdar)
      3. 0001-FLUME-798-Modified-RollSink-to-not-cancel-pending-si.patch (2 kB, Cameron Gandevia)
      4. 0001-FLUME-798-Modified-RollSink-to-not-cancel-pending-si.patch (2 kB, Cameron Gandevia)

        Activity

        Prasad Mujumdar added a comment -

        It looks like the patch attached to the JIRA is the old one. Attaching the
        updated patch that was finally committed.

        Joydeep Sen Sarma added a comment -

        As Colin commented above, do you think FLUME-875 is related?

        Prasad Mujumdar added a comment -

        Patch committed to trunk

        jiraposter@reviews.apache.org added a comment -

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/2869/#review3493
        -----------------------------------------------------------

        Ship it!

        Hey Prasad,

        I'm a little concerned about the possibility of a deadlock when you disable interrupting a slow/blocked append, but this is an improvement because it gives you the option to choose one behavior or the other.

        Please fix the typos and commit.

        Thanks!
        Jon.

        flume-core/src/main/java/com/cloudera/flume/handlers/rolling/RollSink.java
        <https://reviews.apache.org/r/2869/#comment7774>

        spelling? rollCanceledAppends

        Hm, apparently the dictionary says cancelled and canceled are both OK.

        flume-core/src/test/java/com/cloudera/flume/handlers/rolling/TestSlowSinkRoll.java
        <https://reviews.apache.org/r/2869/#comment7773>

        typo? gots

        flume-core/src/test/java/com/cloudera/flume/handlers/rolling/TestSlowSinkRoll.java
        <https://reviews.apache.org/r/2869/#comment7772>

        typo? testSlowSinkdRoll (extra d?)

        flume-core/src/test/java/com/cloudera/flume/handlers/rolling/TestSlowSinkRoll.java
        <https://reviews.apache.org/r/2869/#comment7771>

        Typo? testWaitingSlowSinkdRoll (extra d?)

        • jmhsieh

        On 2011-11-23 00:43:30, Prasad Mujumdar wrote:

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/2869/
        -----------------------------------------------------------

        (Updated 2011-11-23 00:43:30)

        Review request for jmhsieh and Eric Sammer.

        Summary
        -------

        If the append takes longer than a second, the roll-trigger thread aborts it. This results in an InterruptedException that is not handled gracefully and consequently causes a NullPointerException in the pumper thread.

        Change the InterruptedException to a RuntimeException so that it can be handled by the DirectDriver. A couple of other related issues in the roll sink are also fixed.

        The patch also allows the trigger thread's wait time to be configured via the flume.collector.roll.timeout property in flume-conf.xml.

        This addresses bug Flume-798.
        https://issues.apache.org/jira/browse/Flume-798

        Diffs
        -----

        flume-core/src/main/java/com/cloudera/flume/conf/FlumeConfiguration.java 397dfef
        flume-core/src/main/java/com/cloudera/flume/handlers/rolling/RollSink.java 27302b1
        flume-core/src/test/java/com/cloudera/flume/handlers/rolling/TestRollRollTags.java 01d6574
        flume-core/src/test/java/com/cloudera/flume/handlers/rolling/TestRollSink.java 761643d
        flume-core/src/test/java/com/cloudera/flume/handlers/rolling/TestSlowSinkRoll.java PRE-CREATION

        Diff: https://reviews.apache.org/r/2869/diff

        Testing
        -------

        Added new test TestSlowSinkRoll. Will run the full regression test suite.

        Thanks,

        Prasad

        jiraposter@reviews.apache.org added a comment -

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/2869/
        -----------------------------------------------------------

        (Updated 2011-11-23 00:43:30.313871)

        Review request for jmhsieh and Eric Sammer.

        Changes
        -------

        Updated the patch to support a timeout value of 0, which makes the trigger thread not interrupt the append. Corrected the spacing and comments. Added more validation to the tests.
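The timeout semantics described in this update can be sketched as follows, assuming the roll waits for in-flight appends by taking the write side of a ReentrantReadWriteLock. RollTimeoutPolicy, lockForRoll and simulate are hypothetical names; this is an illustration under that assumption, not the committed patch.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/**
 * Hypothetical sketch of the roll-timeout semantics; not the committed patch.
 * rollTimeoutMillis > 0: wait that long for in-flight appends, then interrupt the appender.
 * rollTimeoutMillis == 0: never interrupt; block until in-flight appends finish.
 */
public class RollTimeoutPolicy {

  /** Acquires the roll (write) lock; returns true if the appender had to be interrupted. */
  static boolean lockForRoll(ReentrantReadWriteLock lock, Thread appender, long rollTimeoutMillis)
      throws InterruptedException {
    if (rollTimeoutMillis == 0) {
      lock.writeLock().lockInterruptibly(); // waits as long as the append takes
      return false;
    }
    if (lock.writeLock().tryLock(rollTimeoutMillis, TimeUnit.MILLISECONDS)) {
      return false;
    }
    appender.interrupt();                 // abort the slow append ...
    lock.writeLock().lockInterruptibly(); // ... then wait for it to release its read lock
    return true;
  }

  /** Simulates one append of appendMillis racing one roll; returns whether it was interrupted. */
  static boolean simulate(long appendMillis, long rollTimeoutMillis) {
    ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    CountDownLatch inFlight = new CountDownLatch(1);
    Thread appender = new Thread(() -> {
      lock.readLock().lock();
      try {
        inFlight.countDown();
        Thread.sleep(appendMillis);
      } catch (InterruptedException e) {
        // The append was aborted by the roll; a real sink would surface this to the driver.
      } finally {
        lock.readLock().unlock();
      }
    });
    appender.start();
    try {
      inFlight.await();
      boolean interrupted = lockForRoll(lock, appender, rollTimeoutMillis);
      lock.writeLock().unlock(); // a real roll would close and reopen the output here
      appender.join();
      return interrupted;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
  }
}
```

With a timeout of 0 the roll never interrupts, but it also waits for as long as the downstream write takes; if that write never returns, the roll blocks indefinitely, which is the deadlock risk raised in the review of this patch.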

        Summary
        -------

        If the append takes longer than a second, the roll-trigger thread aborts it. This results in an InterruptedException that is not handled gracefully and consequently causes a NullPointerException in the pumper thread.

        Change the InterruptedException to a RuntimeException so that it can be handled by the DirectDriver. A couple of other related issues in the roll sink are also fixed.
        The patch also allows the trigger thread's wait time to be configured via the flume.collector.roll.timeout property in flume-conf.xml.

        This addresses bug Flume-798.
        https://issues.apache.org/jira/browse/Flume-798
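Here is a hedged sketch of what "change the interrupted exception to Runtime" can look like: the append re-asserts the thread's interrupt status and rethrows an unchecked exception carrying the original as its cause, so a driver loop that does not declare InterruptedException can still catch and report it. RollInterruptedException, BlockingWrite, append and pumpOnce are hypothetical names, not Flume's API.

```java
/**
 * Hypothetical sketch of translating an interrupted append into an unchecked
 * exception; names are illustrative and are not Flume's API.
 */
public class AppendInterruptTranslation {

  /** Unchecked wrapper that keeps the original InterruptedException as its cause. */
  static class RollInterruptedException extends RuntimeException {
    RollInterruptedException(String message, InterruptedException cause) {
      super(message, cause);
    }
  }

  /** Stand-in for a downstream write that may block and be interrupted. */
  interface BlockingWrite {
    void write() throws InterruptedException;
  }

  /** Append path: translate the checked interrupt into an unchecked, descriptive failure. */
  static void append(BlockingWrite write) {
    try {
      write.write();
    } catch (InterruptedException e) {
      // Re-assert the interrupt status (blocking methods clear it when they throw).
      Thread.currentThread().interrupt();
      throw new RollInterruptedException("Blocked append interrupted by rotation event", e);
    }
  }

  /** Toy driver step: report the failure instead of letting it escape unhandled. */
  static String pumpOnce(BlockingWrite write) {
    try {
      append(write);
      return "ok";
    } catch (RollInterruptedException e) {
      Thread.interrupted(); // the driver has handled the interrupt, so it clears the flag
      return "append failed: " + e.getMessage();
    }
  }

  public static void main(String[] args) {
    System.out.println(pumpOnce(() -> { }));                               // prints "ok"
    System.out.println(pumpOnce(() -> { throw new InterruptedException(); })); // prints "append failed: Blocked append interrupted by rotation event"
  }
}
```

Keeping the InterruptedException as the cause preserves the original stack trace in the driver's error log.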

        Diffs (updated)


        flume-core/src/main/java/com/cloudera/flume/conf/FlumeConfiguration.java 397dfef
        flume-core/src/main/java/com/cloudera/flume/handlers/rolling/RollSink.java 27302b1
        flume-core/src/test/java/com/cloudera/flume/handlers/rolling/TestRollRollTags.java 01d6574
        flume-core/src/test/java/com/cloudera/flume/handlers/rolling/TestRollSink.java 761643d
        flume-core/src/test/java/com/cloudera/flume/handlers/rolling/TestSlowSinkRoll.java PRE-CREATION

        Diff: https://reviews.apache.org/r/2869/diff

        Testing
        -------

        Added new test TestSlowSinkRoll. Will run the full regression test suite.

        Thanks,

        Prasad

        Clive Cox added a comment -

        I just wanted to add this exception I am seeing on Flume agent nodes, as it seems very similar and may require a similar solution.

        2011-10-21 12:49:32,318 WARN com.cloudera.flume.handlers.text.TailSource: next unexpectedly interrupted :null
        java.lang.InterruptedException
        at java.util.concurrent.SynchronousQueue.poll(SynchronousQueue.java:877)
        at com.cloudera.flume.handlers.text.TailSource.next(TailSource.java:271)
        at com.cloudera.flume.core.connector.DirectDriver$PumperThread.run(DirectDriver.java:105)
        2011-10-21 12:49:32,319 ERROR com.cloudera.flume.core.connector.DirectDriver: Closing down due to exception during append calls

        I am happy to open a new JIRA, but thought I would post it here first for comment.
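A minimal sketch of how the TailSource trace above can arise, assuming the pumper thread's interrupt status is set while (or before) it polls: SynchronousQueue.poll(timeout, unit) then throws InterruptedException instead of returning. InterruptedPollDemo and nextOrMarker are hypothetical names, not Flume code.

```java
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical sketch of an interrupted poll like the one in TailSource.next();
 * not Flume code.
 */
public class InterruptedPollDemo {

  /** Polls once with a timeout; returns the item, null on timeout, or "<interrupted>". */
  static String nextOrMarker(SynchronousQueue<String> queue, long timeoutMillis) {
    try {
      return queue.poll(timeoutMillis, TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
      // poll() cleared the interrupt status when it threw; re-assert it so the
      // caller (e.g. a driver that is shutting down) can still see it.
      Thread.currentThread().interrupt();
      return "<interrupted>";
    }
  }

  public static void main(String[] args) {
    SynchronousQueue<String> queue = new SynchronousQueue<>();
    System.out.println(nextOrMarker(queue, 50)); // no producer: times out and returns null
    Thread.currentThread().interrupt();         // simulate a pending interrupt (e.g. from close())
    System.out.println(nextOrMarker(queue, 50)); // the poll throws, so the marker is returned
    System.out.println(Thread.interrupted());    // the flag was re-asserted; this call also clears it
  }
}
```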

        jiraposter@reviews.apache.org added a comment -

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/2869/#review3312
        -----------------------------------------------------------

        Hey Prasad, please fix the spacing nits. I'm having a hard time seeing what the test is verifying or asserting (what counts as success and what counts as failure in the test?). Can you add comments to make it clearer to help me out?

        flume-core/src/main/java/com/cloudera/flume/handlers/rolling/RollSink.java
        <https://reviews.apache.org/r/2869/#comment7400>

        is 0 valid?

        Also, style, use { } please.

        flume-core/src/test/java/com/cloudera/flume/handlers/rolling/TestSlowSinkRoll.java
        <https://reviews.apache.org/r/2869/#comment7397>

        apache license?

        flume-core/src/test/java/com/cloudera/flume/handlers/rolling/TestSlowSinkRoll.java
        <https://reviews.apache.org/r/2869/#comment7398>

        I need help here. I don't see a check that verifies "no events lost"...

        What is the test checking? Is the success condition no xxx exception being thrown?

        Can we check the driver to see if it got an exception (or no exception?)

        flume-core/src/test/java/com/cloudera/flume/handlers/rolling/TestSlowSinkRoll.java
        <https://reviews.apache.org/r/2869/#comment7399>

        Similarly, what is this test checking? How can we tell it was handled "gracefully"?

        • jmhsieh
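One way to address the question of what the test checks is to make both success conditions explicit. The sketch below is a hypothetical harness, not the actual TestSlowSinkRoll: it records any exception seen by the driver thread and compares events delivered against events sent. CountingSink, run and failAt are illustrative names.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

/**
 * Hypothetical harness (not the actual TestSlowSinkRoll) that spells out the
 * success condition: no driver exception AND no lost events.
 */
public class SlowRollCheck {

  /** Toy sink that counts events and can be told to fail on one event index. */
  static final class CountingSink {
    final AtomicInteger received = new AtomicInteger();
    private final int failAt; // -1 means never fail

    CountingSink(int failAt) {
      this.failAt = failAt;
    }

    void append(int eventIndex) {
      if (eventIndex == failAt) {
        throw new IllegalStateException("simulated append failure at event " + eventIndex);
      }
      received.incrementAndGet();
    }
  }

  /** Returns true only if the driver saw no exception AND no events were lost. */
  static boolean run(int events, int failAt) {
    CountingSink sink = new CountingSink(failAt);
    AtomicReference<Throwable> driverError = new AtomicReference<>();
    Thread driver = new Thread(() -> {
      try {
        for (int i = 0; i < events; i++) {
          sink.append(i);
        }
      } catch (RuntimeException e) {
        driverError.set(e); // something a test can assert on, instead of "nothing blew up"
      }
    });
    driver.start();
    try {
      driver.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
    return driverError.get() == null && sink.received.get() == events;
  }

  public static void main(String[] args) {
    System.out.println(run(100, -1)); // no failure injected
    System.out.println(run(100, 50)); // exception recorded and events 50..99 never delivered
  }
}
```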

        On 2011-11-17 00:52:15, Prasad Mujumdar wrote:

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/2869/
        -----------------------------------------------------------

        (Updated 2011-11-17 00:52:15)

        Review request for jmhsieh and Eric Sammer.

        Summary
        -------

        If the append takes longer than a second, the roll-trigger thread aborts it. This results in an InterruptedException that is not handled gracefully and consequently causes a NullPointerException in the pumper thread.

        Change the InterruptedException to a RuntimeException so that it can be handled by the DirectDriver. A couple of other related issues in the roll sink are also fixed.

        The patch also allows the trigger thread's wait time to be configured via the flume.collector.roll.timeout property in flume-conf.xml.

        This addresses bug Flume-798.
        https://issues.apache.org/jira/browse/Flume-798

        Diffs
        -----

        flume-core/src/main/java/com/cloudera/flume/conf/FlumeConfiguration.java 397dfef
        flume-core/src/main/java/com/cloudera/flume/handlers/rolling/RollSink.java 27302b1
        flume-core/src/test/java/com/cloudera/flume/handlers/rolling/TestRollSink.java 761643d
        flume-core/src/test/java/com/cloudera/flume/handlers/rolling/TestSlowSinkRoll.java PRE-CREATION

        Diff: https://reviews.apache.org/r/2869/diff

        Testing
        -------

        Added new test TestSlowSinkRoll. Will run the full regression test suite.

        Thanks,

        Prasad

        jiraposter@reviews.apache.org added a comment -

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/2869/#review3313
        -----------------------------------------------------------

        Hey Prasad, please fix the spacing nits. I'm having a hard time seeing what the test is verifying or asserting (what counts as success and what counts as failure in the test?). Can you add comments to make it clearer to help me out?

        • jmhsieh

        On 2011-11-17 00:52:15, Prasad Mujumdar wrote:

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/2869/
        -----------------------------------------------------------

        (Updated 2011-11-17 00:52:15)

        Review request for jmhsieh and Eric Sammer.

        Summary
        -------

        If the append takes longer than a second, the roll-trigger thread aborts it. This results in an InterruptedException that is not handled gracefully and consequently causes a NullPointerException in the pumper thread.

        Change the InterruptedException to a RuntimeException so that it can be handled by the DirectDriver. A couple of other related issues in the roll sink are also fixed.

        The patch also allows the trigger thread's wait time to be configured via the flume.collector.roll.timeout property in flume-conf.xml.

        This addresses bug Flume-798.
        https://issues.apache.org/jira/browse/Flume-798

        Diffs
        -----

        flume-core/src/main/java/com/cloudera/flume/conf/FlumeConfiguration.java 397dfef
        flume-core/src/main/java/com/cloudera/flume/handlers/rolling/RollSink.java 27302b1
        flume-core/src/test/java/com/cloudera/flume/handlers/rolling/TestRollSink.java 761643d
        flume-core/src/test/java/com/cloudera/flume/handlers/rolling/TestSlowSinkRoll.java PRE-CREATION

        Diff: https://reviews.apache.org/r/2869/diff

        Testing
        -------

        Added new test TestSlowSinkRoll. Will run the full regression test suite.

        Thanks,

        Prasad

        jiraposter@reviews.apache.org added a comment -

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/2869/
        -----------------------------------------------------------

        Review request for jmhsieh and Eric Sammer.

        Summary
        -------

        If the append takes longer than a second, the roll-trigger thread aborts it. This results in an InterruptedException that is not handled gracefully and consequently causes a NullPointerException in the pumper thread.

        Change the InterruptedException to a RuntimeException so that it can be handled by the DirectDriver. A couple of other related issues in the roll sink are also fixed.
        The patch also allows the trigger thread's wait time to be configured via the flume.collector.roll.timeout property in flume-conf.xml.

        This addresses bug Flume-798.
        https://issues.apache.org/jira/browse/Flume-798

        Diffs


        flume-core/src/main/java/com/cloudera/flume/conf/FlumeConfiguration.java 397dfef
        flume-core/src/main/java/com/cloudera/flume/handlers/rolling/RollSink.java 27302b1
        flume-core/src/test/java/com/cloudera/flume/handlers/rolling/TestRollSink.java 761643d
        flume-core/src/test/java/com/cloudera/flume/handlers/rolling/TestSlowSinkRoll.java PRE-CREATION

        Diff: https://reviews.apache.org/r/2869/diff

        Testing
        -------

        Added new test TestSlowSinkRoll. Will run the full regression test suite.

        Thanks,

        Prasad

        Clive Cox added a comment -

        I am seeing something similar with:

        $ flume version
        Flume 0.9.4-cdh3u1
        Git repository https://github.com/cloudera/flume/flume-core
        rev NOT AVAILABLE
        Compiled by jenkins on 20110810-1901

        The exception stack trace is:
        2011-11-12 04:48:24,837 ERROR com.cloudera.flume.core.connector.DirectDriver: Closing down due to exception during append calls
        java.lang.InterruptedException
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireNanos(AbstractQueuedSynchronizer.java:1223)
        at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.tryLock(ReentrantReadWriteLock.java:976)
        at com.cloudera.flume.handlers.rolling.RollSink.close(RollSink.java:296)
        at com.cloudera.flume.core.EventSinkDecorator.close(EventSinkDecorator.java:67)
        at com.cloudera.flume.core.EventSinkDecorator.close(EventSinkDecorator.java:67)
        at com.cloudera.flume.handlers.debug.InsistentOpenDecorator.close(InsistentOpenDecorator.java:175)
        at com.cloudera.flume.core.EventSinkDecorator.close(EventSinkDecorator.java:67)
        at com.cloudera.flume.handlers.debug.StubbornAppendSink.append(StubbornAppendSink.java:78)
        at com.cloudera.flume.core.EventSinkDecorator.append(EventSinkDecorator.java:60)
        at com.cloudera.flume.handlers.debug.InsistentAppendDecorator.append(InsistentAppendDecorator.java:110)
        at com.cloudera.flume.core.EventSinkDecorator.append(EventSinkDecorator.java:60)
        at com.cloudera.flume.handlers.endtoend.AckChecksumChecker.append(AckChecksumChecker.java:172)
        at com.cloudera.flume.core.EventSinkDecorator.append(EventSinkDecorator.java:60)
        at com.cloudera.flume.handlers.batch.UnbatchingDecorator.append(UnbatchingDecorator.java:62)
        at com.cloudera.flume.core.EventSinkDecorator.append(EventSinkDecorator.java:60)
        at com.cloudera.flume.handlers.batch.GunzipDecorator.append(GunzipDecorator.java:81)
        at com.cloudera.flume.collector.CollectorSink.append(CollectorSink.java:222)
        at com.cloudera.flume.core.connector.DirectDriver$PumperThread.run(DirectDriver.java:110)

        We are using flume to push logs to S3 on AWS - if that helps....
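The top frame of the trace above is `AbstractQueuedSynchronizer.tryAcquireNanos`, called from `ReentrantReadWriteLock$WriteLock.tryLock` inside `RollSink.close`. A timed `tryLock` throws `InterruptedException` when the calling thread's interrupt status is already set on entry, even if the lock itself is free. Because the exception comes from `tryAcquireNanos` rather than a deeper wait frame, the pumper thread had probably been interrupted before `close()` ran (for example by the roll aborting its append), though that is an unconfirmed reading. The minimal sketch below demonstrates only the JDK behaviour and uses no Flume classes; `InterruptedTryLockDemo` is a hypothetical name.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class InterruptedTryLockDemo {

    /**
     * Returns true if a timed write-lock attempt throws InterruptedException
     * because the calling thread's interrupt status was already set.
     */
    public static boolean tryLockThrowsWhenAlreadyInterrupted() {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        // Simulate a thread that was interrupted earlier (assumption: e.g. by a
        // roll that aborted its append) and never had the flag cleared.
        Thread.currentThread().interrupt();
        try {
            // The lock is free, yet the interrupt check happens before acquisition.
            if (lock.writeLock().tryLock(1, TimeUnit.SECONDS)) {
                lock.writeLock().unlock();
            }
            return false;
        } catch (InterruptedException e) {
            // The JDK clears the interrupt status when it throws here.
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("threw: " + tryLockThrowsWhenAlreadyInterrupted());
        System.out.println("flag still set: " + Thread.currentThread().isInterrupted());
    }
}
```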

        Cameron Gandevia added a comment -

        Hey

        Thanks for the patch. I initially tried changing the exception but didn't
        deal with the rolltag so it still failed. I will push the patch into our
        cluster and let you know how it goes over the weekend.

        On Tue, Nov 1, 2011 at 5:41 PM, Prasad Mujumdar (Updated) (JIRA) <


        Thanks

        Cameron Gandevia

        Prasad Mujumdar added a comment -

        @Cameron, I was looking at the issue, and here's a patch that changes the InterruptedException to a RuntimeException so that it can be handled by the DirectDriver. A couple of other related issues in the RollSink are fixed as well.
        In addition, the trigger thread's wait time can now be configured via the flume.collector.roll.timeout property in flume-conf.xml (though ideally this should be configurable per sink).

        Let me know if you want to give it a try.
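A minimal sketch of the approach described above: the checked `InterruptedException` raised while an append waits on a roll is converted into an unchecked exception, so it propagates to the driver rather than being handled along the decorator chain. This is not the actual patch. `RollAwareAppender` and `AppendInterruptedException` are hypothetical names, and the single read/write lock is a simplified stand-in for the RollSink's locking.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class RollAwareAppender {

    /** Hypothetical unchecked wrapper for an interrupted append (not a Flume class). */
    public static class AppendInterruptedException extends RuntimeException {
        public AppendInterruptedException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    // Appends share the read lock; a roll would take the write lock.
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final AtomicInteger appended = new AtomicInteger();

    /** Appends one simulated event, waiting at most timeoutMs for a roll to finish. */
    public void append(long timeoutMs) {
        boolean locked;
        try {
            locked = lock.readLock().tryLock(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            // Keep the interrupt visible to callers, then rethrow unchecked.
            Thread.currentThread().interrupt();
            throw new AppendInterruptedException("Blocked append interrupted by rotation event", e);
        }
        if (!locked) {
            throw new AppendInterruptedException("Timed out waiting for rotation", null);
        }
        try {
            appended.incrementAndGet(); // stands in for writing to the current sink
        } finally {
            lock.readLock().unlock();
        }
    }

    public int appendedCount() {
        return appended.get();
    }

    public static void main(String[] args) {
        RollAwareAppender appender = new RollAwareAppender();
        appender.append(100);
        Thread.currentThread().interrupt(); // simulate an interrupt from the roll thread
        try {
            appender.append(100);
        } catch (AppendInterruptedException e) {
            System.out.println("caught: " + e.getMessage());
            System.out.println("interrupt restored: " + Thread.interrupted());
        }
        System.out.println("appended: " + appender.appendedCount());
    }
}
```

Restoring the interrupt flag before rethrowing is the conventional Java idiom, but as the `RollSink.close` trace earlier in this thread suggests, any later timed `tryLock` on the same thread will then throw immediately, so the driver's cleanup path has to clear or tolerate the flag.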

        Cameron Gandevia added a comment -

        We applied https://issues.apache.org/jira/browse/Flume-808 and this patch to get our collectors working again. This patch is not the best solution, because it reintroduces the original problem of a downstream sink blocking, but we needed something that worked quickly, so we modified the RollSink not to cancel pending tasks.

        The RollSink test will also not pass with this patch.

        We are looking at handling the InterruptedExceptions gracefully and will submit a patch when finished.

        Luke Forehand added a comment - - edited

        This is something we've been experiencing as well:

        2011-10-31 09:43:04,640 ERROR com.cloudera.flume.core.connector.DirectDriver: Closing down due to exception during append calls
        java.lang.InterruptedException: Blocked append interrupted by rotation event
        at com.cloudera.flume.handlers.rolling.RollSink.append(RollSink.java:209)
        at com.cloudera.flume.agent.durability.NaiveFileWALDeco.append(NaiveFileWALDeco.java:132)
        at com.cloudera.flume.core.connector.DirectDriver$PumperThread.run(DirectDriver.java:110)
        2011-10-31 09:43:15,949 WARN com.cloudera.flume.handlers.thrift.ThriftEventSource: Close timed out due to no progress. Closing despite having 1000 values still enqueued
        2011-10-31 09:43:16,450 ERROR com.cloudera.flume.core.connector.DirectDriver: Expected IDLE but timed out in state ACTIVE

        Björn Edström added a comment -

        I'm experiencing the same issue, using the Cloudera package 0.9.4+25.9-1~lenny-cdh3. Below is a stack trace that is similar, but not identical, to the original one above.

        2011-10-25 11:38:42,346 INFO com.cloudera.flume.handlers.endtoend.AckListener$Empty: Empty Ack Listener ended 20111025-113813833+0000.1129353122525368.00000046
        2011-10-25 11:38:42,347 INFO com.cloudera.flume.agent.durability.NaiveFileWALManager: File lives in /var/lib/flume/flume-flume/agent/machine-044.d.company.net/writing/20111025-113813833+0000.1129353122525368.00000046
        2011-10-25 11:38:42,347 INFO com.cloudera.flume.handlers.hdfs.SeqfileEventSink: constructed new seqfile event sink: file=/var/lib/flume/flume-flume/agent/machine-044.d.company.net/writing/20111025-113842347+0000.1129381636287890.00000046
        2011-10-25 11:38:41,797 ERROR com.cloudera.flume.core.connector.DirectDriver: Closing down due to exception during append calls
        java.lang.InterruptedException: Blocked append interrupted by rotation event
        at com.cloudera.flume.handlers.rolling.RollSink.append(RollSink.java:209)
        at com.cloudera.flume.agent.durability.NaiveFileWALDeco.append(NaiveFileWALDeco.java:132)
        at com.cloudera.flume.agent.AgentSink.append(AgentSink.java:139)
        at com.cloudera.flume.core.connector.DirectDriver$PumperThread.run(DirectDriver.java:110)
        2011-10-25 11:38:42,348 INFO com.cloudera.flume.core.connector.DirectDriver: Connector logicalNode machine-044.d.company.net-44 exited with error: Blocked append interrupted by rotation event
        java.lang.InterruptedException: Blocked append interrupted by rotation event
        at com.cloudera.flume.handlers.rolling.RollSink.append(RollSink.java:209)
        at com.cloudera.flume.agent.durability.NaiveFileWALDeco.append(NaiveFileWALDeco.java:132)
        at com.cloudera.flume.agent.AgentSink.append(AgentSink.java:139)
        at com.cloudera.flume.core.connector.DirectDriver$PumperThread.run(DirectDriver.java:110)
        2011-10-25 11:38:42,348 ERROR com.cloudera.flume.core.connector.DirectDriver: Driver interrupted attempting to close source
        java.lang.InterruptedException
        at java.lang.Object.wait(Native Method)
        at java.lang.Thread.join(Thread.java:1186)
        at java.lang.Thread.join(Thread.java:1239)
        at com.spotify.flume.syslog2.ServerSocketSource.close(ServerSocketSource.java:121)
        at com.cloudera.flume.core.connector.DirectDriver$PumperThread.ensureClosed(DirectDriver.java:142)
        at com.cloudera.flume.core.connector.DirectDriver$PumperThread.errorCleanup(DirectDriver.java:163)
        at com.cloudera.flume.core.connector.DirectDriver$PumperThread.run(DirectDriver.java:116)
        2011-10-25 11:38:42,348 INFO com.cloudera.flume.handlers.rolling.RollSink: closing RollSink 'ackingWal'
        2011-10-25 11:38:42,349 INFO com.cloudera.flume.agent.durability.NaiveFileWALManager: opening log file 20111025-113813833+0000.1129353122525368.00000046
        2011-10-25 11:38:42,350 INFO com.cloudera.flume.agent.WALAckManager: Ack for 20111025-113813833+0000.1129353122525368.00000046 is queued to be checked
        2011-10-25 11:38:42,350 INFO com.cloudera.flume.agent.durability.WALSource: end of file NaiveFileWALManager (dir=/var/lib/flume/flume-flume/agent/machine-044.d.company.net )
        2011-10-25 11:38:42,351 INFO com.cloudera.flume.handlers.endtoend.AckListener$Empty: Empty Ack Listener began 20111025-113842347+0000.1129381636287890.00000046
        2011-10-25 11:38:42,352 INFO com.cloudera.flume.handlers.hdfs.SeqfileEventSink: closed /var/lib/flume/flume-flume/agent/machine-044.d.company.net/writing/20111025-113842347+0000.1129381636287890.00000046
        2011-10-25 11:38:42,352 INFO com.cloudera.flume.handlers.endtoend.AckListener$Empty: Empty Ack Listener ended 20111025-113842347+0000.1129381636287890.00000046
        2011-10-25 11:38:42,352 INFO com.cloudera.flume.agent.durability.NaiveFileWALManager: File lives in /var/lib/flume/flume-flume/agent/machine-044.d.company.net/writing/20111025-113842347+0000.1129381636287890.00000046
        2011-10-25 11:38:42,352 INFO com.cloudera.flume.agent.durability.NaiveFileWALManager: NaiveFileWALManager shutting down
        2011-10-25 11:38:42,352 INFO com.cloudera.flume.agent.durability.NaiveFileWALManager: opening log file 20111025-113842347+0000.1129381636287890.00000046
        2011-10-25 11:38:42,353 INFO com.cloudera.flume.agent.WALAckManager: Ack for 20111025-113842347+0000.1129381636287890.00000046 is queued to be checked
        2011-10-25 11:38:42,354 INFO com.cloudera.flume.agent.durability.WALSource: end of file NaiveFileWALManager (dir=/var/lib/flume/flume-flume/agent/machine-044.d.company.net )
        2011-10-25 11:38:42,554 WARN com.cloudera.flume.agent.durability.NaiveFileWALManager: Already shutting down, but getting another shutting down notice, odd
        2011-10-25 11:38:42,768 INFO com.cloudera.flume.agent.durability.NaiveFileWALManager: NaiveFileWALManager shutting down
        2011-10-25 11:38:42,775 INFO com.cloudera.flume.handlers.thrift.ThriftEventSink: ThriftEventSink on port 35853 closed
        2011-10-25 11:38:42,852 ERROR com.cloudera.flume.core.connector.DirectDriver: Exiting driver logicalNode machine-044.d.company.net-44 in error state SyslogSocketSource | Agent because Blocked append interrupted by rotation event
        ... and here it just locks up.

        Cameron Gandevia added a comment -

        This looks like a concurrency issue in the RollSink. The roll operation invoked by the trigger thread appears to cancel a pending append, resulting in a CancellationException that propagates to the DirectDriver and causes it to shut down.

        I noticed a recent change that replaced the previous synchronized blocks with read/write locks, which is probably where this issue came from.

        It can be reproduced by making the append call exceed the roll time.
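The reproduction recipe above (make the append outlast the roll) can be modelled with plain JDK concurrency primitives. This is a hypothetical sketch, not the RollSink code: the append runs as a task holding a read lock, and the "roll" waits a bounded time for the write lock and then cancels the pending append with `Future.cancel(true)`. The caller blocked in `Future.get()` then sees a `CancellationException`, while the appending thread itself is interrupted and sees an `InterruptedException`, the two exceptions discussed in this thread. `SlowAppendRollRepro` and its timings are illustrative assumptions.

```java
import java.util.concurrent.CancellationException;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class SlowAppendRollRepro {

    /**
     * Starts one append that holds the read lock for appendMs, then performs a
     * "roll" that waits up to rollTimeoutMs for the write lock and, on timeout,
     * cancels the pending append. Returns "caller:<x>,appender:<y>".
     */
    public static String run(long appendMs, long rollTimeoutMs) throws Exception {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        ExecutorService pool = Executors.newSingleThreadExecutor();
        CountDownLatch appendStarted = new CountDownLatch(1);
        AtomicReference<String> appenderSaw = new AtomicReference<>("none");

        Future<Object> pendingAppend = pool.submit(() -> {
            lock.readLock().lock();
            try {
                appendStarted.countDown();
                Thread.sleep(appendMs); // stands in for a slow downstream sink
                return null;
            } catch (InterruptedException e) {
                appenderSaw.set("InterruptedException");
                throw e;
            } finally {
                lock.readLock().unlock();
            }
        });

        appendStarted.await();
        // Roll: wait a bounded time for the in-flight append to drain.
        if (lock.writeLock().tryLock(rollTimeoutMs, TimeUnit.MILLISECONDS)) {
            lock.writeLock().unlock(); // a real roll would swap the underlying sink here
        } else {
            pendingAppend.cancel(true); // interrupts the appending thread
        }

        String callerSaw = "none";
        try {
            pendingAppend.get();
        } catch (CancellationException e) {
            callerSaw = "CancellationException";
        }
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        return "caller:" + callerSaw + ",appender:" + appenderSaw.get();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run(50, 1000));  // append finishes inside the roll timeout
        System.out.println(run(2000, 100)); // append outlives the roll timeout
    }
}
```

With `run(50, 1000)` neither side sees an exception; with `run(2000, 100)` the roll cancels the append. Waiting for the write lock without a timeout instead, as in the workaround of not cancelling pending tasks, avoids the cancellation but lets a slow downstream sink stall the roll.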


          People

          • Assignee:
            Prasad Mujumdar
            Reporter:
            Cameron Gandevia
          • Votes:
            7
            Watchers:
            10

            Dates

            • Created:
              Updated:
              Resolved:
