Flume
  1. Flume
  2. FLUME-1124

Lifecycle supervisor can cause thread contention, sometimes causing components to not startup.

    Details

    • Type: Bug Bug
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: v1.1.0
    • Fix Version/s: v1.2.0
    • Component/s: Sinks+Sources
    • Labels:
      None

      Description

      Based on Brock's review of my patch, the following issues have been added to this jira:

      This isn't your code but it looks like there are a few problems here.

      1) monitorService on MonitorRunnable is not set so an NPE (which is not being logged is thrown:

      java.lang.NullPointerException
      at org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:255)
      at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
      at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
      at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
      at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
      at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
      at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
      at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
      at java.lang.Thread.run(Thread.java:662)

      2) The NPE is not being logged.
      3) If the NPE did not occur we'd have a bigger issue as scheduleAtFixedRate is called here but the runnable reschedules itself.
      4) Calling first supervise() and then start() creates two threads monitoring this service.

      1. FLUME-1124-3.patch
        4 kB
        Hari Shreedharan
      2. FLUME-1124-1.patch
        2 kB
        Hari Shreedharan

        Activity

        Hide
        Hudson added a comment -

        Integrated in flume-trunk #175 (See https://builds.apache.org/job/flume-trunk/175/)
        FLUME-1124: Lifecycle supervisor can cause thread contention, sometimes causing components to not startup.

        (Hari Shreedharan via Brock Noland) (Revision 1327622)

        Result = SUCCESS
        brock : http://svn.apache.org/viewvc/?view=rev&rev=1327622
        Files :

        • /incubator/flume/trunk/flume-ng-core/src/main/java/org/apache/flume/lifecycle/LifecycleSupervisor.java
        Show
        Hudson added a comment - Integrated in flume-trunk #175 (See https://builds.apache.org/job/flume-trunk/175/ ) FLUME-1124 : Lifecycle supervisor can cause thread contention, sometimes causing components to not startup. (Hari Shreedharan via Brock Noland) (Revision 1327622) Result = SUCCESS brock : http://svn.apache.org/viewvc/?view=rev&rev=1327622 Files : /incubator/flume/trunk/flume-ng-core/src/main/java/org/apache/flume/lifecycle/LifecycleSupervisor.java
        Hide
        Brock Noland added a comment -

        Committed in 1327622. Thank you for your patch Hari!

        Show
        Brock Noland added a comment - Committed in 1327622. Thank you for your patch Hari!
        Hide
        jiraposter@reviews.apache.org added a comment -

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/4753/#review7002
        -----------------------------------------------------------

        Ship it!

        Hari, thank you very much! Please attach the patch to the JIRA.

        • Brock

        On 2012-04-18 08:14:42, Hari Shreedharan wrote:

        -----------------------------------------------------------

        This is an automatically generated e-mail. To reply, visit:

        https://reviews.apache.org/r/4753/

        -----------------------------------------------------------

        (Updated 2012-04-18 08:14:42)

        Review request for Flume.

        Summary

        -------

        All components are started by the same executor, which is different from the executor service that checks the status of the components.

        This addresses bug FLUME-1124.

        https://issues.apache.org/jira/browse/FLUME-1124

        Diffs

        -----

        flume-ng-core/src/main/java/org/apache/flume/lifecycle/LifecycleSupervisor.java a7407e6

        Diff: https://reviews.apache.org/r/4753/diff

        Testing

        -------

        Thanks,

        Hari

        Show
        jiraposter@reviews.apache.org added a comment - ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/4753/#review7002 ----------------------------------------------------------- Ship it! Hari, thank you very much! Please attach the patch to the JIRA. Brock On 2012-04-18 08:14:42, Hari Shreedharan wrote: ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/4753/ ----------------------------------------------------------- (Updated 2012-04-18 08:14:42) Review request for Flume. Summary ------- All components are started by the same executor, which is different from the executor service that checks the status of the components. This addresses bug FLUME-1124 . https://issues.apache.org/jira/browse/FLUME-1124 Diffs ----- flume-ng-core/src/main/java/org/apache/flume/lifecycle/LifecycleSupervisor.java a7407e6 Diff: https://reviews.apache.org/r/4753/diff Testing ------- Thanks, Hari
        Hide
        jiraposter@reviews.apache.org added a comment -

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/4753/
        -----------------------------------------------------------

        (Updated 2012-04-18 08:14:42.844120)

        Review request for Flume.

        Changes
        -------

        Using ScheduledThreadPoolExecutor now. No longer rescheduling from the monitorRunnable. When an unsupervise happens, the monitor is removed from the thread pool queue.

        Summary
        -------

        All components are started by the same executor, which is different from the executor service that checks the status of the components.

        This addresses bug FLUME-1124.
        https://issues.apache.org/jira/browse/FLUME-1124

        Diffs (updated)


        flume-ng-core/src/main/java/org/apache/flume/lifecycle/LifecycleSupervisor.java a7407e6

        Diff: https://reviews.apache.org/r/4753/diff

        Testing
        -------

        Thanks,

        Hari

        Show
        jiraposter@reviews.apache.org added a comment - ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/4753/ ----------------------------------------------------------- (Updated 2012-04-18 08:14:42.844120) Review request for Flume. Changes ------- Using ScheduledThreadPoolExecutor now. No longer rescheduling from the monitorRunnable. When an unsupervise happens, the monitor is removed from the thread pool queue. Summary ------- All components are started by the same executor, which is different from the executor service that checks the status of the components. This addresses bug FLUME-1124 . https://issues.apache.org/jira/browse/FLUME-1124 Diffs (updated) flume-ng-core/src/main/java/org/apache/flume/lifecycle/LifecycleSupervisor.java a7407e6 Diff: https://reviews.apache.org/r/4753/diff Testing ------- Thanks, Hari
        Hide
        jiraposter@reviews.apache.org added a comment -

        On 2012-04-17 15:44:12, Brock Noland wrote:

        > flume-ng-core/src/main/java/org/apache/flume/lifecycle/LifecycleSupervisor.java, line 68

        > <https://reviews.apache.org/r/4753/diff/3/?file=102382#file102382line68>

        >

        > This isn't your code but it looks like there are a few problems here.

        >

        > 1) monitorService on MonitorRunnable is not set so an NPE (which is not being logged is thrown:

        >

        > java.lang.NullPointerException

        > at org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:255)

        > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)

        > at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)

        > at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)

        > at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)

        > at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)

        > at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)

        > at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)

        > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)

        > at java.lang.Thread.run(Thread.java:662)

        >

        > 2) The NPE is not being logged.

        > 3) If the NPE did not occur we'd have a bigger issue as scheduleAtFixedRate is called here but the runnable reschedules itself.

        > 4) Calling first supervise() and then start() creates two threads monitoring this service.

        Hari Shreedharan wrote:

        Hi Brock,

        None of the issues are caused by this change. I can look into these and fix them, it seems like it is because of the way the start method is written. Here is a brief explanation:

        1 and 2) This is caused by the start() function being called without supervise being called. The reason we have not been seeing this seems to be that when start is called, the different components have not been added. So it never goes into the loop.

        3) Agree this is an issue, but because of the explanation to (1 and 2) above.

        4) Agreed, we should not be doing this. I will update the relevant code.

        Brock Noland wrote:

        I did not mean to imply this was caused by your change, but if we are "fixing" the class we might as well fix these issues as well.

        Yep, updated the jira.

        • Hari

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/4753/#review6983
        -----------------------------------------------------------

        On 2012-04-17 15:06:58, Hari Shreedharan wrote:

        -----------------------------------------------------------

        This is an automatically generated e-mail. To reply, visit:

        https://reviews.apache.org/r/4753/

        -----------------------------------------------------------

        (Updated 2012-04-17 15:06:58)

        Review request for Flume.

        Summary

        -------

        All components are started by the same executor, which is different from the executor service that checks the status of the components.

        This addresses bug FLUME-1124.

        https://issues.apache.org/jira/browse/FLUME-1124

        Diffs

        -----

        flume-ng-core/src/main/java/org/apache/flume/lifecycle/LifecycleSupervisor.java a7407e6

        Diff: https://reviews.apache.org/r/4753/diff

        Testing

        -------

        Thanks,

        Hari

        Show
        jiraposter@reviews.apache.org added a comment - On 2012-04-17 15:44:12, Brock Noland wrote: > flume-ng-core/src/main/java/org/apache/flume/lifecycle/LifecycleSupervisor.java, line 68 > < https://reviews.apache.org/r/4753/diff/3/?file=102382#file102382line68 > > > This isn't your code but it looks like there are a few problems here. > > 1) monitorService on MonitorRunnable is not set so an NPE (which is not being logged is thrown: > > java.lang.NullPointerException > at org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:255) > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) > at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317) > at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150) > at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98) > at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180) > at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204) > at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) > at java.lang.Thread.run(Thread.java:662) > > 2) The NPE is not being logged. > 3) If the NPE did not occur we'd have a bigger issue as scheduleAtFixedRate is called here but the runnable reschedules itself. > 4) Calling first supervise() and then start() creates two threads monitoring this service. Hari Shreedharan wrote: Hi Brock, None of the issues are caused by this change. I can look into these and fix them, it seems like it is because of the way the start method is written. Here is a brief explanation: 1 and 2) This is caused by the start() function being called without supervise being called. The reason we have not been seeing this seems to be that when start is called, the different components have not been added. So it never goes into the loop. 3) Agree this is an issue, but because of the explanation to (1 and 2) above. 4) Agreed, we should not be doing this. I will update the relevant code. Brock Noland wrote: I did not mean to imply this was caused by your change, but if we are "fixing" the class we might as well fix these issues as well. Yep, updated the jira. Hari ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/4753/#review6983 ----------------------------------------------------------- On 2012-04-17 15:06:58, Hari Shreedharan wrote: ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/4753/ ----------------------------------------------------------- (Updated 2012-04-17 15:06:58) Review request for Flume. Summary ------- All components are started by the same executor, which is different from the executor service that checks the status of the components. This addresses bug FLUME-1124 . https://issues.apache.org/jira/browse/FLUME-1124 Diffs ----- flume-ng-core/src/main/java/org/apache/flume/lifecycle/LifecycleSupervisor.java a7407e6 Diff: https://reviews.apache.org/r/4753/diff Testing ------- Thanks, Hari
        Hide
        jiraposter@reviews.apache.org added a comment -

        On 2012-04-17 15:44:12, Brock Noland wrote:

        > flume-ng-core/src/main/java/org/apache/flume/lifecycle/LifecycleSupervisor.java, line 68

        > <https://reviews.apache.org/r/4753/diff/3/?file=102382#file102382line68>

        >

        > This isn't your code but it looks like there are a few problems here.

        >

        > 1) monitorService on MonitorRunnable is not set so an NPE (which is not being logged is thrown:

        >

        > java.lang.NullPointerException

        > at org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:255)

        > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)

        > at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)

        > at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)

        > at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)

        > at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)

        > at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)

        > at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)

        > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)

        > at java.lang.Thread.run(Thread.java:662)

        >

        > 2) The NPE is not being logged.

        > 3) If the NPE did not occur we'd have a bigger issue as scheduleAtFixedRate is called here but the runnable reschedules itself.

        > 4) Calling first supervise() and then start() creates two threads monitoring this service.

        Hari Shreedharan wrote:

        Hi Brock,

        None of the issues are caused by this change. I can look into these and fix them, it seems like it is because of the way the start method is written. Here is a brief explanation:

        1 and 2) This is caused by the start() function being called without supervise being called. The reason we have not been seeing this seems to be that when start is called, the different components have not been added. So it never goes into the loop.

        3) Agree this is an issue, but because of the explanation to (1 and 2) above.

        4) Agreed, we should not be doing this. I will update the relevant code.

        I did not mean to imply this was caused by your change, but if we are "fixing" the class we might as well fix these issues as well.

        • Brock

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/4753/#review6983
        -----------------------------------------------------------

        On 2012-04-17 15:06:58, Hari Shreedharan wrote:

        -----------------------------------------------------------

        This is an automatically generated e-mail. To reply, visit:

        https://reviews.apache.org/r/4753/

        -----------------------------------------------------------

        (Updated 2012-04-17 15:06:58)

        Review request for Flume.

        Summary

        -------

        All components are started by the same executor, which is different from the executor service that checks the status of the components.

        This addresses bug FLUME-1124.

        https://issues.apache.org/jira/browse/FLUME-1124

        Diffs

        -----

        flume-ng-core/src/main/java/org/apache/flume/lifecycle/LifecycleSupervisor.java a7407e6

        Diff: https://reviews.apache.org/r/4753/diff

        Testing

        -------

        Thanks,

        Hari

        Show
        jiraposter@reviews.apache.org added a comment - On 2012-04-17 15:44:12, Brock Noland wrote: > flume-ng-core/src/main/java/org/apache/flume/lifecycle/LifecycleSupervisor.java, line 68 > < https://reviews.apache.org/r/4753/diff/3/?file=102382#file102382line68 > > > This isn't your code but it looks like there are a few problems here. > > 1) monitorService on MonitorRunnable is not set so an NPE (which is not being logged is thrown: > > java.lang.NullPointerException > at org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:255) > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) > at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317) > at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150) > at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98) > at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180) > at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204) > at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) > at java.lang.Thread.run(Thread.java:662) > > 2) The NPE is not being logged. > 3) If the NPE did not occur we'd have a bigger issue as scheduleAtFixedRate is called here but the runnable reschedules itself. > 4) Calling first supervise() and then start() creates two threads monitoring this service. Hari Shreedharan wrote: Hi Brock, None of the issues are caused by this change. I can look into these and fix them, it seems like it is because of the way the start method is written. Here is a brief explanation: 1 and 2) This is caused by the start() function being called without supervise being called. The reason we have not been seeing this seems to be that when start is called, the different components have not been added. So it never goes into the loop. 3) Agree this is an issue, but because of the explanation to (1 and 2) above. 4) Agreed, we should not be doing this. I will update the relevant code. I did not mean to imply this was caused by your change, but if we are "fixing" the class we might as well fix these issues as well. Brock ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/4753/#review6983 ----------------------------------------------------------- On 2012-04-17 15:06:58, Hari Shreedharan wrote: ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/4753/ ----------------------------------------------------------- (Updated 2012-04-17 15:06:58) Review request for Flume. Summary ------- All components are started by the same executor, which is different from the executor service that checks the status of the components. This addresses bug FLUME-1124 . https://issues.apache.org/jira/browse/FLUME-1124 Diffs ----- flume-ng-core/src/main/java/org/apache/flume/lifecycle/LifecycleSupervisor.java a7407e6 Diff: https://reviews.apache.org/r/4753/diff Testing ------- Thanks, Hari
        Hide
        jiraposter@reviews.apache.org added a comment -

        On 2012-04-17 15:44:12, Brock Noland wrote:

        > flume-ng-core/src/main/java/org/apache/flume/lifecycle/LifecycleSupervisor.java, line 68

        > <https://reviews.apache.org/r/4753/diff/3/?file=102382#file102382line68>

        >

        > This isn't your code but it looks like there are a few problems here.

        >

        > 1) monitorService on MonitorRunnable is not set so an NPE (which is not being logged is thrown:

        >

        > java.lang.NullPointerException

        > at org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:255)

        > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)

        > at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)

        > at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)

        > at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)

        > at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)

        > at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)

        > at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)

        > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)

        > at java.lang.Thread.run(Thread.java:662)

        >

        > 2) The NPE is not being logged.

        > 3) If the NPE did not occur we'd have a bigger issue as scheduleAtFixedRate is called here but the runnable reschedules itself.

        > 4) Calling first supervise() and then start() creates two threads monitoring this service.

        Hi Brock,

        None of the issues are caused by this change. I can look into these and fix them, it seems like it is because of the way the start method is written. Here is a brief explanation:
        1 and 2) This is caused by the start() function being called without supervise being called. The reason we have not been seeing this seems to be that when start is called, the different components have not been added. So it never goes into the loop.
        3) Agree this is an issue, but because of the explanation to (1 and 2) above.
        4) Agreed, we should not be doing this. I will update the relevant code.

        • Hari

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/4753/#review6983
        -----------------------------------------------------------

        On 2012-04-17 15:06:58, Hari Shreedharan wrote:

        -----------------------------------------------------------

        This is an automatically generated e-mail. To reply, visit:

        https://reviews.apache.org/r/4753/

        -----------------------------------------------------------

        (Updated 2012-04-17 15:06:58)

        Review request for Flume.

        Summary

        -------

        All components are started by the same executor, which is different from the executor service that checks the status of the components.

        This addresses bug FLUME-1124.

        https://issues.apache.org/jira/browse/FLUME-1124

        Diffs

        -----

        flume-ng-core/src/main/java/org/apache/flume/lifecycle/LifecycleSupervisor.java a7407e6

        Diff: https://reviews.apache.org/r/4753/diff

        Testing

        -------

        Thanks,

        Hari

        Show
        jiraposter@reviews.apache.org added a comment - On 2012-04-17 15:44:12, Brock Noland wrote: > flume-ng-core/src/main/java/org/apache/flume/lifecycle/LifecycleSupervisor.java, line 68 > < https://reviews.apache.org/r/4753/diff/3/?file=102382#file102382line68 > > > This isn't your code but it looks like there are a few problems here. > > 1) monitorService on MonitorRunnable is not set so an NPE (which is not being logged is thrown: > > java.lang.NullPointerException > at org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:255) > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) > at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317) > at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150) > at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98) > at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180) > at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204) > at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) > at java.lang.Thread.run(Thread.java:662) > > 2) The NPE is not being logged. > 3) If the NPE did not occur we'd have a bigger issue as scheduleAtFixedRate is called here but the runnable reschedules itself. > 4) Calling first supervise() and then start() creates two threads monitoring this service. Hi Brock, None of the issues are caused by this change. I can look into these and fix them, it seems like it is because of the way the start method is written. Here is a brief explanation: 1 and 2) This is caused by the start() function being called without supervise being called. The reason we have not been seeing this seems to be that when start is called, the different components have not been added. So it never goes into the loop. 3) Agree this is an issue, but because of the explanation to (1 and 2) above. 4) Agreed, we should not be doing this. I will update the relevant code. Hari ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/4753/#review6983 ----------------------------------------------------------- On 2012-04-17 15:06:58, Hari Shreedharan wrote: ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/4753/ ----------------------------------------------------------- (Updated 2012-04-17 15:06:58) Review request for Flume. Summary ------- All components are started by the same executor, which is different from the executor service that checks the status of the components. This addresses bug FLUME-1124 . https://issues.apache.org/jira/browse/FLUME-1124 Diffs ----- flume-ng-core/src/main/java/org/apache/flume/lifecycle/LifecycleSupervisor.java a7407e6 Diff: https://reviews.apache.org/r/4753/diff Testing ------- Thanks, Hari
        Hide
        jiraposter@reviews.apache.org added a comment -

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/4753/#review6983
        -----------------------------------------------------------

        flume-ng-core/src/main/java/org/apache/flume/lifecycle/LifecycleSupervisor.java
        <https://reviews.apache.org/r/4753/#comment15535>

        This isn't your code but it looks like there are a few problems here.

        1) monitorService on MonitorRunnable is not set so an NPE (which is not being logged is thrown:

        java.lang.NullPointerException
        at org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:255)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
        at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)

        2) The NPE is not being logged.
        3) If the NPE did not occur we'd have a bigger issue as scheduleAtFixedRate is called here but the runnable reschedules itself.
        4) Calling first supervise() and then start() creates two threads monitoring this service.

        • Brock

        On 2012-04-17 15:06:58, Hari Shreedharan wrote:

        -----------------------------------------------------------

        This is an automatically generated e-mail. To reply, visit:

        https://reviews.apache.org/r/4753/

        -----------------------------------------------------------

        (Updated 2012-04-17 15:06:58)

        Review request for Flume.

        Summary

        -------

        All components are started by the same executor, which is different from the executor service that checks the status of the components.

        This addresses bug FLUME-1124.

        https://issues.apache.org/jira/browse/FLUME-1124

        Diffs

        -----

        flume-ng-core/src/main/java/org/apache/flume/lifecycle/LifecycleSupervisor.java a7407e6

        Diff: https://reviews.apache.org/r/4753/diff

        Testing

        -------

        Thanks,

        Hari

        Show
        jiraposter@reviews.apache.org added a comment - ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/4753/#review6983 ----------------------------------------------------------- flume-ng-core/src/main/java/org/apache/flume/lifecycle/LifecycleSupervisor.java < https://reviews.apache.org/r/4753/#comment15535 > This isn't your code but it looks like there are a few problems here. 1) monitorService on MonitorRunnable is not set so an NPE (which is not being logged is thrown: java.lang.NullPointerException at org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:255) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317) at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) 2) The NPE is not being logged. 3) If the NPE did not occur we'd have a bigger issue as scheduleAtFixedRate is called here but the runnable reschedules itself. 4) Calling first supervise() and then start() creates two threads monitoring this service. Brock On 2012-04-17 15:06:58, Hari Shreedharan wrote: ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/4753/ ----------------------------------------------------------- (Updated 2012-04-17 15:06:58) Review request for Flume. Summary ------- All components are started by the same executor, which is different from the executor service that checks the status of the components. This addresses bug FLUME-1124 . https://issues.apache.org/jira/browse/FLUME-1124 Diffs ----- flume-ng-core/src/main/java/org/apache/flume/lifecycle/LifecycleSupervisor.java a7407e6 Diff: https://reviews.apache.org/r/4753/diff Testing ------- Thanks, Hari
        Hide
        jiraposter@reviews.apache.org added a comment -

        On 2012-04-17 14:34:44, Brock Noland wrote:

        > flume-ng-core/src/main/java/org/apache/flume/lifecycle/LifecycleSupervisor.java, line 83

        > <https://reviews.apache.org/r/4753/diff/2/?file=102361#file102361line83>

        >

        > Should componentStartService be shutdown?

        It isn't really required, because each of the components are shutdown in the lines that immediately follow. But I have submitted an update, with a shutdown call.

        • Hari

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/4753/#review6979
        -----------------------------------------------------------

        On 2012-04-17 15:06:58, Hari Shreedharan wrote:

        -----------------------------------------------------------

        This is an automatically generated e-mail. To reply, visit:

        https://reviews.apache.org/r/4753/

        -----------------------------------------------------------

        (Updated 2012-04-17 15:06:58)

        Review request for Flume.

        Summary

        -------

        All components are started by the same executor, which is different from the executor service that checks the status of the components.

        This addresses bug FLUME-1124.

        https://issues.apache.org/jira/browse/FLUME-1124

        Diffs

        -----

        flume-ng-core/src/main/java/org/apache/flume/lifecycle/LifecycleSupervisor.java a7407e6

        Diff: https://reviews.apache.org/r/4753/diff

        Testing

        -------

        Thanks,

        Hari

        Show
        jiraposter@reviews.apache.org added a comment - On 2012-04-17 14:34:44, Brock Noland wrote: > flume-ng-core/src/main/java/org/apache/flume/lifecycle/LifecycleSupervisor.java, line 83 > < https://reviews.apache.org/r/4753/diff/2/?file=102361#file102361line83 > > > Should componentStartService be shutdown? It isn't really required, because each of the components are shutdown in the lines that immediately follow. But I have submitted an update, with a shutdown call. Hari ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/4753/#review6979 ----------------------------------------------------------- On 2012-04-17 15:06:58, Hari Shreedharan wrote: ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/4753/ ----------------------------------------------------------- (Updated 2012-04-17 15:06:58) Review request for Flume. Summary ------- All components are started by the same executor, which is different from the executor service that checks the status of the components. This addresses bug FLUME-1124 . https://issues.apache.org/jira/browse/FLUME-1124 Diffs ----- flume-ng-core/src/main/java/org/apache/flume/lifecycle/LifecycleSupervisor.java a7407e6 Diff: https://reviews.apache.org/r/4753/diff Testing ------- Thanks, Hari
        Hide
        jiraposter@reviews.apache.org added a comment -

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/4753/
        -----------------------------------------------------------

        (Updated 2012-04-17 15:06:58.103483)

        Review request for Flume.

        Changes
        -------

        Shutting down the componentStartService on stop call.

        Summary
        -------

        All components are started by the same executor, which is different from the executor service that checks the status of the components.

        This addresses bug FLUME-1124.
        https://issues.apache.org/jira/browse/FLUME-1124

        Diffs (updated)


        flume-ng-core/src/main/java/org/apache/flume/lifecycle/LifecycleSupervisor.java a7407e6

        Diff: https://reviews.apache.org/r/4753/diff

        Testing
        -------

        Thanks,

        Hari

        Show
        jiraposter@reviews.apache.org added a comment - ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/4753/ ----------------------------------------------------------- (Updated 2012-04-17 15:06:58.103483) Review request for Flume. Changes ------- Shutting down the componentStartService on stop call. Summary ------- All components are started by the same executor, which is different from the executor service that checks the status of the components. This addresses bug FLUME-1124 . https://issues.apache.org/jira/browse/FLUME-1124 Diffs (updated) flume-ng-core/src/main/java/org/apache/flume/lifecycle/LifecycleSupervisor.java a7407e6 Diff: https://reviews.apache.org/r/4753/diff Testing ------- Thanks, Hari
        Hide
        jiraposter@reviews.apache.org added a comment -

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/4753/#review6979
        -----------------------------------------------------------

        flume-ng-core/src/main/java/org/apache/flume/lifecycle/LifecycleSupervisor.java
        <https://reviews.apache.org/r/4753/#comment15531>

        Should componentStartService be shutdown?

        • Brock

        On 2012-04-17 07:06:54, Hari Shreedharan wrote:

        -----------------------------------------------------------

        This is an automatically generated e-mail. To reply, visit:

        https://reviews.apache.org/r/4753/

        -----------------------------------------------------------

        (Updated 2012-04-17 07:06:54)

        Review request for Flume.

        Summary

        -------

        All components are started by the same executor, which is different from the executor service that checks the status of the components.

        This addresses bug FLUME-1124.

        https://issues.apache.org/jira/browse/FLUME-1124

        Diffs

        -----

        flume-ng-core/src/main/java/org/apache/flume/lifecycle/LifecycleSupervisor.java a7407e6

        Diff: https://reviews.apache.org/r/4753/diff

        Testing

        -------

        Thanks,

        Hari

        Show
        jiraposter@reviews.apache.org added a comment - ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/4753/#review6979 ----------------------------------------------------------- flume-ng-core/src/main/java/org/apache/flume/lifecycle/LifecycleSupervisor.java < https://reviews.apache.org/r/4753/#comment15531 > Should componentStartService be shutdown? Brock On 2012-04-17 07:06:54, Hari Shreedharan wrote: ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/4753/ ----------------------------------------------------------- (Updated 2012-04-17 07:06:54) Review request for Flume. Summary ------- All components are started by the same executor, which is different from the executor service that checks the status of the components. This addresses bug FLUME-1124 . https://issues.apache.org/jira/browse/FLUME-1124 Diffs ----- flume-ng-core/src/main/java/org/apache/flume/lifecycle/LifecycleSupervisor.java a7407e6 Diff: https://reviews.apache.org/r/4753/diff Testing ------- Thanks, Hari
        Hide
        jiraposter@reviews.apache.org added a comment -

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/4753/
        -----------------------------------------------------------

        Review request for Flume.

        Summary
        -------

        All components are started by the same executor, which is different from the executor service that checks the status of the components.

        This addresses bug FLUME-1124.
        https://issues.apache.org/jira/browse/FLUME-1124

        Diffs


        flume-ng-core/src/main/java/org/apache/flume/lifecycle/LifecycleSupervisor.java a7407e6

        Diff: https://reviews.apache.org/r/4753/diff

        Testing
        -------

        Thanks,

        Hari

        Show
        jiraposter@reviews.apache.org added a comment - ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/4753/ ----------------------------------------------------------- Review request for Flume. Summary ------- All components are started by the same executor, which is different from the executor service that checks the status of the components. This addresses bug FLUME-1124 . https://issues.apache.org/jira/browse/FLUME-1124 Diffs flume-ng-core/src/main/java/org/apache/flume/lifecycle/LifecycleSupervisor.java a7407e6 Diff: https://reviews.apache.org/r/4753/diff Testing ------- Thanks, Hari
        Hide
        Hari Shreedharan added a comment -

        If there are several sinks and sources, the lifecycle supervisor for the node can cause thread contention. There is one single thread pool with 5 threads in the pool. Each component's monitorRunnable is scheduled to run immediately, each of which schedules itself to run after 3 seconds. This can cause issues if after 3 seconds some components remain to be started. They may not get started for a long time because the executor service's threads may be busy running the monitorRunnables which are scheduled by the initial component start runs.

        I am separating the starts to a different service - which is a SingleThreadExecutor.

        Show
        Hari Shreedharan added a comment - If there are several sinks and sources, the lifecycle supervisor for the node can cause thread contention. There is one single thread pool with 5 threads in the pool. Each component's monitorRunnable is scheduled to run immediately, each of which schedules itself to run after 3 seconds. This can cause issues if after 3 seconds some components remain to be started. They may not get started for a long time because the executor service's threads may be busy running the monitorRunnables which are scheduled by the initial component start runs. I am separating the starts to a different service - which is a SingleThreadExecutor.

          People

          • Assignee:
            Hari Shreedharan
            Reporter:
            Hari Shreedharan
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development