  1. Flume
  2. FLUME-1257

Components that fail to start can put Flume into a state from which it cannot shut down

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.1.0, 1.2.0
    • Fix Version/s: None
    • Component/s: Configuration, Node
    • Labels: None

    Description

      Clean shutdown of a Flume agent does not work when a component fails to start.

      One way to confirm this is to use a FileChannel without the Hadoop IO jars on the classpath.

      My understanding is that the first Ctrl+C tries to stop the supervisor, which in turn should take everything down. However, AbstractFileConfigurationProvider#stop tries to gently stop the local executor, which is stuck in an endless loop trying to start a channel (DefaultLogicalNodeManager#startAllComponents). This loop can only be broken by an interrupt, but none ever reaches it or the higher level (which would try shutdownNow on the executors).

      The interrupt never arrives because every level relies on something further up the chain to send it, and nothing up the chain does.
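
The behaviour described above can be reproduced outside Flume with a plain java.util.concurrent executor. The following is a minimal sketch, not Flume code: the class GracefulShutdownHang and its methods are hypothetical stand-ins, and the retry loop only imitates a component start that keeps failing. It shows that ExecutorService#shutdown() does not interrupt a task that is already running, so a loop that only exits on interrupt keeps going.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class GracefulShutdownHang {

    /** Hypothetical stand-in for a component start that fails and is retried forever. */
    static Runnable retryUntilInterrupted() {
        return () -> {
            while (!Thread.currentThread().isInterrupted()) {
                try {
                    // Pretend the start attempt failed; back off and try again.
                    TimeUnit.MILLISECONDS.sleep(50);
                } catch (InterruptedException e) {
                    // Restore the flag so the loop condition sees the interrupt.
                    Thread.currentThread().interrupt();
                }
            }
        };
    }

    /** Returns true if the task is still running after a graceful shutdown(). */
    static boolean stillRunningAfterGracefulShutdown(long waitMillis)
            throws InterruptedException {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        executor.execute(retryUntilInterrupted());

        // Graceful stop: rejects new tasks but does not interrupt running ones.
        executor.shutdown();
        boolean terminated = executor.awaitTermination(waitMillis, TimeUnit.MILLISECONDS);

        // Clean up so this demo can exit; this is the interrupt that was missing.
        executor.shutdownNow();
        executor.awaitTermination(1, TimeUnit.SECONDS);
        return !terminated;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("still running after shutdown(): "
                + stillRunningAfterGracefulShutdown(300));
    }
}
```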

      One solution is to be less lenient in AbstractFileConfigurationProvider#stop(): give the executor a moderate timeout to finish, then call executorService.shutdownNow().
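
A rough sketch of that idea, assuming nothing about the actual Flume source: BoundedStop and stopWithTimeout are hypothetical names, and the timeout value is arbitrary. The helper asks the executor to stop, waits a bounded time, and only then interrupts whatever is still running.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class BoundedStop {

    /**
     * Hypothetical helper, not Flume code: graceful shutdown first,
     * then shutdownNow() if the executor has not terminated in time.
     * Returns true if the executor terminated.
     */
    static boolean stopWithTimeout(ExecutorService executor, long timeout, TimeUnit unit) {
        executor.shutdown(); // stop accepting work, let running tasks finish
        try {
            if (executor.awaitTermination(timeout, unit)) {
                return true;
            }
            // Still running after the timeout: interrupt the worker threads.
            executor.shutdownNow();
            return executor.awaitTermination(timeout, unit);
        } catch (InterruptedException e) {
            // The stopping thread was interrupted itself; force the stop and propagate.
            executor.shutdownNow();
            Thread.currentThread().interrupt();
            return false;
        }
    }

    /** Task that only exits when interrupted, imitating the stuck start loop. */
    static Runnable loopUntilInterrupted() {
        return () -> {
            while (!Thread.currentThread().isInterrupted()) {
                try {
                    Thread.sleep(50);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        };
    }

    public static void main(String[] args) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        executor.execute(loopUntilInterrupted());
        System.out.println("terminated: "
                + stopWithTimeout(executor, 500, TimeUnit.MILLISECONDS));
    }
}
```

This only helps if the stuck task responds to interruption; a task that swallows InterruptedException and ignores the interrupt flag would still not stop.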

            People

              Assignee: Unassigned
              Reporter: Juhani Connolly (juhanic)
              Votes: 0
              Watchers: 1

              Dates

                Created:
                Updated: