Details
-
Bug
-
Status: Open
-
Major
-
Resolution: Unresolved
-
1.1.0, 1.2.0
-
None
-
None
Description
Clean shutdown of a Flume agent does not work when a component fails to start.
One way to reproduce this is to use a FileChannel without the Hadoop IO jars on the classpath.
My understanding is that the first Ctrl+C tries to stop the supervisor, which in turn should take down everything. However, AbstractFileConfigurationProvider#stop tries to gently stop the local executor, which is stuck in an endless loop trying to start a channel (DefaultLogicalNodeManager#startAllComponents). That loop can only be broken by an interrupt, but no interrupt ever reaches it, nor the higher level (which would try shutdownNow on the executors).
The interrupt never arrives because each level relies on something further up the chain to issue it, and nothing up there does.
One solution is simply to be less merciful in AbstractFileConfigurationProvider#stop(): give the executor a moderate timeout, then call executorService.shutdownNow().
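A minimal sketch of that idea, using only java.util.concurrent. This is not Flume's actual code: the class name BoundedShutdown, the method stopWithTimeout and the 500 ms timeout are hypothetical, and the task in main is just a stand-in for a start-up loop that exits only when interrupted.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration; not part of Flume.
public class BoundedShutdown {

    /**
     * Asks the executor to stop, waits up to the given timeout, and if tasks
     * are still running, interrupts them with shutdownNow().
     * Returns true if the executor terminated.
     */
    public static boolean stopWithTimeout(ExecutorService executor, long timeout, TimeUnit unit) {
        executor.shutdown(); // rejects new tasks; running tasks are not interrupted
        try {
            if (executor.awaitTermination(timeout, unit)) {
                return true; // everything finished on its own
            }
            executor.shutdownNow(); // interrupts the worker threads
            return executor.awaitTermination(timeout, unit);
        } catch (InterruptedException e) {
            // The stopping thread itself was interrupted: force the stop and
            // preserve the interrupt status for the caller.
            executor.shutdownNow();
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        // Stand-in for a retry loop that only ends when the thread is interrupted.
        executor.execute(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                try {
                    Thread.sleep(50); // pretend to retry starting a component
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // restore the flag so the loop exits
                }
            }
        });
        boolean stopped = stopWithTimeout(executor, 500, TimeUnit.MILLISECONDS);
        System.out.println("terminated: " + stopped);
    }
}
```

Note that shutdownNow() only delivers an interrupt; the stuck task still has to honor it, which matches the observation above that the start-up loop can be broken by an interrupt.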
Issue Links
- is related to FLUME-966 Migrate lifecycle management to Guava service implementation (Closed)