Details
Bug
Status: Closed
Major
Resolution: Abandoned
0.10.0, 0.9.5
None
Alpine Linux 3.2, openjdk-7
Important
Description
The ShellSpout class (and apparently ShellBolt as well) doesn't restart when a heartbeat timeout occurs. To reproduce this bug, do the following:
1. Set the supervisor.worker.timeout.secs property to a low value, e.g. 1;
2. Create a shell spout (as a standalone application, not a Java class) that hangs for more than 1 second and doesn't respond to heartbeat messages, e.g. Thread.Sleep(5000);
3. After the timeout, Storm will try to kill the shell spout process by calling the die function:
https://github.com/apache/storm/blob/v0.10.0-beta/storm-core/src/jvm/backtype/storm/spout/ShellSpout.java#L237
4. The die function calls heartBeatExecutorService.shutdownNow(), which raises an InterruptedException that is not caught by the calling thread. As a result, the topology stops working properly, even though it is still shown by ./storm list.
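The mechanism in step 4 can be reproduced outside Storm. The following is a minimal sketch, not Storm code: it assumes that die is invoked from a task running on the same executor that it shuts down. The class and method names (ShutdownNowDemo, runDemo) are made up for illustration. It shows that shutdownNow() interrupts the calling worker thread, so the next blocking call on that thread throws InterruptedException.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ShutdownNowDemo {

    /**
     * Submits a task that calls shutdownNow() on its own executor and then
     * blocks. Returns what the task observed.
     */
    static String runDemo() {
        final ExecutorService executor = Executors.newSingleThreadExecutor();
        Future<String> result = executor.submit(new Callable<String>() {
            @Override
            public String call() {
                // Interrupts all started worker threads, including this one.
                executor.shutdownNow();
                try {
                    // Any blocking call now fails because the interrupt flag is set.
                    Thread.sleep(100);
                    return "slept";
                } catch (InterruptedException e) {
                    return "interrupted";
                }
            }
        });
        try {
            return result.get();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(runDemo());
    }
}
```

If the blocking call were not wrapped in try/catch, the exception would propagate out of the task instead of letting the remaining shutdown steps run.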
I'm not a Java developer, so I'm not sure whether the code below is valid; however, it seems to fix the problem:
private void die(Throwable exception) {
    heartBeatExecutorService.shutdownNow();
    try {
        // the body of this try block is missing from the report
    } catch (InterruptedException e) {
        LOG.error("await catch ", e);
    }
    _collector.reportError(exception);
    _process.destroy();
    System.exit(11);
}
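For reference, here is a generic sketch of the pattern the fix above appears to follow: shut the executor down, wait for it, and handle InterruptedException so that the remaining cleanup still runs. The awaitTermination call is an assumption, suggested only by the "await catch" log message; the report does not show what the try block contained. ShutdownHelper, stopQuietly and demo are hypothetical names and are not part of Storm.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ShutdownHelper {

    /**
     * Stops the executor and waits up to timeoutMillis for its threads to exit.
     * Returns true if the executor terminated in time; never throws
     * InterruptedException to the caller.
     */
    static boolean stopQuietly(ExecutorService executor, long timeoutMillis) {
        executor.shutdownNow();
        try {
            // Assumed blocking call; the original report does not show it.
            return executor.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            // Restore the flag so callers can still observe the interrupt.
            Thread.currentThread().interrupt();
            return false;
        }
    }

    /** Starts a long-sleeping task and stops it from the calling thread. */
    static boolean demo() {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        executor.submit(new Runnable() {
            @Override
            public void run() {
                try {
                    Thread.sleep(10000);
                } catch (InterruptedException e) {
                    // Interrupted by shutdownNow(); exit the task.
                }
            }
        });
        return stopQuietly(executor, 2000);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

A helper like this returns normally whether or not the wait is interrupted, so a caller such as die could go on to report the error and destroy the subprocess.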