When Storm starts a large number of ShellBolts whose initialization consumes a lot of CPU time, the processes contend heavily for CPU. Under that contention, BoltHeartbeatTimerTask, which is scheduled with a 1-second delay, can fire before setHeartbeat() assigns an initial value to the lastHeartbeatTimestamp variable.
As a result, when BoltHeartbeatTimerTask fires for the first time, getLastHeartbeat() returns 0. This in turn causes the bolt to die with a "subprocess heartbeat timeout" message.
The fix is to call setHeartbeat() before BoltHeartbeatTimerTask is created, so that lastHeartbeatTimestamp is already initialized when the task first runs. A patch for this is attached.
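
To illustrate the ordering problem, here is a minimal, deterministic sketch. It is not the ShellBolt source: the Heartbeat class, the isTimedOut() and firstCheckTimesOut() helpers, and the 30-second timeout are all assumptions made for illustration. Only the names setHeartbeat(), getLastHeartbeat() and lastHeartbeatTimestamp, and the 1-second delay, come from the description above.

```java
import java.util.concurrent.atomic.AtomicLong;

class HeartbeatOrderingSketch {

    // Hypothetical stand-in for the bolt's heartbeat state (not Storm's actual class).
    static final class Heartbeat {
        // Starts at 0 until setHeartbeat() is called, as in the report.
        private final AtomicLong lastHeartbeatTimestamp = new AtomicLong(0);

        void setHeartbeat(long nowMillis) {
            lastHeartbeatTimestamp.set(nowMillis);
        }

        long getLastHeartbeat() {
            return lastHeartbeatTimestamp.get();
        }

        // Assumed form of the timer task's check: too long since the last heartbeat.
        boolean isTimedOut(long nowMillis, long timeoutMillis) {
            return nowMillis - getLastHeartbeat() > timeoutMillis;
        }
    }

    // Simulates the first run of the timer task, 1 second after startup.
    // initializeBeforeTimer = false models the reported bug (timestamp still 0);
    // initializeBeforeTimer = true models the proposed fix.
    static boolean firstCheckTimesOut(boolean initializeBeforeTimer,
                                      long startMillis,
                                      long timeoutMillis) {
        Heartbeat heartbeat = new Heartbeat();
        if (initializeBeforeTimer) {
            heartbeat.setHeartbeat(startMillis);
        }
        long firstRunMillis = startMillis + 1000; // 1-second initial delay
        return heartbeat.isTimedOut(firstRunMillis, timeoutMillis);
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        long timeoutMillis = 30_000; // assumed timeout, for illustration only

        System.out.println("timer before setHeartbeat(): timed out = "
                + firstCheckTimesOut(false, start, timeoutMillis));
        System.out.println("setHeartbeat() before timer: timed out = "
                + firstCheckTimesOut(true, start, timeoutMillis));
    }
}
```

With the timestamp left at 0, the elapsed time computed on the first check is the full wall-clock value in milliseconds, which exceeds any realistic timeout. Initializing the timestamp first makes the elapsed time on the first check about 1 second.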