Details
Description
Oozie actions may fail because of various transient and non-transient problems in Hadoop, the network, etc.
Falcon handles user-workflow action failures by sending a JMS notification to ActiveMQ (the post-processing action in the parent workflow). This message is used to provide automatic retries based on the retry policies defined in Process.xml. However, Falcon's post-processing action might itself fail, leaving the workflow in a killed/suspended state.
Oozie provides out-of-the-box (OOTB) retries that can be configured per action. We should leverage this to provide reruns for the actions in Falcon's parent workflow.
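
For illustration, Oozie's per-action user retry is configured with the retry-max and retry-interval attributes on the action element. The fragment below is only a sketch of how a parent-workflow action could use them. The workflow name, action name, main class and retry values are hypothetical and are not taken from Falcon's actual parent workflow template.

```xml
<workflow-app xmlns="uri:oozie:workflow:0.3" name="falcon-parent-wf-sketch">
    <start to="post-processing"/>

    <!-- Hypothetical action name and values.
         retry-max: number of times Oozie re-runs the action after a retryable failure.
         retry-interval: wait between attempts, in minutes. -->
    <action name="post-processing" retry-max="3" retry-interval="1">
        <java>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <!-- Placeholder class, for illustration only -->
            <main-class>org.example.PostProcessing</main-class>
        </java>
        <ok to="end"/>
        <error to="fail"/>
    </action>

    <kill name="fail">
        <message>Post-processing failed after retries</message>
    </kill>
    <end name="end"/>
</workflow-app>
```

Note that Oozie applies user retry only for the error codes configured on the server, and retry-max cannot exceed the system-wide maximum, so the effective behavior depends on the Oozie deployment's settings.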