Currently, Falcon retries an instance only when it fails. We should extend retry support to timed-out instances as well. Earlier this was not possible, because Falcon relied on post-processing to report instance status, and a timed-out instance never runs its workflow, so post-processing never gets a chance to report it. Now that Falcon relies on Oozie JMS notifications, we can support retries for timed-out instances too.
If a dataset is expected to be delayed for a long time, the user is currently forced to supply a large timeout value. This adds overhead, because Oozie has to keep polling for the input for that entire duration. If we introduce retries on timeout, the timeout can be set to a reasonable value and combined with periodic or exponential back-off retries.
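To illustrate the trade-off, here is a minimal sketch of how the two retry schedules might be computed. It is not Falcon's implementation: the function names, the doubling factor for exponential back-off, and the assumption that each retry waits for a fresh full timeout are all hypothetical and used only for illustration.

```python
def retry_delays(policy, base_delay_minutes, attempts):
    """Return the delay (in minutes) before each retry attempt.

    Hypothetical helper: policy names and the doubling factor are
    assumptions for illustration, not taken from Falcon's code.
    """
    if attempts < 0:
        raise ValueError("attempts must be non-negative")
    if policy == "periodic":
        # Same delay before every retry.
        return [base_delay_minutes] * attempts
    if policy == "exp-backoff":
        # Delay doubles with each retry: d, 2d, 4d, ...
        return [base_delay_minutes * 2 ** i for i in range(attempts)]
    raise ValueError("unknown retry policy: %s" % policy)


def total_wait_minutes(timeout_minutes, delays):
    """Total time covered by the first run plus all retries.

    Assumes (hypothetically) that every run, including each retry,
    waits for the input up to the full timeout before timing out.
    """
    runs = len(delays) + 1  # initial run plus one run per retry
    return runs * timeout_minutes + sum(delays)


if __name__ == "__main__":
    periodic = retry_delays("periodic", 10, 3)
    backoff = retry_delays("exp-backoff", 10, 3)
    print(periodic, total_wait_minutes(30, periodic))
    print(backoff, total_wait_minutes(30, backoff))
```

Under these assumptions, a 30-minute timeout with three exponential back-off retries covers a noticeably longer delay than the timeout alone, while Oozie only polls during each individual timeout window rather than continuously for the whole period.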