Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Description
Currently, the status of a Samza job is determined by the following steps:
1. Obtain YARN's status for the job by querying the ResourceManager (RM).
2. Obtain the AM/coordinator URL for the job.
3. If the status from (1) is "Running", query the job's coordinator URL to check whether all containers have started.
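The three steps can be sketched in Python as follows. This is a minimal illustration, not Samza's actual (Java) implementation. The callables query_rm_status, get_coordinator_url and all_containers_started are hypothetical stand-ins for the RM query, the coordinator URL lookup and the coordinator HTTP query. The sketch also assumes that a job YARN reports as "Running" whose containers have not all started is reported as "New".

```python
from typing import Callable

# Status strings used in this issue.
RUNNING = "Running"
NEW = "New"


def job_status(
    query_rm_status: Callable[[], str],
    get_coordinator_url: Callable[[], str],
    all_containers_started: Callable[[str], bool],
) -> str:
    """Hypothetical sketch of the current three-step status check."""
    # Step 1: ask the YARN ResourceManager for the application's state.
    yarn_status = query_rm_status()
    if yarn_status != RUNNING:
        return yarn_status
    # Step 2: look up the URL of the AM/coordinator.
    url = get_coordinator_url()
    # Step 3: ask the coordinator whether every container has started.
    # If YARN restarted the coordinator after step 2, this call can raise
    # a ConnectionError, which propagates and fails the whole status call.
    if all_containers_started(url):
        return RUNNING
    # Assumption: a running job with containers still starting is "New".
    return NEW
```

Nothing in this version catches the connection error raised in step 3, which is the failure mode described next.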
YARN may restart the coordinator between (2) and (3). In that case the old coordinator process may no longer be alive, so the query in (3) fails with a ConnectException, which causes the entire status call to fail.
A better way to handle such retriable errors is for the API to return a "New" status, so that applications can keep polling.
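The proposed handling, together with a client that keeps polling, can be sketched as below. This is again a hypothetical Python illustration rather than the actual fix. Python's built-in ConnectionError stands in for Java's ConnectException, and wait_until_running, timeout_s and interval_s are names invented for this example.

```python
import time
from typing import Callable

RUNNING = "Running"
NEW = "New"


def job_status(
    query_rm_status: Callable[[], str],
    get_coordinator_url: Callable[[], str],
    all_containers_started: Callable[[str], bool],
) -> str:
    """Hypothetical status check that treats a lost coordinator as retriable."""
    yarn_status = query_rm_status()
    if yarn_status != RUNNING:
        return yarn_status
    url = get_coordinator_url()
    try:
        started = all_containers_started(url)
    except ConnectionError:
        # The coordinator at `url` may have been restarted by YARN after the
        # URL was fetched; report "New" instead of failing the call.
        return NEW
    return RUNNING if started else NEW


def wait_until_running(
    status_fn: Callable[[], str], timeout_s: float, interval_s: float = 1.0
) -> str:
    """Poll status_fn while it reports "New", until the timeout expires."""
    deadline = time.monotonic() + timeout_s
    status = status_fn()
    while status == NEW and time.monotonic() < deadline:
        time.sleep(interval_s)
        status = status_fn()
    return status
```

Because the retriable case maps to the same "New" status a freshly submitted job reports, callers need no special error handling. They simply keep polling until the status changes or their own timeout expires.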