Description
Hive handles SessionNotRunning during submitDAG() by restarting the Tez session.
In YHIVE-15, we did not receive that exception and the query failed. In some
scenarios the application falls out of the RM's knowledge, and an
ApplicationNotFoundException is thrown instead.
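To make the failure concrete, here is a minimal sketch of the caller-side handling described above. It uses simplified stand-in classes, not the real Tez, YARN, or Hive code: `SessionNotRunning` and `ApplicationNotFoundException` only borrow the real names, and `DagSubmitter` and `HiveStyleRetry` are hypothetical. A handler that catches only SessionNotRunning lets an expired application fail the query.

```java
// Stand-in for org.apache.tez.dag.api.SessionNotRunning; NOT the real Tez class.
class SessionNotRunning extends Exception {
    SessionNotRunning(String msg) { super(msg); }
}

// Stand-in for YARN's ApplicationNotFoundException; NOT the real YARN class.
class ApplicationNotFoundException extends Exception {
    ApplicationNotFoundException(String msg) { super(msg); }
}

// Hypothetical hook standing in for TezClient.submitDAG().
interface DagSubmitter {
    void submit(String dagName) throws SessionNotRunning, ApplicationNotFoundException;
}

class HiveStyleRetry {
    // Mirrors the behavior described above: restart only on SessionNotRunning.
    static String run(DagSubmitter firstSession, DagSubmitter restartedSession, String dagName) {
        try {
            try {
                firstSession.submit(dagName);
                return "submitted";
            } catch (SessionNotRunning e) {
                // Known failure mode: restart the session and resubmit once.
                restartedSession.submit(dagName);
                return "submitted after restart";
            }
        } catch (SessionNotRunning | ApplicationNotFoundException e) {
            // Anything not handled above (e.g. an expired application) fails the query.
            return "query failed: " + e.getClass().getSimpleName();
        }
    }
}
```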
Here are my asks:
1. TezClient.submitDAG()/stop() should throw SessionNotRunning if the
application has expired. Basically, any API that currently throws
SessionNotRunning should also handle the app-not-found scenario.
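A minimal sketch of ask #1, assuming the client wraps whatever RM call it makes and translates an app-not-found error into SessionNotRunning. The exception classes are simplified stand-ins for the real Tez and YARN types, and `RmCall` and `SessionGuard` are hypothetical names, not part of TezClient.

```java
// Stand-in for org.apache.tez.dag.api.SessionNotRunning; NOT the real Tez class.
class SessionNotRunning extends Exception {
    SessionNotRunning(String msg, Throwable cause) { super(msg, cause); }
}

// Stand-in for YARN's ApplicationNotFoundException; NOT the real YARN class.
class ApplicationNotFoundException extends Exception {
    ApplicationNotFoundException(String msg) { super(msg); }
}

// Hypothetical RM interaction made by submitDAG()/stop() on behalf of the session.
interface RmCall {
    void invoke() throws ApplicationNotFoundException;
}

class SessionGuard {
    // Proposed behavior: an application the RM no longer knows surfaces as SessionNotRunning.
    static void callAsSession(String appId, RmCall call) throws SessionNotRunning {
        try {
            call.invoke();
        } catch (ApplicationNotFoundException e) {
            // Keep the RM error as the cause so it is still visible in logs.
            throw new SessionNotRunning("Application " + appId + " is no longer known to the RM", e);
        }
    }
}
```

Keeping the original exception as the cause preserves the RM's error for debugging, while existing SessionNotRunning handlers (such as Hive's restart logic) can take over unchanged.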
2. It would help if TezClient.getAppMasterStatus() returned
TezAppMasterStatus.SHUTDOWN when the tez-session application no longer exists
in the RM. That way, as a precaution, applications could check the status
before submitting DAGs.
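Ask #2 could look roughly like the following mapping, which treats an application unknown to the RM the same as one that has finished, failed, or been killed. `TezAppMasterStatus` here mirrors the value names of the real Tez enum and `YarnAppState` mirrors YARN's application states, but both are local stand-ins; `RmLookup` and `StatusMapper` are hypothetical.

```java
// Local stand-ins; NOT the real Tez/YARN types.
enum TezAppMasterStatus { INITIALIZING, READY, RUNNING, SHUTDOWN }
enum YarnAppState { NEW, NEW_SAVING, SUBMITTED, ACCEPTED, RUNNING, FINISHED, FAILED, KILLED }

class ApplicationNotFoundException extends Exception {
    ApplicationNotFoundException(String msg) { super(msg); }
}

// Hypothetical RM lookup; in YARN this role is played by an application-report call.
interface RmLookup {
    YarnAppState stateOf(String appId) throws ApplicationNotFoundException;
}

class StatusMapper {
    // Proposed behavior: an application the RM no longer knows is reported as SHUTDOWN.
    static TezAppMasterStatus getAppMasterStatus(RmLookup rm, String appId) {
        YarnAppState state;
        try {
            state = rm.stateOf(appId);
        } catch (ApplicationNotFoundException e) {
            return TezAppMasterStatus.SHUTDOWN;
        }
        switch (state) {
            case FINISHED:
            case FAILED:
            case KILLED:
                return TezAppMasterStatus.SHUTDOWN;
            case RUNNING:
                // Simplification: telling READY apart from RUNNING would need a call to the AM.
                return TezAppMasterStatus.READY;
            default:
                // NEW, NEW_SAVING, SUBMITTED, ACCEPTED
                return TezAppMasterStatus.INITIALIZING;
        }
    }
}
```

With this in place, a caller could compare the result against SHUTDOWN and restart the session before submitting a DAG, instead of discovering the expired application from a failed submission.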
3. I think it might be better if verifySessionStateForSubmission() checked the
application status every time instead of relying on sessionStarted. I am not
sure about the side effects, so I will leave that decision to you.
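For ask #3, the sketch below contrasts a check against a cached `sessionStarted`-style flag with a check that asks for the live status on every submission. `SubmissionGate`, `StatusSource`, and both verify methods are hypothetical illustrations, not the actual verifySessionStateForSubmission() code, and the types are simplified stand-ins.

```java
// Local stand-ins; NOT the real Tez types.
enum TezAppMasterStatus { INITIALIZING, READY, RUNNING, SHUTDOWN }

class SessionNotRunning extends Exception {
    SessionNotRunning(String msg) { super(msg); }
}

// Hypothetical status source, e.g. backed by a getAppMasterStatus()-style call.
interface StatusSource {
    TezAppMasterStatus current();
}

class SubmissionGate {
    private final StatusSource status;
    // Illustrative cached flag, set once when the session started.
    private final boolean sessionStarted = true;

    SubmissionGate(StatusSource status) { this.status = status; }

    // Flag-based check: cannot notice that the RM has since forgotten the application.
    void verifyUsingFlag() throws SessionNotRunning {
        if (!sessionStarted) throw new SessionNotRunning("Session not started");
    }

    // Proposed check: consult the live status on every submission.
    void verifyUsingLiveStatus() throws SessionNotRunning {
        if (status.current() == TezAppMasterStatus.SHUTDOWN) {
            throw new SessionNotRunning("Session application is no longer running");
        }
    }
}
```

The trade-off is cost: the live check likely adds a status lookup (for example an RM round trip) to every submission, which may be one of the side effects worth weighing.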
If #3 takes longer, we can pursue it later. It would really help to get #1 and
#2 into the next Tez release, especially for busy grids.