I'd like someone else to take a look at adding 'synchronized' to getTasks() in addition to me - it's just scary enough that I think an extra reviewer is prudent.
I tried to find some history: http://s.apache.org/T99 (from
MAPREDUCE-1316). The synchronization was intentionally left out, but only to be consistent with the existing code.
The synchronization around JIP is difficult to track. Its omission here appears to admit the possibility that a query of an initializing job could iterate over a partially constructed set of TIPs and cause NPEs (e.g. via the JSP), but I couldn't find a place where this is a serious problem. After the job is initialized, none of the references to member fields become stale. On the other hand, I looked through all the callers of getTasks(), and most are holding the JT lock already (mostly schedulers, during heartbeat processing) so I doubt that synchronizing on the JIP here would cause problems.
One could fix getTasks() to be threadsafe by changing create
Tasks and initSetupCleanupTasks to assign the initialized array of tasks to the member field. The current behavior constructs it in-place, which (AFAICT) means that a caller of getTasks() could read partially-constructed TIPs. It's probably better to make getTasks() synchronized, to avoid having getTasks() return partially constructed, inconsistent JIP state.