The gist of this issue concerns Configuration.getClass() and the thread context classloader (TCCL). Currently MRApps.setJobClassLoader() installs the job classloader as both the configuration classloader and the TCCL at the same time, so once setJobClassLoader() is called, the job classloader is available in both contexts.
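To make the coupling concrete, here is a minimal sketch of what "sets both at the same time" amounts to. ConfStub and CurrentCoupling are hypothetical stand-ins, not Hadoop classes: ConfStub models only the per-configuration classloader that Configuration.getClass() resolves through, and setJobClassLoader() here is a simplified illustration rather than the actual MRApps code.

```java
// Hypothetical stand-in for org.apache.hadoop.conf.Configuration. It models only
// the per-configuration classloader that class lookups resolve through.
class ConfStub {
  private ClassLoader classLoader = ConfStub.class.getClassLoader();

  void setClassLoader(ClassLoader classLoader) { this.classLoader = classLoader; }

  ClassLoader getClassLoader() { return classLoader; }
}

// Hypothetical simplification of the current behavior: a single call installs
// the job classloader in both contexts at the same moment.
final class CurrentCoupling {
  private CurrentCoupling() {}

  static void setJobClassLoader(ConfStub conf, ClassLoader jobClassLoader) {
    if (jobClassLoader != null) {
      conf.setClassLoader(jobClassLoader);                          // configuration classloader
      Thread.currentThread().setContextClassLoader(jobClassLoader); // TCCL
    }
  }
}
```

With this shape, the job classloader cannot be made visible to configuration lookups without also being exposed as the TCCL.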
MAPREDUCE-5751 was caused by the job classloader being made available as the TCCL too early. This issue is caused by the job classloader being made available as the configuration classloader too late.
If my understanding is correct, the normal classloading scheme (one class loading another through ordinary references, or even through Class.forName) is unaffected by this.
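A small, Hadoop-independent sketch of that distinction, assuming a plain JVM: Class.forName(String) resolves through the caller's defining classloader, so swapping the TCCL does not change its result, whereas a lookup handed an explicit loader (which is what a configuration-based lookup does with its configuration classloader) sees only what that loader can see. LookupDemo and its methods are hypothetical names for illustration.

```java
// Hypothetical helper contrasting the two ways a class name gets resolved.
final class LookupDemo {
  private LookupDemo() {}

  // "Normal" loading: Class.forName(String) uses the defining classloader of
  // this class. Neither the TCCL nor any configuration classloader is consulted.
  static boolean visibleToCaller(String name) {
    try {
      Class.forName(name);
      return true;
    } catch (ClassNotFoundException e) {
      return false;
    }
  }

  // Configuration-style loading: resolution goes through an explicitly supplied
  // loader, so the result depends entirely on which loader was installed.
  static boolean visibleThrough(ClassLoader loader, String name) {
    try {
      Class.forName(name, false, loader);
      return true;
    } catch (ClassNotFoundException e) {
      return false;
    }
  }
}
```

In other words, only the explicit-loader path cares about when the job classloader is installed.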
I see two possible approaches for this:
(1) separate the timing of setting the job classloader as the configuration classloader and the TCCL
I think that while setting the TCCL should be delayed as much as possible (i.e. the current timing), the job classloader can be installed as the configuration classloader much earlier. If the configuration loads a user class, that's precisely what we need. If it loads a system class, the job classloader will delegate to its parent anyway. I don't see any harm in setting the configuration classloader early.
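A minimal, Hadoop-independent sketch of approach (1): ConfStub is a hypothetical stand-in for Configuration, and installAsConfClassLoader() / installAsContextClassLoader() are illustrative names, not existing MRApps methods. The point is only that the two installations become separate steps that can be called at different times. The job classloader is modeled as a plain URLClassLoader whose parent is the application loader, so a system class such as java.lang.String still resolves through it by parent delegation. Hadoop's actual job classloader is more selective (it loads user classes itself and delegates system classes to its parent), but the delegation of system classes is the property that matters here.

```java
// Hypothetical stand-in for org.apache.hadoop.conf.Configuration, modeling only
// the configuration classloader and a null-on-failure lookup through it.
class ConfStub {
  private ClassLoader classLoader = ConfStub.class.getClassLoader();

  void setClassLoader(ClassLoader classLoader) { this.classLoader = classLoader; }

  ClassLoader getClassLoader() { return classLoader; }

  Class<?> getClassByNameOrNull(String name) {
    try {
      return Class.forName(name, true, classLoader);
    } catch (ClassNotFoundException e) {
      return null;
    }
  }
}

// Approach (1): the two installations are split so they can happen at different times.
final class SplitJobClassLoaderSetup {
  private SplitJobClassLoaderSetup() {}

  // Early: make the job classloader visible to configuration lookups only.
  static void installAsConfClassLoader(ConfStub conf, ClassLoader jobClassLoader) {
    if (jobClassLoader != null) {
      conf.setClassLoader(jobClassLoader);
    }
  }

  // Late (the current timing): also expose the job classloader as the TCCL.
  static void installAsContextClassLoader(ClassLoader jobClassLoader) {
    if (jobClassLoader != null) {
      Thread.currentThread().setContextClassLoader(jobClassLoader);
    }
  }
}
```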
(2) set and unset the job classloader around the code that loads classes from the configuration
Identify the code points in MRAppMaster where Configuration.getClass() is needed, and set and unset the job classloader around them. Although this would also solve the problem, the downside is that one must determine at each call site whether the job classloader is needed and remember to set/unset it. This is potentially brittle.
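For comparison, a sketch of approach (2) with hypothetical names (ConfStub stands in for Configuration; withJobClassLoader() is illustrative, not a Hadoop API): each call site that needs a user class wraps its lookup so the job classloader is installed for the duration of the call, and the previous loader is restored in a finally block.

```java
// Hypothetical stand-in for org.apache.hadoop.conf.Configuration.
class ConfStub {
  private ClassLoader classLoader = ConfStub.class.getClassLoader();

  void setClassLoader(ClassLoader classLoader) { this.classLoader = classLoader; }

  ClassLoader getClassLoader() { return classLoader; }
}

// Approach (2): install the job classloader only around a specific lookup.
final class ScopedJobClassLoader {
  private ScopedJobClassLoader() {}

  static <T> T withJobClassLoader(ConfStub conf, ClassLoader jobClassLoader,
                                  java.util.function.Supplier<T> lookup) {
    ClassLoader previous = conf.getClassLoader();
    conf.setClassLoader(jobClassLoader);
    try {
      return lookup.get();
    } finally {
      conf.setClassLoader(previous); // restore even if the lookup throws
    }
  }
}
```

The brittleness shows up here: any Configuration.getClass() call that is not wrapped silently uses whatever loader was installed before, and nothing forces a new call site to be wrapped.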
I think (1) is a more robust solution to this problem. Do you see an issue with taking that approach?
I don't think the task (YarnChild) is affected by this.