This is due to a timing issue. The test sets a number of configs, including yarn.am.liveness-monitor.expiry-interval-ms, to 1.5-second intervals. When the expired event happens in RMAppAttemptImpl, it removes the app attempt from the cache; if the ApplicationMasterService then tries to read the attempt from the cache, it can't find it and you get the error.
I'm open to ideas on how to remove the timing element from this test, but for now I've increased the values to make it more reliable. In my testing, the original values could only accommodate a 1-second delay in ApplicationMasterService#allocate; with my changes, the test can accommodate a 4-second delay.
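To make the race concrete, here is a toy model of it in plain Java. It is only a sketch and does not use any YARN classes: AttemptCache, expireStale and allocateSucceeds are hypothetical names, timestamps are passed in by hand instead of read from a real clock, and the 6000 ms value is an arbitrary stand-in, not the value from the patch.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Toy model of the expiry race; none of these names come from YARN. */
public class ExpiryRaceSketch {

    /** Hypothetical stand-in for the attempt cache plus the liveness monitor. */
    static final class AttemptCache {
        private final Map<String, Long> lastHeartbeatMs = new ConcurrentHashMap<>();
        private final long expiryIntervalMs;

        AttemptCache(long expiryIntervalMs) {
            this.expiryIntervalMs = expiryIntervalMs;
        }

        void register(String attemptId, long nowMs) {
            lastHeartbeatMs.put(attemptId, nowMs);
        }

        /** Evicts every attempt whose last heartbeat is older than the expiry interval. */
        void expireStale(long nowMs) {
            lastHeartbeatMs.values().removeIf(last -> nowMs - last > expiryIntervalMs);
        }

        /** Mimics an allocate call: returns false if the attempt was already evicted. */
        boolean allocate(String attemptId, long nowMs) {
            expireStale(nowMs);
            // replace() returns null when the key is no longer present.
            return lastHeartbeatMs.replace(attemptId, nowMs) != null;
        }
    }

    /** Registers an attempt at t=0 and calls allocate after the given delay. */
    public static boolean allocateSucceeds(long expiryIntervalMs, long allocateDelayMs) {
        AttemptCache cache = new AttemptCache(expiryIntervalMs);
        cache.register("attempt_1", 0L);
        return cache.allocate("attempt_1", allocateDelayMs);
    }

    public static void main(String[] args) {
        System.out.println(allocateSucceeds(1500, 1000)); // delay within the interval
        System.out.println(allocateSucceeds(1500, 2000)); // attempt expired before allocate
        System.out.println(allocateSucceeds(6000, 4000)); // larger (hypothetical) interval
    }
}
```

Because the model takes timestamps as arguments, the outcome depends only on the numbers passed in, not on scheduling. A similar injected-clock approach might be one way to take the timing element out of the real test, assuming the expiry path can be driven by a controllable clock.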