      There is a race condition in RemoteSparkJobMonitor. The log output for the STARTED state in RemoteSparkJobMonitor#startMonitor is sometimes printed and sometimes not. This is easy to verify by running a qtest on TestMiniSparkOnYarnCliDriver and comparing the number of times "Query Hive on Spark job" is printed with the number of times "Finished successfully in" is printed.

      The issue is that RemoteSparkJobMonitor polls the state of JobHandle once per second and prints logging info depending on the state it observes. The content of the logs contains an implicit assumption that the logs for the STARTED state are printed before the logs for the SUCCEEDED state. However, this isn't always the case. The state transitions are driven by how long the remote Spark job takes to run, and if it finishes within one second, the monitor can miss the STARTED state entirely, so the logs for that state are never printed.
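
      The polling behavior described above can be illustrated with a small, self-contained sketch. This is not the actual RemoteSparkJobMonitor code: the class PollingMonitorSketch, its State enum, and the monitor method are hypothetical names used only for illustration, and the list passed in stands for the state the monitor happens to observe at each one-second poll. The two log strings are the ones mentioned above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified model of a poll-based job monitor (not Hive code).
public class PollingMonitorSketch {

    // Simplified set of job states; an assumption made for this sketch.
    public enum State { QUEUED, STARTED, SUCCEEDED }

    // Returns the log lines emitted, given the state observed at each poll.
    public static List<String> monitor(List<State> observedStates) {
        List<String> logs = new ArrayList<>();
        boolean running = false;
        for (State state : observedStates) {
            switch (state) {
                case STARTED:
                    // Logged only if a poll actually lands while the job is running.
                    if (!running) {
                        logs.add("Query Hive on Spark job");
                        running = true;
                    }
                    break;
                case SUCCEEDED:
                    logs.add("Finished successfully in");
                    return logs;
                default:
                    break;
            }
        }
        return logs;
    }

    public static void main(String[] args) {
        // Slow job: one poll lands in STARTED, so both messages are logged.
        System.out.println(monitor(Arrays.asList(
                State.QUEUED, State.STARTED, State.SUCCEEDED)));
        // Fast job: the job starts and finishes between two polls,
        // so STARTED is never observed and its message is skipped.
        System.out.println(monitor(Arrays.asList(
                State.QUEUED, State.SUCCEEDED)));
    }
}
```

      In this sketch the second call returns only the "Finished successfully in" line, which matches the mismatch in counts seen in the qtest output.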

      This can be confusing to users, and the key debugging information that is printed in the STARTED state is lost whenever that state is skipped.


