The NM waits not only for the container to complete but for the entire application to complete – see YARN-219. Holding long-lived leases on many files in HDFS puts a lot of load on the namenode.
It also cannot append "on the fly", since all the logs for all containers of an application on the node are stored in a single file in HDFS, with the data for each log contiguous within that file. The current aggregated log format does not allow appending to multiple log streams simultaneously.
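To illustrate why a contiguous layout rules out appending on the fly, here is a minimal sketch in Python. It is a toy model only: the header format ("name length" followed by the raw bytes) and the function names `write_aggregated`, `read_aggregated`, and `append_to_log` are assumptions made up for this example, not the actual aggregated log format or any YARN API.

```python
import io


def write_aggregated(logs):
    """Serialize {name: bytes} into one blob, each log stored contiguously.

    Hypothetical toy layout: a header line "name length\n", then the bytes.
    """
    buf = io.BytesIO()
    for name, data in logs.items():
        buf.write(f"{name} {len(data)}\n".encode())
        buf.write(data)
    return buf.getvalue()


def read_aggregated(blob):
    """Parse a blob produced by write_aggregated back into {name: bytes}."""
    logs = {}
    stream = io.BytesIO(blob)
    while True:
        header = stream.readline()
        if not header:
            break
        name, length = header.decode().rsplit(" ", 1)
        logs[name] = stream.read(int(length))
    return logs


def append_to_log(blob, name, extra):
    """Add bytes to one log. Everything stored after it must be rewritten."""
    logs = read_aggregated(blob)
    logs[name] += extra
    return write_aggregated(logs)


blob = write_aggregated({"stdout": b"a\n", "stderr": b"b\n"})
updated = append_to_log(blob, "stdout", b"c\n")
# The updated blob is not the old blob plus a suffix, so an append-only
# file could not have produced it without a full rewrite.
print(updated.startswith(blob))
```

In this toy layout, growing any log other than the last one changes the bytes that precede the end of the file, which an append-only HDFS file cannot express. Interleaving several live streams would need a different format, for example one built from tagged chunks.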
It would be nice to have some mechanism for the NM to clean up logs, since currently log files are leaked each time the NM restarts. This has been fixed for container local directories and the distributed cache via YARN-71, but logs were left out. It seems we should handle the two cases consistently. If the application is still running, isn't YARN-71 already deleting the app's current working directory and distcache files out from underneath it?