It appears that if an HDFS space quota is set on the target directory for log aggregation and the quota is already exceeded when log aggregation is attempted, zero-byte log files are written to the HDFS directory. However, the NodeManager logs do not reflect the failure to write the files (i.e. there are no ERROR or WARN messages to this effect).
An improvement may be worth investigating to alert users to this scenario. Otherwise the logs for a YARN application may be missing both on HDFS and locally (after local log cleanup is done), and the user has no indication that this has happened.
Steps to reproduce:
- Set a small HDFS space quota on /tmp/logs/username/logs (e.g. 2MB)
- Write files to HDFS such that /tmp/logs/username/logs is almost 2MB full
- Run a Spark or MR job in the cluster
- Observe that zero-byte files are written to HDFS after job completion
- Observe that YARN container logs are also not present on the NM hosts (or are deleted once the yarn.nodemanager.delete.debug-delay-sec delay has elapsed)
- Observe that no ERROR or WARN messages appear to be logged in the NM role log
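The quota and zero-byte observations in the steps above can be checked from the standard HDFS CLI: `hdfs dfsadmin -setSpaceQuota 2m /tmp/logs/username/logs` sets the quota, `hdfs dfs -count -q <path>` reports the remaining space quota, and `hdfs dfs -ls -R <path>` lists file sizes. Below is a minimal sketch, not part of YARN or Hadoop, that parses the text output of the latter two commands. The function and class names are hypothetical, and the column layouts are assumed to match the default CLI output.

```python
from typing import List, NamedTuple, Optional


class SpaceQuota(NamedTuple):
    space_quota: Optional[int]  # bytes; None when no space quota is set
    remaining: Optional[int]    # bytes; None when no space quota is set
    path: str


def parse_count_q(line: str) -> SpaceQuota:
    """Parse one line of `hdfs dfs -count -q <path>` output.

    Assumed columns: QUOTA REMAINING_QUOTA SPACE_QUOTA
    REMAINING_SPACE_QUOTA DIR_COUNT FILE_COUNT CONTENT_SIZE PATHNAME
    """
    fields = line.split(None, 7)
    if len(fields) != 8:
        raise ValueError("unexpected -count -q line: %r" % line)

    def to_int(value: str) -> Optional[int]:
        # The CLI prints "none" / "inf" when no quota is configured.
        return None if value in ("none", "inf") else int(value)

    return SpaceQuota(to_int(fields[2]), to_int(fields[3]), fields[7].strip())


def zero_byte_files(ls_output: str) -> List[str]:
    """Return paths of zero-byte regular files from `hdfs dfs -ls -R` output.

    Assumed columns: PERMISSIONS REPLICATION OWNER GROUP SIZE DATE TIME PATH
    """
    paths = []
    for line in ls_output.splitlines():
        fields = line.split(None, 7)
        # Skip the "Found N items" header, blank lines, and directories.
        if len(fields) != 8 or not fields[0].startswith("-"):
            continue
        if int(fields[4]) == 0:
            paths.append(fields[7].strip())
    return paths
```

Feeding these the output captured before and after the job run shows whether the directory was near its quota and which aggregated log files ended up empty. The application and host names in any sample input are placeholders, not values from this report.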