Details
Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
None
None
Reviewed
Description
When log aggregation fails on the NM, the information for the attempt is kept in the recovery DB. Log aggregation can fail for multiple reasons, which are often related to HDFS space or permissions.
On restart, the recovery DB is read, and if an application attempt still needs its logs aggregated, the files are scheduled for aggregation without any checks. The log files could be older than the retention limit, in which case we should not aggregate them but immediately mark them for deletion from the local file system.
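
The proposed check could look roughly like the sketch below. It is only an illustration, not the actual NM code: the class `RecoveredLogCheck`, the `Action` enum and the `decide` method are hypothetical names, and it assumes the retention limit is given in seconds, with a negative value meaning that no limit is configured.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical helper for illustration; not part of the NodeManager code base. */
final class RecoveredLogCheck {

  /** What to do with the local log files of a recovered application attempt. */
  enum Action { AGGREGATE, DELETE_LOCAL }

  private RecoveredLogCheck() {}

  /**
   * Decides whether recovered logs should still be aggregated.
   *
   * @param newestModifiedMillis modification time of the newest local log file
   * @param nowMillis            current time, passed in so the check is testable
   * @param retentionSeconds     retention limit; negative is assumed to mean "no limit"
   */
  static Action decide(long newestModifiedMillis, long nowMillis, long retentionSeconds) {
    if (retentionSeconds < 0) {
      // No retention limit configured: keep the existing behaviour.
      return Action.AGGREGATE;
    }
    long ageMillis = nowMillis - newestModifiedMillis;
    long limitMillis = TimeUnit.SECONDS.toMillis(retentionSeconds);
    // Logs older than the limit would be removed right after upload anyway,
    // so skip aggregation and only clean up the local files.
    return ageMillis > limitMillis ? Action.DELETE_LOCAL : Action.AGGREGATE;
  }
}
```

On recovery, the NM would call something like `decide(...)` for each application attempt read from the recovery DB before scheduling its files for aggregation.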