The two aspects of making the job-history cleanup period configurable seem to be:
- the unit of time used to configure it (minutes vs. days)
- directory naming that reflects the creation time
For the directory naming, we should be able to keep the existing yyyy/mm/dd format. When the cleanup period is in hours or minutes, we can read each file's modification time and compare that against the cutoff instead. Trunk currently uses the modification time even for longer cleanup periods (days), and has a TODO noting that it should use the directory structure instead to reduce the load on HDFS.
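A rough sketch of how the two approaches could be combined: the yyyy/mm/dd directory date settles most files without touching HDFS, and the modification time is only consulted when the cutoff falls inside that directory's day. Everything here is hypothetical and not the existing cleaner code: the class and method names are made up, directory dates are assumed to be UTC, and the modification time is passed as a lazy `Supplier` to stand in for a file-status call.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.util.function.Supplier;

public class HistoryCleanupSketch {

    // Hypothetical helper, not the actual job-history cleaner.
    // dirDate:       date encoded in the yyyy/mm/dd history directory (assumed UTC)
    // modTime:       lazily fetches the file's modification time (stand-in for an HDFS stat)
    // now:           time at which the cleaner runs
    // cleanupPeriod: how long history files are kept
    // Returns true if the file should be deleted.
    public static boolean shouldDelete(LocalDate dirDate, Supplier<Instant> modTime,
                                       Instant now, Duration cleanupPeriod) {
        Instant cutoff = now.minus(cleanupPeriod);
        Instant dayStart = dirDate.atStartOfDay(ZoneOffset.UTC).toInstant();
        Instant dayEnd = dirDate.plusDays(1).atStartOfDay(ZoneOffset.UTC).toInstant();

        if (!dayEnd.isAfter(cutoff)) {
            // The whole day ended at or before the cutoff: every file in this
            // directory is old enough, so the directory name alone decides.
            return true;
        }
        if (!dayStart.isBefore(cutoff)) {
            // The whole day started at or after the cutoff: nothing here is old enough.
            return false;
        }
        // The cutoff falls inside this day (typical for hour/minute periods),
        // so fall back to the per-file modification time.
        return modTime.get().isBefore(cutoff);
    }
}
```

With a period of several days, only the directory for the single day containing the cutoff needs per-file modification times; every other directory is decided by its path.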
With regard to the unit of time for configuration, minutes seems to be the better option, since a single value in minutes can express both sub-day periods and multi-day ones.
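To illustrate why a minutes-based setting is enough on its own, here is a minimal sketch that turns a configured minutes value into a `java.time.Duration`. The class and method names are hypothetical, and no real configuration key is assumed; the point is that 30 minutes and 7 days (10080 minutes) are the same kind of value, so no separate unit setting is needed.

```java
import java.time.Duration;

public class CleanupPeriodSketch {

    // Hypothetical parser for a cleanup period configured in minutes.
    // Rejects non-positive values, since a zero or negative retention makes no sense.
    public static Duration parseCleanupPeriod(String minutesValue) {
        long minutes = Long.parseLong(minutesValue.trim());
        if (minutes <= 0) {
            throw new IllegalArgumentException("cleanup period must be positive: " + minutes);
        }
        return Duration.ofMinutes(minutes);
    }
}
```

A value that is a whole number of days (a multiple of 1440 minutes) could then be handled entirely through the yyyy/mm/dd directory names, while other values fall back to modification times near the cutoff.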