Details
Type: Improvement
Status: Closed
Priority: Major
Resolution: Fixed
0.20.205.0, 0.21.0
None
Reviewed
-
Description
Currently, the distributed cache waits until a cache directory exceeds a preconfigured size threshold, at which point it deletes every entry that is not currently in use. We would likely see far fewer cache misses if we kept some of those entries around, even when they are not being used. We should add a configurable percentage that sets a goal for how much of the cache should be cleared of unused entries, and select entries to delete based on how recently they were used, and possibly also on how large they are or how expensive they are to download again.
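The proposed policy can be sketched as follows. This is a minimal illustration, not Hadoop's actual implementation. `CacheEntry`, `select_for_deletion`, and the `keep_pct` parameter are hypothetical names. The sketch assumes each entry records its size, its last-use time, and a reference count of tasks currently using it. Cleanup still triggers only when the total size exceeds the limit, but it evicts idle entries in least-recently-used order and stops once the total drops to `keep_pct` of the limit.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CacheEntry:
    # Hypothetical per-entry bookkeeping; not the real TaskTracker data model.
    key: str
    size: int          # bytes occupied on local disk
    last_used: float   # timestamp of the most recent use
    ref_count: int     # number of tasks currently using this entry


def select_for_deletion(entries: List[CacheEntry], limit: int,
                        keep_pct: float) -> List[CacheEntry]:
    """Hypothetical LRU cleanup: return the idle entries to delete.

    Does nothing while the cache is at or under `limit`. Once it is over,
    deletes unused entries, oldest first, only until the total size is at
    or below keep_pct * limit, instead of deleting every unused entry.
    """
    total = sum(e.size for e in entries)
    if total <= limit:
        return []
    target = keep_pct * limit
    # Entries still referenced by running tasks are never candidates.
    idle = sorted((e for e in entries if e.ref_count == 0),
                  key=lambda e: e.last_used)
    victims = []
    for entry in idle:
        if total <= target:
            break
        victims.append(entry)
        total -= entry.size
    return victims


if __name__ == "__main__":
    cache = [
        CacheEntry("A", size=40, last_used=1.0, ref_count=0),
        CacheEntry("B", size=30, last_used=3.0, ref_count=0),
        CacheEntry("C", size=50, last_used=2.0, ref_count=1),  # in use
        CacheEntry("D", size=20, last_used=4.0, ref_count=0),
    ]
    victims = select_for_deletion(cache, limit=100, keep_pct=0.75)
    print([e.key for e in victims])  # prints ['A', 'B']
```

With 140 bytes cached against a 100-byte limit and `keep_pct=0.75`, the sketch deletes only the two least recently used idle entries, A and B, which brings the total to 70 bytes. The current behavior would also delete D. Weighting by size or re-download cost, as the description suggests, could be added by changing the sort key.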
Attachments
Issue Links
- breaks
  - MAPREDUCE-2573 New findbugs warning after MAPREDUCE-2494 (Closed)
  - MAPREDUCE-4576 Large dist cache can block tasktracker heartbeat (Closed)