Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Fixed
Description
When we localize a file into a node's cache, it's installed in a directory whose subroot is named with a random long. These long-named subdirectories all sit in a single flat directory [per disk, per cluster node]. When the cached file is no longer needed, its reference count becomes zero in a tracking data structure. The file then becomes eligible for deletion when the total amount of space occupied by cached files exceeds 10G [by default] or the total number of such files exceeds 10K.
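The eligibility rule described above can be sketched roughly as follows. This is only an illustration, not the TrackerDistributedCacheManager code: the class, constant, and method names are hypothetical, and the sketch assumes that 10G means 10 * 1024^3 bytes and 10K means 10,000 files.

```java
public class CacheLimits {
    // Hypothetical names; the values follow the defaults quoted in the description.
    // Assumption: "10G" is 10 * 1024^3 bytes and "10K" is 10,000 files.
    static final long MAX_TOTAL_BYTES = 10L * 1024 * 1024 * 1024;
    static final long MAX_FILE_COUNT = 10_000L;

    /** True when the cache as a whole exceeds either the size or the count limit. */
    static boolean overLimit(long totalBytes, long fileCount) {
        return totalBytes > MAX_TOTAL_BYTES || fileCount > MAX_FILE_COUNT;
    }

    /** A cached file may be deleted only if it is unreferenced and the cache is over a limit. */
    static boolean eligibleForDeletion(int refCount, long totalBytes, long fileCount) {
        return refCount == 0 && overLimit(totalBytes, fileCount);
    }
}
```

Note that a file whose reference count is zero stays in the cache while both totals are under their limits, so its subdirectory stays too.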
However, when we delete a cached file, we don't delete the directory that contains it. In particular, the long-named subdirectories of the flat directory are never removed, so they accumulate until their number reaches a system limit (32K in some cases), at which point the node stops working.
We need to delete the entry in the flat directory (the long-named subdirectory) when we delete the localized cache file it contains.
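A minimal sketch of the intended behavior, using java.nio.file and not the actual Hadoop code path. The class and method names are hypothetical, and the sketch assumes the localized item is a single file sitting directly inside its long-named subdirectory.

```java
import java.io.IOException;
import java.nio.file.DirectoryNotEmptyException;
import java.nio.file.Files;
import java.nio.file.Path;

public class CacheCleanup {
    /**
     * Deletes a localized cache file and then the subdirectory that held it,
     * so the flat directory does not accumulate empty entries.
     * Hypothetical helper; assumes the layout flatDir/randomLong/file.
     */
    static void deleteCachedFile(Path cachedFile) throws IOException {
        Path subdir = cachedFile.getParent();
        Files.deleteIfExists(cachedFile);
        if (subdir != null) {
            try {
                // Removes the entry from the flat directory once it is empty.
                Files.deleteIfExists(subdir);
            } catch (DirectoryNotEmptyException e) {
                // Something else still lives in the subdirectory; leave it in place.
            }
        }
    }
}
```

Only the immediate parent is removed, so the flat directory itself survives and can keep receiving newly localized files.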
Attachments
Issue Links
- is related to
  - MAPREDUCE-1538 TrackerDistributedCacheManager can fail because the number of subdirectories reaches system limit (Closed)
  - MAPREDUCE-1909 TrackerDistributedCacheManager takes a blocking lock for a loop that executes 10K times (Open)