Details
- Type: Improvement
- Status: Open
- Priority: Minor
- Resolution: Unresolved
Description
In TrackerDistributedCacheManager.java, in the portion where the cache is cleaned up, the lock is taken on the main hash table and then all the entries are scanned to see if they can be deleted. That is a long time to hold the lock; the table is likely to have on the order of 10K entries.
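For context, the cleanup described above looks roughly like the sketch below. This is not the actual TrackerDistributedCacheManager code: CacheStatus is reduced to a bare refcount, the class and method names are illustrative, and the deletion work is only hinted at in a comment.
{code:java}
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Illustrative stand-in only; the real CacheStatus carries much more state.
class CacheStatus {
  int refcount;
}

class CurrentCleanupSketch {
  private final Map<String, CacheStatus> cachedArchives =
      new HashMap<String, CacheStatus>();

  // The whole table is scanned while the lock on cachedArchives is held, so
  // with ~10K entries the lock is held for the full scan plus whatever
  // deletion work happens inside the loop.
  void deleteUnusedCacheEntries() {
    synchronized (cachedArchives) {
      Iterator<Map.Entry<String, CacheStatus>> it =
          cachedArchives.entrySet().iterator();
      while (it.hasNext()) {
        CacheStatus status = it.next().getValue();
        if (status.refcount == 0) {
          // delete the on-disk directory for this entry, then drop it
          it.remove();
        }
      }
    }
  }
}
{code}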
I would like to reduce the longest lock-hold time by maintaining the set of CacheStatus entries to delete incrementally (a sketch follows the numbered steps below):
1: Let there be a new HashSet, deleteSet, that's protected under synchronized(cachedArchives)
2: When refcount is decreased to 0, move the CacheStatus from cachedArchives to deleteSet
3: When seeking an existing CacheStatus, look in deleteSet if it isn't in cachedArchives
4: When the refcount is increased from 0 to 1 in a pre-existing CacheStatus [see 3, above], move the CacheStatus from deleteSet back to cachedArchives
5: When we clean the cache, under synchronized(cachedArchives), move deleteSet to a local variable and create a new empty HashSet. This is constant time.
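Here is a minimal sketch of steps 1 through 5, with illustrative names throughout (releaseCache, getCache, and takeEntriesToDelete are not the real method names, and CacheStatus is again reduced to a refcount). One small liberty: the sketch keeps deleteSet as a HashMap keyed the same way as cachedArchives rather than a HashSet, because the lookup in step 3 then falls out directly; a HashSet of CacheStatus would need the key stored inside CacheStatus to do the same.
{code:java}
import java.util.HashMap;
import java.util.Map;

class DeleteSetSketch {
  // Illustrative stand-in only.
  static class CacheStatus {
    int refcount;
  }

  private final Map<String, CacheStatus> cachedArchives =
      new HashMap<String, CacheStatus>();
  // 1: deleteSet lives alongside cachedArchives and is guarded by the same
  //    lock (synchronized on cachedArchives).
  private Map<String, CacheStatus> deleteSet =
      new HashMap<String, CacheStatus>();

  // 2: when the refcount drops to 0, move the entry into deleteSet.
  void releaseCache(String key) {
    synchronized (cachedArchives) {
      CacheStatus status = cachedArchives.get(key);
      if (status != null && --status.refcount == 0) {
        cachedArchives.remove(key);
        deleteSet.put(key, status);
      }
    }
  }

  // 3 and 4: on lookup, fall back to deleteSet; if the entry is found there,
  // it is being referenced again, so move it back to cachedArchives.
  CacheStatus getCache(String key) {
    synchronized (cachedArchives) {
      CacheStatus status = cachedArchives.get(key);
      if (status == null) {
        status = deleteSet.remove(key);       // 3: look in deleteSet
        if (status != null) {
          cachedArchives.put(key, status);    // 4: resurrect the entry
        }
      }
      if (status != null) {
        status.refcount++;
      }
      return status;
    }
  }

  // 5: cleanup swaps deleteSet for a fresh map in constant time; the actual
  //    on-disk deletion of the returned entries happens outside the lock.
  Map<String, CacheStatus> takeEntriesToDelete() {
    synchronized (cachedArchives) {
      Map<String, CacheStatus> toDelete = deleteSet;
      deleteSet = new HashMap<String, CacheStatus>();
      return toDelete;
    }
  }
}
{code}
Under this scheme the lock is only ever held for a single map lookup/insert or the constant-time swap, never for a scan of all ~10K entries.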
Issue Links
- relates to: MAPREDUCE-1914 TrackerDistributedCacheManager never cleans its input directories (Resolved)