When we localize a file into a node's cache, it's installed in a directory whose subroot is a random long. These longs all sit in a single flat directory [per disk, per cluster node]. When the cached file is no longer needed, its reference count drops to zero in a tracking data structure. The file then becomes eligible for deletion once the total space occupied by cached files exceeds 10G [by default] or the total number of such files exceeds 10K.
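
The eligibility rule described above can be sketched roughly as follows. This is only an illustration, not the actual tracking code: the class, field, and method names are hypothetical, 10G is assumed to mean 10 * 2^30 bytes, and the totals are assumed to count every cached file, whether or not it is still referenced.

```java
import java.util.ArrayList;
import java.util.List;

class CacheEvictionSketch {
    // Limits quoted in the description; the byte interpretation of "10G" is an assumption.
    static final long MAX_CACHE_BYTES = 10L * 1024 * 1024 * 1024;
    static final long MAX_CACHE_FILES = 10_000L;

    /** Hypothetical bookkeeping entry for one localized file. */
    static final class CacheEntry {
        final String path;
        final long sizeBytes;
        int refCount;

        CacheEntry(String path, long sizeBytes, int refCount) {
            this.path = path;
            this.sizeBytes = sizeBytes;
            this.refCount = refCount;
        }
    }

    /** True when either the total size or the total file count exceeds its limit. */
    static boolean overLimit(List<CacheEntry> entries) {
        long totalBytes = 0;
        for (CacheEntry e : entries) {
            totalBytes += e.sizeBytes;
        }
        return totalBytes > MAX_CACHE_BYTES || entries.size() > MAX_CACHE_FILES;
    }

    /** Entries that may be deleted: unreferenced, and only while the cache is over a limit. */
    static List<CacheEntry> deletable(List<CacheEntry> entries) {
        List<CacheEntry> result = new ArrayList<>();
        if (!overLimit(entries)) {
            return result;
        }
        for (CacheEntry e : entries) {
            if (e.refCount == 0) {
                result.add(e);
            }
        }
        return result;
    }
}
```

Note that nothing in this rule looks at the number of entries in the flat directory itself, only at the files inside them.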
However, when we delete a cached file, we don't delete the directory that contains it. Importantly, this includes the elements of the flat directory, which accumulate until they hit a system limit, 32K in some cases, at which point the node stops working.
We need to delete the corresponding element of the flat directory when we delete the localized cache file it contains.
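
A minimal sketch of the proposed cleanup, assuming the layout is cacheRoot/<random long>/<cached file> and that the cached item is a single regular file. The class and method names are hypothetical and this is not the actual patch; it only uses standard java.nio.file calls.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

class CacheCleanupSketch {
    /**
     * Deletes a cached file and then the random-long directory that held it.
     * Returns true if the enclosing directory was removed as well.
     */
    static boolean deleteCachedFile(Path cacheRoot, Path cachedFile) throws IOException {
        Path subdir = cachedFile.getParent();
        // Safety check: only touch directories that sit directly under the flat cache root.
        if (subdir == null || !cacheRoot.equals(subdir.getParent())) {
            throw new IllegalArgumentException("Not a direct child of the cache root: " + subdir);
        }
        Files.deleteIfExists(cachedFile);
        // Leave the directory in place if anything else still lives in it.
        try (Stream<Path> remaining = Files.list(subdir)) {
            if (remaining.findAny().isPresent()) {
                return false;
            }
        }
        Files.delete(subdir);
        return true;
    }
}
```

With this shape, each deletion removes one entry from the flat directory, so its size tracks the number of live cached files rather than growing without bound.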