  1. Hadoop Map/Reduce
  2. MAPREDUCE-1914

TrackerDistributedCacheManager never cleans its input directories

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      When we localize a file into a node's cache, it is installed in a directory whose subroot is named with a random long. These long-named directories all sit in a single flat directory [per disk, per cluster node]. When the cached file is no longer needed, its reference count drops to zero in a tracking data structure. The file then becomes eligible for deletion once the total space occupied by cached files exceeds 10G [by default] or the total number of such files exceeds 10K.
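
The eligibility rule above can be sketched as follows. This is a minimal illustration, not the TrackerDistributedCacheManager code: the class name, method names, and constants are hypothetical, and it assumes "10G" means 10 GiB and "10K" means 10,000 files.

```java
/** Hypothetical sketch of the eviction rule in the description; not Hadoop's actual code. */
final class CacheEvictionSketch {
    // Assumption: "10G" in the description is read as 10 GiB.
    static final long MAX_TOTAL_BYTES = 10L * 1024 * 1024 * 1024;
    // Assumption: "10K" in the description is read as 10,000 files.
    static final long MAX_FILE_COUNT = 10_000L;

    private CacheEvictionSketch() {}

    /** True when the cache as a whole exceeds either limit. */
    static boolean overLimit(long totalBytes, long fileCount) {
        return totalBytes > MAX_TOTAL_BYTES || fileCount > MAX_FILE_COUNT;
    }

    /** A cached file may be deleted only if it is unreferenced and the cache is over a limit. */
    static boolean isDeletable(int refCount, long totalBytes, long fileCount) {
        return refCount == 0 && overLimit(totalBytes, fileCount);
    }
}
```

Note that under this rule an unreferenced file can stay on disk indefinitely as long as the cache remains under both limits.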

      However, when we delete a cached file, we don't delete the directory that contains it. Importantly, this includes the elements of the flat directory, which accumulate until they reach a system limit (32K in some cases), at which point the node stops working.

      We need to delete the corresponding element of the flat directory (the long-named directory) when we delete the localized cache file it contains.
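
One way to picture the requested cleanup is the sketch below. It is not the committed patch: the class and method names are hypothetical, it uses plain `java.nio.file` rather than Hadoop's own file system classes, and it assumes the layout `<flat directory>/<random long>/.../<cached file>`.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical sketch: remove a cached file together with its long-named subroot. */
final class CacheDirCleanupSketch {
    private CacheDirCleanupSketch() {}

    /**
     * Deletes the subroot directory (the first path element below flatDir)
     * that holds cachedFile, including everything inside it.
     * Returns the subroot path that was removed.
     */
    static Path deleteLocalizedEntry(Path flatDir, Path cachedFile) throws IOException {
        Path flat = flatDir.toAbsolutePath().normalize();
        Path file = cachedFile.toAbsolutePath().normalize();
        if (!file.startsWith(flat) || file.equals(flat)) {
            throw new IllegalArgumentException(file + " is not below " + flat);
        }
        // The first name element under the flat directory is the random-long subroot.
        Path subroot = flat.resolve(flat.relativize(file).getName(0));

        // Collect the subtree, then delete deepest paths first so directories are empty.
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(subroot)) {
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
        return subroot;
    }
}
```

Deleting the whole subroot, rather than only the file, is what keeps the number of entries in the flat directory from growing toward the file system's limit. Sibling entries in the flat directory are left untouched.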

              People

              • Assignee: Dick King (dking)
              • Reporter: Dick King (dking)
              • Votes: 0
              • Watchers: 2
