Hadoop Map/Reduce
MAPREDUCE-1914

TrackerDistributedCacheManager never cleans its input directories


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed

    Description

      When we localize a file into a node's cache, it is installed in a directory whose subroot is a random long. These longs all sit in a single flat directory [per disk, per cluster node]. When the cached file is no longer needed, its reference count drops to zero in a tracking data structure. The file then becomes eligible for deletion once the total space occupied by cached files exceeds 10 GB [by default] or the total number of such files exceeds 10K.
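
      A rough sketch of the layout and eligibility check described above, for illustration only; the class, field, and constant names here are hypothetical and do not come from the actual TrackerDistributedCacheManager source.

      import java.io.File;
      import java.util.Map;
      import java.util.Random;
      import java.util.concurrent.ConcurrentHashMap;

      class CacheCleanupSketch {
        // Defaults cited above: 10 GB of cached data or 10K cached files.
        static final long MAX_CACHE_BYTES = 10L * 1024 * 1024 * 1024;
        static final long MAX_CACHE_FILES = 10_000;

        static class CacheEntry {
          File localizedDir;  // <diskRoot>/<randomLong>/ holding the localized file
          int refCount;       // number of tasks currently using the file
          long sizeBytes;
        }

        final Map<String, CacheEntry> entries = new ConcurrentHashMap<>();
        long totalBytes;
        long totalFiles;

        // Localization installs the file under a fresh random-long subdirectory
        // that sits in a single flat directory [per disk, per cluster node].
        File localizedRoot(File diskRoot) {
          return new File(diskRoot, Long.toString(new Random().nextLong()));
        }

        // A cached file becomes a deletion candidate only when no task references
        // it and one of the global limits has been exceeded.
        boolean eligibleForDeletion(CacheEntry e) {
          return e.refCount == 0
              && (totalBytes > MAX_CACHE_BYTES || totalFiles > MAX_CACHE_FILES);
        }
      }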

      However, when we delete a cached file, we do not delete the directory that contains it. Importantly, this means the entries of the flat directory accumulate until they reach a system limit on directory entries, 32K in some cases, at which point the node stops working.

      We need to delete the per-file directory [the element of the flat directory] when we delete the localized cache file it contains.
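
      A minimal sketch of that fix, assuming a hypothetical deleteLocalizedEntry helper: rather than deleting only the localized file, delete its parent directory (the random-long element of the flat directory) recursively, in the spirit of org.apache.hadoop.fs.FileUtil.fullyDelete.

      import java.io.File;
      import java.io.IOException;

      class DeleteWithParentSketch {
        static void deleteLocalizedEntry(File localizedFile) throws IOException {
          // localizedFile lives at <diskRoot>/<randomLong>/<file>; deleting only
          // the file leaves the <randomLong> directory behind, which is the leak
          // described above.
          File perFileDir = localizedFile.getParentFile();
          if (!fullyDelete(perFileDir)) {
            throw new IOException("Could not delete " + perFileDir);
          }
        }

        // Recursive delete of a file or directory tree, analogous in intent to
        // org.apache.hadoop.fs.FileUtil.fullyDelete.
        static boolean fullyDelete(File f) {
          File[] children = f.listFiles();  // null if f is a plain file
          if (children != null) {
            for (File c : children) {
              if (!fullyDelete(c)) {
                return false;
              }
            }
          }
          return f.delete();
        }
      }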

People

    Assignee: dking Dick King
    Reporter: dking Dick King
    Votes: 0
    Watchers: 2

Dates

    Created:
    Updated:
    Resolved: