Hadoop Map/Reduce
  1. Hadoop Map/Reduce
  2. MAPREDUCE-2479

Backport MAPREDUCE-1568 to hadoop security branch

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.20.204.0
    • Fix Version/s: 0.20.204.0
    • Component/s: tasktracker
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      Added mapreduce.tasktracker.distributedcache.checkperiod to the task tracker that defined the period to wait while cleaning up the distributed cache. The default is 1 min.
    1. MAPREDUCE-2479-v2.patch
      18 kB
      Robert Joseph Evans
    2. MAPREDUCE-2479-v1.patch
      18 kB
      Robert Joseph Evans

      Activity

      Hide
      Robert Joseph Evans added a comment -

      This is a simple port of MAPREDUCE-1568 with the differences that TTConfig.java does not exist so TT_DISTRIBUTED_CACHE_CHECK_PERIOD was inlined and the last part of TrackerDistributedCacheManager.BaseDirManager.checkAndClenup() where the files are actually deleted was replaced with the equivalent code from compactCache

      Show
      Robert Joseph Evans added a comment - This is a simple port of MAPREDUCE-1568 with the differences that TTConfig.java does not exist so TT_DISTRIBUTED_CACHE_CHECK_PERIOD was inlined and the last part of TrackerDistributedCacheManager.BaseDirManager.checkAndClenup() where the files are actually deleted was replaced with the equivalent code from compactCache
      Hide
      Hadoop QA added a comment -

      -1 overall. Here are the results of testing the latest attachment
      http://issues.apache.org/jira/secure/attachment/12478820/MAPREDUCE-2479-v1.patch
      against trunk revision 1101741.

      +1 @author. The patch does not contain any @author tags.

      +1 tests included. The patch appears to include 3 new or modified tests.

      -1 patch. The patch command could not apply the patch.

      Console output: https://builds.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/231//console

      This message is automatically generated.

      Show
      Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12478820/MAPREDUCE-2479-v1.patch against trunk revision 1101741. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/231//console This message is automatically generated.
      Hide
      Robert Joseph Evans added a comment -

      The above is for trunk, and does not count, because this is targeted for 0.20.205.0
      In my environment I got

      [exec] -1 overall.
      [exec]
      [exec] +1 @author. The patch does not contain any @author tags.
      [exec]
      [exec] +1 tests included. The patch appears to include 3 new or modified tests.
      [exec]
      [exec] -1 javadoc. The javadoc tool appears to have generated 1 warning messages.
      [exec]
      [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
      [exec]
      [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
      [exec]
      [exec] -1 Eclipse classpath. The patch causes the Eclipse classpath to differ from the contents of the lib directories.

      The -1 for Eclipse is a known issues and fixed with HADOOP-7248 The -1 on the javadocs is also wrong. There were 6 warnings before the patch and 6 after it.

      Show
      Robert Joseph Evans added a comment - The above is for trunk, and does not count, because this is targeted for 0.20.205.0 In my environment I got [exec] -1 overall. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] +1 tests included. The patch appears to include 3 new or modified tests. [exec] [exec] -1 javadoc. The javadoc tool appears to have generated 1 warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings. [exec] [exec] -1 Eclipse classpath. The patch causes the Eclipse classpath to differ from the contents of the lib directories. The -1 for Eclipse is a known issues and fixed with HADOOP-7248 The -1 on the javadocs is also wrong. There were 6 warnings before the patch and 6 after it.
      Hide
      Robert Joseph Evans added a comment -

      I back ported the patch as is, but there are a few issues with the patch that I think should be addressed, but most likely in a separate JIRA so that trunk and security stay in sync.

      The background thread will clean a directory once it goes over the size limit, and then it will clean out all entires in the directory that are not currently being used. It would seem more logical to have clean only a subset of the data in an LRU manor with a goal of having the cache only X% full where X is configurable. Also the background thread is not monitored in any way. Along with this each individual job no longer validates that the cache has not grown too large so if the thread dies for any reason we may fill up all disks on the node with distributed cache. Not that it is likely to happen, but good defensive programming would dictate that we at least monitor the thread and restart it if it has failed.

      Show
      Robert Joseph Evans added a comment - I back ported the patch as is, but there are a few issues with the patch that I think should be addressed, but most likely in a separate JIRA so that trunk and security stay in sync. The background thread will clean a directory once it goes over the size limit, and then it will clean out all entires in the directory that are not currently being used. It would seem more logical to have clean only a subset of the data in an LRU manor with a goal of having the cache only X% full where X is configurable. Also the background thread is not monitored in any way. Along with this each individual job no longer validates that the cache has not grown too large so if the thread dies for any reason we may fill up all disks on the node with distributed cache. Not that it is likely to happen, but good defensive programming would dictate that we at least monitor the thread and restart it if it has failed.
      Hide
      Robert Joseph Evans added a comment -

      FILES MAPREDUCE-2495 and MAPREDUCE-2494 to address these issues.

      Show
      Robert Joseph Evans added a comment - FILES MAPREDUCE-2495 and MAPREDUCE-2494 to address these issues.
      Hide
      Chris Douglas added a comment -

      This looks like a clean backport of MAPREDUCE-1568.

      The only point worth addressing in this patch is the synchronization on refcount is inconsistent. Sometimes only the lock on cachedArchives is held, at other points only the lock on the CacheStatus instance is held. I found no cases of inconsistent lock ordering (no deadlocks), but it appears possible to get stale values for refcount. Since it's only incremented while holding the lock on cachedArchives, one can only get false negatives for adding entries to the deletion queue (i.e. one may hold onto a cached entry longer than necessary, but one will never delete an entry in use by an initializing/running task). It's not a bad bug, but it's probably worth fixing.

      +1 otherwise, particularly with the pending work in MAPREDUCE-2495 and MAPREDUCE-2494.

      Show
      Chris Douglas added a comment - This looks like a clean backport of MAPREDUCE-1568 . The only point worth addressing in this patch is the synchronization on refcount is inconsistent. Sometimes only the lock on cachedArchives is held, at other points only the lock on the CacheStatus instance is held. I found no cases of inconsistent lock ordering (no deadlocks), but it appears possible to get stale values for refcount . Since it's only incremented while holding the lock on cachedArchives , one can only get false negatives for adding entries to the deletion queue (i.e. one may hold onto a cached entry longer than necessary, but one will never delete an entry in use by an initializing/running task). It's not a bad bug, but it's probably worth fixing. +1 otherwise, particularly with the pending work in MAPREDUCE-2495 and MAPREDUCE-2494 .
      Hide
      Robert Joseph Evans added a comment -

      Working on the issues stated previously

      Show
      Robert Joseph Evans added a comment - Working on the issues stated previously
      Hide
      Robert Joseph Evans added a comment -

      Update to address potential locking issue

      Show
      Robert Joseph Evans added a comment - Update to address potential locking issue
      Hide
      Chris Douglas added a comment -

      I committed this to the 20-security branch. Thanks!

      Show
      Chris Douglas added a comment - I committed this to the 20-security branch. Thanks!
      Hide
      Owen O'Malley added a comment -

      Hadoop 0.20.204.0 was just released.

      Show
      Owen O'Malley added a comment - Hadoop 0.20.204.0 was just released.

        People

        • Assignee:
          Robert Joseph Evans
          Reporter:
          Robert Joseph Evans
        • Votes:
          0 Vote for this issue
          Watchers:
          3 Start watching this issue

          Dates

          • Created:
            Updated:
            Resolved:

            Development