Uploaded image for project: 'Hadoop YARN'
  1. Hadoop YARN
  2. YARN-5767

Fix the order that resources are cleaned up from the local Public/Private caches

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.6.0, 2.7.0, 3.0.0-alpha1
    • Fix Version/s: 2.8.0, 3.0.0-alpha2
    • Component/s: None
    • Labels:
    • Hadoop Flags:
      Reviewed
    • Release Note:
      Hide
      This issue fixes a bug in how resources are evicted from the PUBLIC and PRIVATE yarn local caches used by the node manager for resource localization. In summary, the caches are now properly cleaned based on an LRU policy across both the public and private caches.
      Show
      This issue fixes a bug in how resources are evicted from the PUBLIC and PRIVATE yarn local caches used by the node manager for resource localization. In summary, the caches are now properly cleaned based on an LRU policy across both the public and private caches.

      Description

      If you look at ResourceLocalizationService#handleCacheCleanup, you can see that public resources are added to the ResourceRetentionSet first followed by private resources:

      private void handleCacheCleanup(LocalizationEvent event) {
        ResourceRetentionSet retain =
          new ResourceRetentionSet(delService, cacheTargetSize);
        retain.addResources(publicRsrc);
        if (LOG.isDebugEnabled()) {
          LOG.debug("Resource cleanup (public) " + retain);
        }
        for (LocalResourcesTracker t : privateRsrc.values()) {
          retain.addResources(t);
          if (LOG.isDebugEnabled()) {
            LOG.debug("Resource cleanup " + t.getUser() + ":" + retain);
          }
        }
        //TODO Check if appRsrcs should also be added to the retention set.
      }
      

      Unfortunately, if we look at ResourceRetentionSet#addResources we see that this means public resources are deleted first until the target cache size is met:

      public void addResources(LocalResourcesTracker newTracker) {
        for (LocalizedResource resource : newTracker) {
          currentSize += resource.getSize();
          if (resource.getRefCount() > 0) {
            // always retain resources in use
            continue;
          }
          retain.put(resource, newTracker);
        }
        for (Iterator<Map.Entry<LocalizedResource,LocalResourcesTracker>> i =
               retain.entrySet().iterator();
             currentSize - delSize > targetSize && i.hasNext();) {
          Map.Entry<LocalizedResource,LocalResourcesTracker> rsrc = i.next();
          LocalizedResource resource = rsrc.getKey();
          LocalResourcesTracker tracker = rsrc.getValue();
          if (tracker.remove(resource, delService)) {
            delSize += resource.getSize();
            i.remove();
          }
        }
      }
      

      The result of this is that resources in the private cache are only deleted in the cases where:

      1. The cache size is larger than the target cache size and the public cache is empty.
      2. The cache size is larger than the target cache size and everything in the public cache is being used by a running container.

      For clusters that primarily use the public cache (i.e. make use of the shared cache), this means that the most commonly used resources can be deleted before old resources in the private cache. Furthermore, the private cache can continue to grow over time causing more and more churn in the public cache.

      Additionally, the same problem exists within the private cache. Since resources are added to the retention set on a user by user basis, resources will get cleaned up one user at a time in the order that privateRsrc.values() returns the LocalResourcesTracker. So if user1 has 10MB in their cache and user2 has 100MB in their cache and the target size of the cache is 50MB, user1 could potentially have their entire cache removed before anything is deleted from the user2 cache.

      1. YARN-5767-trunk-v1.patch
        30 kB
        Chris Trezzo
      2. YARN-5767-trunk-v2.patch
        31 kB
        Chris Trezzo
      3. YARN-5767-trunk-v3.patch
        31 kB
        Chris Trezzo
      4. YARN-5767-trunk-v4.patch
        32 kB
        Chris Trezzo

        Issue Links

          Activity

          Hide
          ctrezzo Chris Trezzo added a comment -

          I see two initial high-level approaches to this problem:

          1. Add all resources from the public and private cache to the retention set at the same time. Then resources across all caches will be removed using an LRU eviction strategy.
          2. Use separate retention sets for the public and private caches. This way, an admin can set separate retention sizes for the public and private cache. Furthermore, the caches will be isolated from each other and a large file being localized in a user cache will not cause churn in the public cache or visa versa.

          Currently I am leaning towards #2. Please let me know your thoughts! Thanks.

          Show
          ctrezzo Chris Trezzo added a comment - I see two initial high-level approaches to this problem: Add all resources from the public and private cache to the retention set at the same time. Then resources across all caches will be removed using an LRU eviction strategy. Use separate retention sets for the public and private caches. This way, an admin can set separate retention sizes for the public and private cache. Furthermore, the caches will be isolated from each other and a large file being localized in a user cache will not cause churn in the public cache or visa versa. Currently I am leaning towards #2. Please let me know your thoughts! Thanks.
          Hide
          ctrezzo Chris Trezzo added a comment -

          Attached is a v1 patch for trunk.

          In this initial patch I actually went with approach #1.

          Here is a summary of the modifications this patch makes:

          1. Renamed the ResourceRetentionSet class to LocalCacheCleaner. In the patch it looks like a delete/add.
          2. Modified LocalCacheCleaner#addResources so that it only adds resources to the map and does not clean.
          3. Added a new method LocalCacheCleaner#cleanCache that is actually responsible for cleaning the cache. The general intention is that you would add a bunch of resources to the cleaner, and then call clean. All resources that the cleaner is aware of at that point will get cleaned up on an LRU basis.
          4. Added a new stats class to LocalCacheCleaner that keeps track of the same stats ResourceRetentionSet did, plus an optional more detailed breakdown of what was cleaned from private caches.
          5. Added a new test class TestLocalCacheCleanup. This tests a basic cleanup, a cleanup where there are resources with positive ref counts, tests that the cleaner is indeed using an LRU policy across both private and public caches, and finally tests that the cleanup stats are correct.
          6. Deleted the TestRetentionSet class because it is now redundant with TestLocalCacheCleanup.

          Please let me know your thoughts! If there is too much going on in the patch, I can always break it down into smaller ones. Thanks.

          /cc Jason Lowe Sangjin Lee

          Show
          ctrezzo Chris Trezzo added a comment - Attached is a v1 patch for trunk. In this initial patch I actually went with approach #1. Here is a summary of the modifications this patch makes: Renamed the ResourceRetentionSet class to LocalCacheCleaner . In the patch it looks like a delete/add. Modified LocalCacheCleaner#addResources so that it only adds resources to the map and does not clean. Added a new method LocalCacheCleaner#cleanCache that is actually responsible for cleaning the cache. The general intention is that you would add a bunch of resources to the cleaner, and then call clean. All resources that the cleaner is aware of at that point will get cleaned up on an LRU basis. Added a new stats class to LocalCacheCleaner that keeps track of the same stats ResourceRetentionSet did, plus an optional more detailed breakdown of what was cleaned from private caches. Added a new test class TestLocalCacheCleanup . This tests a basic cleanup, a cleanup where there are resources with positive ref counts, tests that the cleaner is indeed using an LRU policy across both private and public caches, and finally tests that the cleanup stats are correct. Deleted the TestRetentionSet class because it is now redundant with TestLocalCacheCleanup . Please let me know your thoughts! If there is too much going on in the patch, I can always break it down into smaller ones. Thanks. /cc Jason Lowe Sangjin Lee
          Hide
          ctrezzo Chris Trezzo added a comment -

          Submitting patch for a qa run.

          Show
          ctrezzo Chris Trezzo added a comment - Submitting patch for a qa run.
          Hide
          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 19s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 2 new or modified test files.
          +1 mvninstall 6m 41s trunk passed
          +1 compile 0m 27s trunk passed
          +1 checkstyle 0m 18s trunk passed
          +1 mvnsite 0m 26s trunk passed
          +1 mvneclipse 0m 13s trunk passed
          +1 findbugs 0m 40s trunk passed
          +1 javadoc 0m 17s trunk passed
          +1 mvninstall 0m 22s the patch passed
          +1 compile 0m 24s the patch passed
          +1 javac 0m 24s the patch passed
          -1 checkstyle 0m 15s hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: The patch generated 6 new + 160 unchanged - 27 fixed = 166 total (was 187)
          +1 mvnsite 0m 24s the patch passed
          +1 mvneclipse 0m 10s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          -1 findbugs 0m 47s hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0)
          -1 javadoc 0m 14s hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager generated 2 new + 237 unchanged - 0 fixed = 239 total (was 237)
          +1 unit 15m 3s hadoop-yarn-server-nodemanager in the patch passed.
          +1 asflicense 0m 15s The patch does not generate ASF License warnings.
          27m 55s



          Reason Tests
          FindBugs module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
            org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalCacheCleaner$LRUComparator implements Comparator but not Serializable At LocalCacheCleaner.java:Serializable At LocalCacheCleaner.java:[lines 159-168]



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:9560f25
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12834856/YARN-5767-trunk-v1.patch
          JIRA Issue YARN-5767
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 5903daa0a638 3.13.0-95-generic #142-Ubuntu SMP Fri Aug 12 17:00:09 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / d0a3479
          Default Java 1.8.0_101
          findbugs v3.0.0
          checkstyle https://builds.apache.org/job/PreCommit-YARN-Build/13483/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt
          findbugs https://builds.apache.org/job/PreCommit-YARN-Build/13483/artifact/patchprocess/new-findbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.html
          javadoc https://builds.apache.org/job/PreCommit-YARN-Build/13483/artifact/patchprocess/diff-javadoc-javadoc-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt
          Test Results https://builds.apache.org/job/PreCommit-YARN-Build/13483/testReport/
          modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
          Console output https://builds.apache.org/job/PreCommit-YARN-Build/13483/console
          Powered by Apache Yetus 0.3.0 http://yetus.apache.org

          This message was automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall Vote Subsystem Runtime Comment 0 reexec 0m 19s Docker mode activated. +1 @author 0m 0s The patch does not contain any @author tags. +1 test4tests 0m 0s The patch appears to include 2 new or modified test files. +1 mvninstall 6m 41s trunk passed +1 compile 0m 27s trunk passed +1 checkstyle 0m 18s trunk passed +1 mvnsite 0m 26s trunk passed +1 mvneclipse 0m 13s trunk passed +1 findbugs 0m 40s trunk passed +1 javadoc 0m 17s trunk passed +1 mvninstall 0m 22s the patch passed +1 compile 0m 24s the patch passed +1 javac 0m 24s the patch passed -1 checkstyle 0m 15s hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: The patch generated 6 new + 160 unchanged - 27 fixed = 166 total (was 187) +1 mvnsite 0m 24s the patch passed +1 mvneclipse 0m 10s the patch passed +1 whitespace 0m 0s The patch has no whitespace issues. -1 findbugs 0m 47s hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) -1 javadoc 0m 14s hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager generated 2 new + 237 unchanged - 0 fixed = 239 total (was 237) +1 unit 15m 3s hadoop-yarn-server-nodemanager in the patch passed. +1 asflicense 0m 15s The patch does not generate ASF License warnings. 27m 55s Reason Tests FindBugs module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager   org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalCacheCleaner$LRUComparator implements Comparator but not Serializable At LocalCacheCleaner.java:Serializable At LocalCacheCleaner.java: [lines 159-168] Subsystem Report/Notes Docker Image:yetus/hadoop:9560f25 JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12834856/YARN-5767-trunk-v1.patch JIRA Issue YARN-5767 Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle uname Linux 5903daa0a638 3.13.0-95-generic #142-Ubuntu SMP Fri Aug 12 17:00:09 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux Build tool maven Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh git revision trunk / d0a3479 Default Java 1.8.0_101 findbugs v3.0.0 checkstyle https://builds.apache.org/job/PreCommit-YARN-Build/13483/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt findbugs https://builds.apache.org/job/PreCommit-YARN-Build/13483/artifact/patchprocess/new-findbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.html javadoc https://builds.apache.org/job/PreCommit-YARN-Build/13483/artifact/patchprocess/diff-javadoc-javadoc-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt Test Results https://builds.apache.org/job/PreCommit-YARN-Build/13483/testReport/ modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager Console output https://builds.apache.org/job/PreCommit-YARN-Build/13483/console Powered by Apache Yetus 0.3.0 http://yetus.apache.org This message was automatically generated.
          Hide
          ctrezzo Chris Trezzo added a comment -

          Attaching a v2 patch for trunk. This new version simply fixes checkstyles and findbugs. Here is a summary:

          1. Add javadoc comments and fix spacing.
          2. Add a hashcode method and serializable interface to LocalCacheCleaner#LRUComparator.
          Show
          ctrezzo Chris Trezzo added a comment - Attaching a v2 patch for trunk. This new version simply fixes checkstyles and findbugs. Here is a summary: Add javadoc comments and fix spacing. Add a hashcode method and serializable interface to LocalCacheCleaner#LRUComparator .
          Hide
          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 10s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 2 new or modified test files.
          +1 mvninstall 6m 54s trunk passed
          +1 compile 0m 25s trunk passed
          +1 checkstyle 0m 18s trunk passed
          +1 mvnsite 0m 27s trunk passed
          +1 mvneclipse 0m 13s trunk passed
          +1 findbugs 0m 40s trunk passed
          +1 javadoc 0m 17s trunk passed
          +1 mvninstall 0m 21s the patch passed
          +1 compile 0m 23s the patch passed
          +1 javac 0m 23s the patch passed
          +1 checkstyle 0m 16s hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: The patch generated 0 new + 160 unchanged - 27 fixed = 160 total (was 187)
          +1 mvnsite 0m 23s the patch passed
          +1 mvneclipse 0m 10s the patch passed
          -1 whitespace 0m 0s The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix.
          +1 findbugs 0m 45s the patch passed
          +1 javadoc 0m 13s the patch passed
          +1 unit 14m 53s hadoop-yarn-server-nodemanager in the patch passed.
          +1 asflicense 0m 16s The patch does not generate ASF License warnings.
          27m 42s



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:9560f25
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12835016/YARN-5767-trunk-v2.patch
          JIRA Issue YARN-5767
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 2f2b8c5e8486 3.13.0-93-generic #140-Ubuntu SMP Mon Jul 18 21:21:05 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / a1a0281
          Default Java 1.8.0_101
          findbugs v3.0.0
          whitespace https://builds.apache.org/job/PreCommit-YARN-Build/13492/artifact/patchprocess/whitespace-eol.txt
          Test Results https://builds.apache.org/job/PreCommit-YARN-Build/13492/testReport/
          modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
          Console output https://builds.apache.org/job/PreCommit-YARN-Build/13492/console
          Powered by Apache Yetus 0.3.0 http://yetus.apache.org

          This message was automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall Vote Subsystem Runtime Comment 0 reexec 0m 10s Docker mode activated. +1 @author 0m 0s The patch does not contain any @author tags. +1 test4tests 0m 0s The patch appears to include 2 new or modified test files. +1 mvninstall 6m 54s trunk passed +1 compile 0m 25s trunk passed +1 checkstyle 0m 18s trunk passed +1 mvnsite 0m 27s trunk passed +1 mvneclipse 0m 13s trunk passed +1 findbugs 0m 40s trunk passed +1 javadoc 0m 17s trunk passed +1 mvninstall 0m 21s the patch passed +1 compile 0m 23s the patch passed +1 javac 0m 23s the patch passed +1 checkstyle 0m 16s hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: The patch generated 0 new + 160 unchanged - 27 fixed = 160 total (was 187) +1 mvnsite 0m 23s the patch passed +1 mvneclipse 0m 10s the patch passed -1 whitespace 0m 0s The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix. +1 findbugs 0m 45s the patch passed +1 javadoc 0m 13s the patch passed +1 unit 14m 53s hadoop-yarn-server-nodemanager in the patch passed. +1 asflicense 0m 16s The patch does not generate ASF License warnings. 27m 42s Subsystem Report/Notes Docker Image:yetus/hadoop:9560f25 JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12835016/YARN-5767-trunk-v2.patch JIRA Issue YARN-5767 Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle uname Linux 2f2b8c5e8486 3.13.0-93-generic #140-Ubuntu SMP Mon Jul 18 21:21:05 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux Build tool maven Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh git revision trunk / a1a0281 Default Java 1.8.0_101 findbugs v3.0.0 whitespace https://builds.apache.org/job/PreCommit-YARN-Build/13492/artifact/patchprocess/whitespace-eol.txt Test Results https://builds.apache.org/job/PreCommit-YARN-Build/13492/testReport/ modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager Console output https://builds.apache.org/job/PreCommit-YARN-Build/13492/console Powered by Apache Yetus 0.3.0 http://yetus.apache.org This message was automatically generated.
          Hide
          ctrezzo Chris Trezzo added a comment -

          V3 attached. Fixed whitespace.

          Show
          ctrezzo Chris Trezzo added a comment - V3 attached. Fixed whitespace.
          Hide
          hadoopqa Hadoop QA added a comment -
          +1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 16s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 2 new or modified test files.
          +1 mvninstall 8m 3s trunk passed
          +1 compile 0m 33s trunk passed
          +1 checkstyle 0m 19s trunk passed
          +1 mvnsite 0m 31s trunk passed
          +1 mvneclipse 0m 14s trunk passed
          +1 findbugs 0m 48s trunk passed
          +1 javadoc 0m 20s trunk passed
          +1 mvninstall 0m 26s the patch passed
          +1 compile 0m 29s the patch passed
          +1 javac 0m 29s the patch passed
          +1 checkstyle 0m 17s hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: The patch generated 0 new + 159 unchanged - 27 fixed = 159 total (was 186)
          +1 mvnsite 0m 29s the patch passed
          +1 mvneclipse 0m 11s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 findbugs 0m 55s the patch passed
          +1 javadoc 0m 17s the patch passed
          +1 unit 14m 58s hadoop-yarn-server-nodemanager in the patch passed.
          +1 asflicense 0m 17s The patch does not generate ASF License warnings.
          30m 4s



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:9560f25
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12835025/YARN-5767-trunk-v3.patch
          JIRA Issue YARN-5767
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 815705b05614 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 9d17585
          Default Java 1.8.0_101
          findbugs v3.0.0
          Test Results https://builds.apache.org/job/PreCommit-YARN-Build/13493/testReport/
          modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
          Console output https://builds.apache.org/job/PreCommit-YARN-Build/13493/console
          Powered by Apache Yetus 0.3.0 http://yetus.apache.org

          This message was automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - +1 overall Vote Subsystem Runtime Comment 0 reexec 0m 16s Docker mode activated. +1 @author 0m 0s The patch does not contain any @author tags. +1 test4tests 0m 0s The patch appears to include 2 new or modified test files. +1 mvninstall 8m 3s trunk passed +1 compile 0m 33s trunk passed +1 checkstyle 0m 19s trunk passed +1 mvnsite 0m 31s trunk passed +1 mvneclipse 0m 14s trunk passed +1 findbugs 0m 48s trunk passed +1 javadoc 0m 20s trunk passed +1 mvninstall 0m 26s the patch passed +1 compile 0m 29s the patch passed +1 javac 0m 29s the patch passed +1 checkstyle 0m 17s hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: The patch generated 0 new + 159 unchanged - 27 fixed = 159 total (was 186) +1 mvnsite 0m 29s the patch passed +1 mvneclipse 0m 11s the patch passed +1 whitespace 0m 0s The patch has no whitespace issues. +1 findbugs 0m 55s the patch passed +1 javadoc 0m 17s the patch passed +1 unit 14m 58s hadoop-yarn-server-nodemanager in the patch passed. +1 asflicense 0m 17s The patch does not generate ASF License warnings. 30m 4s Subsystem Report/Notes Docker Image:yetus/hadoop:9560f25 JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12835025/YARN-5767-trunk-v3.patch JIRA Issue YARN-5767 Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle uname Linux 815705b05614 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux Build tool maven Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh git revision trunk / 9d17585 Default Java 1.8.0_101 findbugs v3.0.0 Test Results https://builds.apache.org/job/PreCommit-YARN-Build/13493/testReport/ modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager Console output https://builds.apache.org/job/PreCommit-YARN-Build/13493/console Powered by Apache Yetus 0.3.0 http://yetus.apache.org This message was automatically generated.
          Hide
          miklos.szegedi@cloudera.com Miklos Szegedi added a comment -

          +1 (non-binding) Thank you, Chris Trezzo!

          I have a few optional comments:

          testLRUAcrossTrackers:
          The test does not check whether the right resource was deleted from each list.
          You might want to use resources.getLocalRsrc().containsKey here just like in testPositiveRefCount.

          LocalCacheCleanerStats:
          It would be useful in the future for debugging, if toStringDetailed() printed out the actual resource paths not just the size per user

          Show
          miklos.szegedi@cloudera.com Miklos Szegedi added a comment - +1 (non-binding) Thank you, Chris Trezzo ! I have a few optional comments: testLRUAcrossTrackers: The test does not check whether the right resource was deleted from each list. You might want to use resources.getLocalRsrc().containsKey here just like in testPositiveRefCount. LocalCacheCleanerStats: It would be useful in the future for debugging, if toStringDetailed() printed out the actual resource paths not just the size per user
          Hide
          jlowe Jason Lowe added a comment -

          Thanks for the patch, Chris! Looks good overall, just some minor nits in addition to the unit test comment above. I don't think the actual resource path logging is necessary since LocalResourcesTrackerImpl already does this in its remove method. If this does get added here, please make it a trace-level log separate from the debug one.

          LocalCacheCleaner should be a package-private class. It does not need to be public.

          handleCacheCleanup takes an event that isn't used and should be removed. Looks like this was in the original code as well, but it would be nice to cleanup while we're here.

          Why does LRUComparator implement Serializable? It doesn't really do it properly and appears to be unnecessary.

          LocalCacheCleanerStats should take the currentSize as a constructor argument which can be used to initialize cacheSizeBeforeClean. As a bonus, cacheSizeBeforeClean can then be marked final.

          I'm curious why we're removing the map entry here:

                if (tracker.remove(resource, delService)) {
                  stats.incDelSize(tracker.getUser(), resource.getSize());
                  i.remove();
                }
          

          The removal seems unnecessary. Is the concern that someone will create a long-term cache cleaner object that will hold onto these LocalizedResource objects? If so then it would be simpler and faster to just clear the entire resourceMap at the end of the method, since we don't really support reusing a cleaner object.

          Show
          jlowe Jason Lowe added a comment - Thanks for the patch, Chris! Looks good overall, just some minor nits in addition to the unit test comment above. I don't think the actual resource path logging is necessary since LocalResourcesTrackerImpl already does this in its remove method. If this does get added here, please make it a trace-level log separate from the debug one. LocalCacheCleaner should be a package-private class. It does not need to be public. handleCacheCleanup takes an event that isn't used and should be removed. Looks like this was in the original code as well, but it would be nice to cleanup while we're here. Why does LRUComparator implement Serializable? It doesn't really do it properly and appears to be unnecessary. LocalCacheCleanerStats should take the currentSize as a constructor argument which can be used to initialize cacheSizeBeforeClean. As a bonus, cacheSizeBeforeClean can then be marked final. I'm curious why we're removing the map entry here: if (tracker.remove(resource, delService)) { stats.incDelSize(tracker.getUser(), resource.getSize()); i.remove(); } The removal seems unnecessary. Is the concern that someone will create a long-term cache cleaner object that will hold onto these LocalizedResource objects? If so then it would be simpler and faster to just clear the entire resourceMap at the end of the method, since we don't really support reusing a cleaner object.
          Hide
          ctrezzo Chris Trezzo added a comment -

          Thanks for the reviews Miklos Szegedi and Jason Lowe!

          You might want to use resources.getLocalRsrc().containsKey here just like in testPositiveRefCount.

          Miklos Szegedi Ack, I will add the check to testLRUAcrossTrackers.

          Why does LRUComparator implement Serializable? It doesn't really do it properly and appears to be unnecessary.

          I agree that it is unnecessary since we never serialize LRUComparator or the map that we add it to. This is a result of the old code and our findbugs check. There were a couple of modifications I made to the LRUComparator class to satisfy findbugs:

          1. The comparator interface forces you to implement the equals method (how the original code was implemented), but does not force you to add a hashcode implementation. Findbugs now flags this. The original implementation of the equals method was implemented incorrectly (compares object references), but I left it because we never compare comparators in the code. To appease findbugs, I added an identity hashcode implementation that paired with the original implementation of the equals method. In hindsight, maybe I should just fix the original implementation of the equals method and add a correct hashcode implementation as well. Jason Lowe, it seems a little unnecessary because our code never uses it, so I would be curious to hear your thoughts.
          2. Findbugs flags the original LRUComparator for implementing the comparator interface, but not implementing Serializable. The argument is that if the comparator is not serializable and it is added to a map, then the map is not serializable. The LRUComparator has no member variables, so I figured all I had to do was declare it serializable and add a serial version number. Jason Lowe, should I add default write/read object methods as well? Thanks!

          Is the concern that someone will create a long-term cache cleaner object that will hold onto these LocalizedResource objects? If so then it would be simpler and faster to just clear the entire resourceMap at the end of the method, since we don't really support reusing a cleaner object.

          Maybe? This was just copied from the original ResourceRetentionSet class. I can delete the remove call and clear the entire resourceMap at the end of cleanCache.

          I will also address the other comments. Thanks again for the reviews!

          Show
          ctrezzo Chris Trezzo added a comment - Thanks for the reviews Miklos Szegedi and Jason Lowe ! You might want to use resources.getLocalRsrc().containsKey here just like in testPositiveRefCount. Miklos Szegedi Ack, I will add the check to testLRUAcrossTrackers. Why does LRUComparator implement Serializable? It doesn't really do it properly and appears to be unnecessary. I agree that it is unnecessary since we never serialize LRUComparator or the map that we add it to. This is a result of the old code and our findbugs check. There were a couple of modifications I made to the LRUComparator class to satisfy findbugs: The comparator interface forces you to implement the equals method (how the original code was implemented), but does not force you to add a hashcode implementation. Findbugs now flags this. The original implementation of the equals method was implemented incorrectly (compares object references), but I left it because we never compare comparators in the code. To appease findbugs, I added an identity hashcode implementation that paired with the original implementation of the equals method. In hindsight, maybe I should just fix the original implementation of the equals method and add a correct hashcode implementation as well. Jason Lowe , it seems a little unnecessary because our code never uses it, so I would be curious to hear your thoughts. Findbugs flags the original LRUComparator for implementing the comparator interface, but not implementing Serializable. The argument is that if the comparator is not serializable and it is added to a map, then the map is not serializable. The LRUComparator has no member variables, so I figured all I had to do was declare it serializable and add a serial version number. Jason Lowe , should I add default write/read object methods as well? Thanks! Is the concern that someone will create a long-term cache cleaner object that will hold onto these LocalizedResource objects? If so then it would be simpler and faster to just clear the entire resourceMap at the end of the method, since we don't really support reusing a cleaner object. Maybe? This was just copied from the original ResourceRetentionSet class. I can delete the remove call and clear the entire resourceMap at the end of cleanCache . I will also address the other comments. Thanks again for the reviews!
          Hide
          jlowe Jason Lowe added a comment -

          Ah, I missed the findbugs connection with respect to comparator serialization. In that case I'm fine with leaving that in, since the alternative of adding a findbugs exception is about as much work. No need to do read/write object methods.

          Show
          jlowe Jason Lowe added a comment - Ah, I missed the findbugs connection with respect to comparator serialization. In that case I'm fine with leaving that in, since the alternative of adding a findbugs exception is about as much work. No need to do read/write object methods.
          Hide
          sjlee0 Sangjin Lee added a comment -

          Thanks for the patch Chris Trezzo! I agree with the above feedback from Jason and Mikios.

          Just a couple of more minor comments.

          1. Let's make LRUComparator a private static class. This doesn't need to be visible outside LocalCacheCleaner.
          2. Can we remove LocalCacheCleanerStats.getUserDelSizes()? This publishes the internal map and breaks the encapsulation. It doesn't appear to be used by any code.
          3. Please use a generated serialVersionUID value instead of a hard-coded value of 1.

          Show
          sjlee0 Sangjin Lee added a comment - Thanks for the patch Chris Trezzo ! I agree with the above feedback from Jason and Mikios. Just a couple of more minor comments. 1. Let's make LRUComparator a private static class. This doesn't need to be visible outside LocalCacheCleaner . 2. Can we remove LocalCacheCleanerStats.getUserDelSizes() ? This publishes the internal map and breaks the encapsulation. It doesn't appear to be used by any code. 3. Please use a generated serialVersionUID value instead of a hard-coded value of 1.
          Hide
          ctrezzo Chris Trezzo added a comment -

          Attached is a v4 patch for trunk addressing all comments from the reviews. Thanks!

          Show
          ctrezzo Chris Trezzo added a comment - Attached is a v4 patch for trunk addressing all comments from the reviews. Thanks!
          Hide
          ctrezzo Chris Trezzo added a comment -

          Two notes:

          1. Instead of removing LocalCacheCleanerStats.getUserDelSizes() I made it return an unmodifiable map. That way, users of the class still have access to the data (outside of the toString method) and it is still protected.
          2. I wound up removing both LRUComparator.equals and LRUComparator.hashCode. I figure we don't need to override them since the methods were just using the default implementation anyways.

          My intention is to file a followup jira that adds metrics that expose the statistics from LocalCacheCleanerStats.

          Show
          ctrezzo Chris Trezzo added a comment - Two notes: Instead of removing LocalCacheCleanerStats.getUserDelSizes() I made it return an unmodifiable map. That way, users of the class still have access to the data (outside of the toString method) and it is still protected. I wound up removing both LRUComparator.equals and LRUComparator.hashCode . I figure we don't need to override them since the methods were just using the default implementation anyways. My intention is to file a followup jira that adds metrics that expose the statistics from LocalCacheCleanerStats .
          Hide
          hadoopqa Hadoop QA added a comment -
          +1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 15s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 2 new or modified test files.
          +1 mvninstall 7m 40s trunk passed
          +1 compile 0m 32s trunk passed
          +1 checkstyle 0m 18s trunk passed
          +1 mvnsite 0m 27s trunk passed
          +1 mvneclipse 0m 13s trunk passed
          +1 findbugs 0m 51s trunk passed
          +1 javadoc 0m 20s trunk passed
          +1 mvninstall 0m 27s the patch passed
          +1 compile 0m 27s the patch passed
          +1 javac 0m 27s the patch passed
          +1 checkstyle 0m 16s hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: The patch generated 0 new + 159 unchanged - 27 fixed = 159 total (was 186)
          +1 mvnsite 0m 27s the patch passed
          +1 mvneclipse 0m 12s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 findbugs 0m 57s the patch passed
          +1 javadoc 0m 14s the patch passed
          +1 unit 15m 29s hadoop-yarn-server-nodemanager in the patch passed.
          +1 asflicense 0m 16s The patch does not generate ASF License warnings.
          30m 41s



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:9560f25
          JIRA Issue YARN-5767
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12835439/YARN-5767-trunk-v4.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 26badfc5aa93 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 22ff0ef
          Default Java 1.8.0_101
          findbugs v3.0.0
          Test Results https://builds.apache.org/job/PreCommit-YARN-Build/13528/testReport/
          modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
          Console output https://builds.apache.org/job/PreCommit-YARN-Build/13528/console
          Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - +1 overall Vote Subsystem Runtime Comment 0 reexec 0m 15s Docker mode activated. +1 @author 0m 0s The patch does not contain any @author tags. +1 test4tests 0m 0s The patch appears to include 2 new or modified test files. +1 mvninstall 7m 40s trunk passed +1 compile 0m 32s trunk passed +1 checkstyle 0m 18s trunk passed +1 mvnsite 0m 27s trunk passed +1 mvneclipse 0m 13s trunk passed +1 findbugs 0m 51s trunk passed +1 javadoc 0m 20s trunk passed +1 mvninstall 0m 27s the patch passed +1 compile 0m 27s the patch passed +1 javac 0m 27s the patch passed +1 checkstyle 0m 16s hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: The patch generated 0 new + 159 unchanged - 27 fixed = 159 total (was 186) +1 mvnsite 0m 27s the patch passed +1 mvneclipse 0m 12s the patch passed +1 whitespace 0m 0s The patch has no whitespace issues. +1 findbugs 0m 57s the patch passed +1 javadoc 0m 14s the patch passed +1 unit 15m 29s hadoop-yarn-server-nodemanager in the patch passed. +1 asflicense 0m 16s The patch does not generate ASF License warnings. 30m 41s Subsystem Report/Notes Docker Image:yetus/hadoop:9560f25 JIRA Issue YARN-5767 JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12835439/YARN-5767-trunk-v4.patch Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle uname Linux 26badfc5aa93 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux Build tool maven Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh git revision trunk / 22ff0ef Default Java 1.8.0_101 findbugs v3.0.0 Test Results https://builds.apache.org/job/PreCommit-YARN-Build/13528/testReport/ modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager Console output https://builds.apache.org/job/PreCommit-YARN-Build/13528/console Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org This message was automatically generated.
          Hide
          jlowe Jason Lowe added a comment -

          +1 lgtm. I'll commit this tomorrow if there are no objections.

          Show
          jlowe Jason Lowe added a comment - +1 lgtm. I'll commit this tomorrow if there are no objections.
          Hide
          sjlee0 Sangjin Lee added a comment -

          +1 too.

          Show
          sjlee0 Sangjin Lee added a comment - +1 too.
          Hide
          jlowe Jason Lowe added a comment -

          Thanks to Chris Trezzo for the contribution and to Miklos Szegedi and Sangjin Lee for additional review! I committed this to trunk, branch-2, and branch-2.8.

          Show
          jlowe Jason Lowe added a comment - Thanks to Chris Trezzo for the contribution and to Miklos Szegedi and Sangjin Lee for additional review! I committed this to trunk, branch-2, and branch-2.8.
          Hide
          hudson Hudson added a comment -

          SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #10721 (See https://builds.apache.org/job/Hadoop-trunk-Commit/10721/)
          YARN-5767. Fix the order that resources are cleaned up from the local (jlowe: rev 1b79c417dca17bcd2e031864bc6ca07254c61b47)

          • (add) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalCacheCleaner.java
          • (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalResourcesTrackerImpl.java
          • (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java
          • (add) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestLocalCacheCleanup.java
          • (delete) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceRetention.java
          • (delete) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceRetentionSet.java
          Show
          hudson Hudson added a comment - SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #10721 (See https://builds.apache.org/job/Hadoop-trunk-Commit/10721/ ) YARN-5767 . Fix the order that resources are cleaned up from the local (jlowe: rev 1b79c417dca17bcd2e031864bc6ca07254c61b47) (add) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalCacheCleaner.java (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalResourcesTrackerImpl.java (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java (add) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestLocalCacheCleanup.java (delete) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceRetention.java (delete) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceRetentionSet.java
          Hide
          ctrezzo Chris Trezzo added a comment -

          Thanks Jason Lowe! I will add release notes to the issue as well.

          Show
          ctrezzo Chris Trezzo added a comment - Thanks Jason Lowe ! I will add release notes to the issue as well.
          Hide
          ctrezzo Chris Trezzo added a comment -

          I have filed a followup jira for the metrics: YARN-5797

          Show
          ctrezzo Chris Trezzo added a comment - I have filed a followup jira for the metrics: YARN-5797

            People

            • Assignee:
              ctrezzo Chris Trezzo
              Reporter:
              ctrezzo Chris Trezzo
            • Votes:
              0 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development