Hadoop HDFS / HDFS-9549

TestCacheDirectives#testExceedsCapacity is flaky

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0.0-alpha1
    • Fix Version/s: 2.8.0, 3.0.0-alpha1
    • Component/s: None
    • Labels:
    • Environment:

      Jenkins

    • Target Version/s:

      Description

      I have observed that this test (TestCacheDirectives.testExceedsCapacity) fails quite frequently on Jenkins (trunk and trunk-Java8 builds).

      Error Message

      Pending cached list of 127.0.0.1:54134 is not empty, [{blockId=1073741841, replication=1, mark=true}]

      Stacktrace

      java.lang.AssertionError: Pending cached list of 127.0.0.1:54134 is not empty, [{blockId=1073741841, replication=1, mark=true}]
      at org.junit.Assert.fail(Assert.java:88)
      at org.junit.Assert.assertTrue(Assert.java:41)
      at org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives.checkPendingCachedEmpty(TestCacheDirectives.java:1479)
      at org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives.testExceedsCapacity(TestCacheDirectives.java:1502)
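
      The failing assertion reduces to checking that every DataNode's pending-cached list is empty. A minimal sketch of that check, using illustrative stand-in types rather than the actual TestCacheDirectives helpers:

      ```java
      import java.util.List;
      import java.util.Map;

      // Simplified form of the failing check; names are illustrative, not the test's code.
      class PendingCachedCheck {
          /** Fail if any DataNode still has entries on its pending-cached list. */
          static void checkPendingCachedEmpty(Map<String, List<Long>> pendingBlocksByDn) {
              for (Map.Entry<String, List<Long>> e : pendingBlocksByDn.entrySet()) {
                  if (!e.getValue().isEmpty()) {
                      throw new AssertionError("Pending cached list of " + e.getKey()
                          + " is not empty, " + e.getValue());
                  }
              }
          }
      }
      ```

      The flakiness discussed below is that this condition is checked after the DNs reach capacity, yet a block can linger on the pending list indefinitely.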

      1. HDFS-9549.04.patch
        6 kB
        Xiao Chen
      2. HDFS-9549.03.patch
        7 kB
        Xiao Chen
      3. HDFS-9549.02.patch
        7 kB
        Xiao Chen
      4. HDFS-9549.01.patch
        5 kB
        Xiao Chen
      5. TestCacheDirectives.rtf
        750 kB
        Xiao Chen

        Issue Links

          Activity

          jojochuang Wei-Chiu Chuang added a comment -

          From the log (https://builds.apache.org/job/Hadoop-Hdfs-trunk/2621/testReport/junit/org.apache.hadoop.hdfs.server.namenode/TestCacheDirectives/testExceedsCapacity/),
          this is odd: entries can only be added to the pending cached list by CacheManager.addNewPendingCached(), but the log does not show any pending cached entry being added.

          xiaochen Xiao Chen added a comment -

          Thanks a lot Wei-Chiu Chuang for reporting the issue. I'm working on it.

          xiaochen Xiao Chen added a comment -

          This one is a bit tricky to me. IIUC:

          • testExceedsCapacity waits for the DNs to reach their cache capacity, then verifies that there is no pendingCached left.
          • The test fails because, although CacheReplicationMonitor#addNewPendingCached checks the remaining bytes before adding, a DataNode can have a block transitioning from pending to cached on the CachingTask thread: the block is no longer pending, but is not yet counted as cacheUsed. A new block can then be added as pendingCached, and it will never succeed since the capacity is already reached. (I'll attach a log I used to analyze this: 1073741826 is the block that already transitioned to cached, and 1073741841 is the one that was added later and never succeeded. I had a waitFor in place when capturing this log, so the end of the log just repeats and can be ignored.)
          • Given the above root cause, fixing this in CacheReplicationMonitor#addNewPendingCached would be difficult without extra synchronization. Instead, I took another approach: conditionally remove the extra blocks in CacheReplicationMonitor#rescanCachedBlockMap when a DN doesn't have enough remaining capacity to fit them. I understand that would make the scan slower, so I combined it with the existing iteration of pendingCached, hoping to minimize the impact.
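
          A minimal, self-contained sketch of the conditional-removal idea described above (stand-in types and names; not the actual CacheReplicationMonitor code): while iterating pendingCached, drop any block that no longer fits the DN's remaining cache budget.

          ```java
          import java.util.ArrayList;
          import java.util.Iterator;
          import java.util.List;

          // Toy model of the proposed rescan; names are hypothetical, not HDFS APIs.
          class PendingCacheRescan {
              static final class Block {
                  final long id, size;
                  Block(long id, long size) { this.id = id; this.size = size; }
              }

              /** Remove pending blocks that can no longer fit in the DN's cache. */
              static List<Block> trimPending(long cacheCapacity, long cacheUsed,
                                             List<Block> pendingCached) {
                  List<Block> removed = new ArrayList<>();
                  long pendingBytes = 0;
                  for (Iterator<Block> it = pendingCached.iterator(); it.hasNext(); ) {
                      Block b = it.next();
                      long remaining = cacheCapacity - cacheUsed - pendingBytes;
                      if (b.size > remaining) {
                          it.remove();             // can never succeed: drop from pendingCached
                          removed.add(b);
                      } else {
                          pendingBytes += b.size;  // still counts against the remaining budget
                      }
                  }
                  return removed;
              }
          }
          ```

          In the failure above, a DN with 16384 bytes of capacity already fully used would have block 1073741841 (4096 bytes) trimmed from its pending list instead of retrying forever.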

          Andrew Wang and Colin P. McCabe, could you please review? Thanks!

          xiaochen Xiao Chen added a comment -

          Below is the excerpt from the log, HTH.

          -- cached initially 1073741826 --
          2016-02-11 10:58:47,656 INFO  datanode.DataNode (BPOfferService.java:processCommandFromActive(671)) - DatanodeCommand action: DNA_CACHE for BP-92343436-localhost-1455217122700 of [1073741826, 1073741828, 1073741829, 1073741834]
          2016-02-11 10:58:47,669 DEBUG impl.FsDatasetCache (FsDatasetCache.java:run(486)) - Successfully cached 1073741826_BP-92343436-localhost-1455217122700.  We are now caching 4096 bytes in total.
          2016-02-11 10:58:47,673 TRACE namenode.CacheManager (CacheManager.java:processCacheReportImpl(984)) - Added block {blockId=1073741826, replication=1, mark=false} to CACHED list.
          2016-02-11 10:58:47,674 TRACE namenode.CacheManager (CacheManager.java:processCacheReportImpl(988)) - Removed block {blockId=1073741826, replication=1, mark=false} from PENDING_CACHED list.
          2016-02-11 10:58:47,674 DEBUG namenode.CacheManager (CacheManager.java:processCacheReport(953)) - Processed cache report from DatanodeRegistration(127.0.0.1:54112, datanodeUuid=1f194c3a-3103-467e-a239-6258d71785a9, infoPort=54114, infoSecurePort=0, ipcPort=54115, storageInfo=lv=-56;cid=testClusterID;nsid=513491009;c=1455217122700), blocks: 1, processing time: 2 msecs
          
          -- innocent 1073741841 added --
          2016-02-11 10:58:47,758 ERROR blockmanagement.CacheReplicationMonitor (CacheReplicationMonitor.java:addNewPendingCached(699)) - Adding Block 1073741841: DataNode 1f194c3a-3103-467e-a239-6258d71785a9 because the block has size 4096, but the DataNode only has 4096 bytes of cache remaining (-12288 pending bytes, 16384 already cached.)
          2016-02-11 10:58:47,759 TRACE blockmanagement.CacheReplicationMonitor (CacheReplicationMonitor.java:addNewPendingCached(711)) - Block 1073741841: added to PENDING_CACHED on DataNode 1f194c3a-3103-467e-a239-6258d71785a9
          2016-02-11 10:58:48,661 INFO  datanode.DataNode (BPOfferService.java:processCommandFromActive(671)) - DatanodeCommand action: DNA_CACHE for BP-92343436-localhost-1455217122700 of [1073741828, 1073741829, 1073741834, 1073741841]
          2016-02-11 10:58:48,667 WARN  impl.FsDatasetCache (FsDatasetCache.java:run(441)) - Failed to cache 1073741841_BP-92343436-localhost-1455217122700: could not reserve 4096 more bytes in the cache: dfs.datanode.max.locked.memory of 16384 exceeded.
          2016-02-11 10:58:48,666 DEBUG impl.FsDatasetCache (FsDatasetCache.java:cacheBlock(309)) - Initiating caching for Block with id 1073741841, pool BP-92343436-localhost-1455217122700
          2016-02-11 10:58:48,668 DEBUG impl.FsDatasetCache (FsDatasetCache.java:run(499)) - Caching of 1073741841_BP-92343436-localhost-1455217122700 was aborted.  We are now caching only 16384 bytes in total.
          
          -- eventually, 1073741841 fails forever --
          2016-02-11 10:58:50,670 DEBUG impl.FsDatasetCache (FsDatasetCache.java:cacheBlock(309)) - Initiating caching for Block with id 1073741841, pool BP-92343436-localhost-1455217122700
          2016-02-11 10:58:50,670 WARN  impl.FsDatasetCache (FsDatasetCache.java:run(441)) - Failed to cache 1073741841_BP-92343436-localhost-1455217122700: could not reserve 4096 more bytes in the cache: dfs.datanode.max.locked.memory of 16384 exceeded.
          2016-02-11 10:58:50,672 DEBUG impl.FsDatasetCache (FsDatasetCache.java:run(499)) - Caching of 1073741841_BP-92343436-localhost-1455217122700 was aborted.  We are now caching only 16384 bytes in total.
           ....
          
          -- and cache reports show this --
          2016-02-11 10:58:49,675 TRACE namenode.CacheManager (CacheManager.java:processCacheReportImpl(966)) - Cache report from datanode 127.0.0.1:54112 has block 1073741826
          2016-02-11 10:58:49,675 TRACE namenode.CacheManager (CacheManager.java:processCacheReportImpl(984)) - Added block {blockId=1073741826, replication=1, mark=false} to CACHED list.
          2016-02-11 10:58:49,675 TRACE namenode.CacheManager (CacheManager.java:processCacheReportImpl(966)) - Cache report from datanode 127.0.0.1:54112 has block 1073741829
          2016-02-11 10:58:49,676 TRACE namenode.CacheManager (CacheManager.java:processCacheReportImpl(984)) - Added block {blockId=1073741829, replication=1, mark=false} to CACHED list.
          2016-02-11 10:58:49,676 TRACE namenode.CacheManager (CacheManager.java:processCacheReportImpl(988)) - Removed block {blockId=1073741829, replication=1, mark=false} from PENDING_CACHED list.
          2016-02-11 10:58:49,676 TRACE namenode.CacheManager (CacheManager.java:processCacheReportImpl(966)) - Cache report from datanode 127.0.0.1:54112 has block 1073741828
          2016-02-11 10:58:49,676 TRACE namenode.CacheManager (CacheManager.java:processCacheReportImpl(984)) - Added block {blockId=1073741828, replication=1, mark=false} to CACHED list.
          2016-02-11 10:58:49,676 TRACE namenode.CacheManager (CacheManager.java:processCacheReportImpl(988)) - Removed block {blockId=1073741828, replication=1, mark=false} from PENDING_CACHED list.
          2016-02-11 10:58:49,677 TRACE namenode.CacheManager (CacheManager.java:processCacheReportImpl(966)) - Cache report from datanode 127.0.0.1:54112 has block 1073741834
          2016-02-11 10:58:49,677 TRACE namenode.CacheManager (CacheManager.java:processCacheReportImpl(984)) - Added block {blockId=1073741834, replication=1, mark=false} to CACHED list.
          2016-02-11 10:58:49,677 TRACE namenode.CacheManager (CacheManager.java:processCacheReportImpl(988)) - Removed block {blockId=1073741834, replication=1, mark=false} from PENDING_CACHED list.
          
          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 12s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
          +1 mvninstall 7m 11s trunk passed
          +1 compile 0m 46s trunk passed with JDK v1.8.0_72
          +1 compile 0m 48s trunk passed with JDK v1.7.0_95
          +1 checkstyle 0m 22s trunk passed
          +1 mvnsite 0m 57s trunk passed
          +1 mvneclipse 0m 16s trunk passed
          +1 findbugs 2m 8s trunk passed
          +1 javadoc 1m 10s trunk passed with JDK v1.8.0_72
          +1 javadoc 1m 53s trunk passed with JDK v1.7.0_95
          +1 mvninstall 0m 49s the patch passed
          +1 compile 0m 42s the patch passed with JDK v1.8.0_72
          +1 javac 0m 42s the patch passed
          +1 compile 0m 40s the patch passed with JDK v1.7.0_95
          +1 javac 0m 40s the patch passed
          -1 checkstyle 0m 18s hadoop-hdfs-project/hadoop-hdfs: patch generated 1 new + 10 unchanged - 1 fixed = 11 total (was 11)
          +1 mvnsite 0m 55s the patch passed
          +1 mvneclipse 0m 11s the patch passed
          +1 whitespace 0m 0s Patch has no whitespace issues.
          +1 findbugs 2m 18s the patch passed
          +1 javadoc 1m 9s the patch passed with JDK v1.8.0_72
          +1 javadoc 1m 49s the patch passed with JDK v1.7.0_95
          -1 unit 55m 44s hadoop-hdfs in the patch failed with JDK v1.8.0_72.
          -1 unit 50m 45s hadoop-hdfs in the patch failed with JDK v1.7.0_95.
          +1 asflicense 0m 20s Patch does not generate ASF License warnings.
          132m 28s



          Reason Tests
          JDK v1.8.0_72 Failed junit tests hadoop.fs.TestHdfsNativeCodeLoader
            hadoop.hdfs.server.datanode.TestBlockScanner
          JDK v1.7.0_95 Failed junit tests hadoop.fs.TestHdfsNativeCodeLoader



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:0ca8df7
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12787596/HDFS-9549.01.patch
          JIRA Issue HDFS-9549
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 2595a0f3039d 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 8fdef0b
          Default Java 1.7.0_95
          Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_72 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_95
          findbugs v3.0.0
          checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/14471/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/14471/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_72.txt
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/14471/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_95.txt
          unit test logs https://builds.apache.org/job/PreCommit-HDFS-Build/14471/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_72.txt https://builds.apache.org/job/PreCommit-HDFS-Build/14471/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_95.txt
          JDK v1.7.0_95 Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/14471/testReport/
          modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
          Max memory used 77MB
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/14471/console
          Powered by Apache Yetus 0.2.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          cmccabe Colin P. McCabe added a comment -

          This is a good find, Xiao Chen. The root of the issue is that a block can remain "pending cached" forever on a DataNode that we already know does not have space to cache any more blocks. The patch you posted adds a loop over every block that is pending to be cached on any datanode to check for this condition. Rather than doing that, it would be simpler and more efficient to just loop over all datanodes and make sure that pendingCached only contains blocks that we could realistically hope to cache, given the current values of DatanodeInfo#cacheCapacity and DatanodeInfo#cacheUsed.
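
          The per-datanode trim suggested here could look roughly like the following sketch, assuming stand-in types whose cacheCapacity/cacheUsed fields mirror the DatanodeInfo values (everything else is hypothetical):

          ```java
          import java.util.ArrayList;
          import java.util.Collection;
          import java.util.Iterator;
          import java.util.List;

          // Per-datanode trim, per the suggestion above; stand-in types, not HDFS classes.
          class DatanodeTrim {
              static final class Dn {
                  long cacheCapacity, cacheUsed;
                  final List<Long> pendingCachedSizes = new ArrayList<>();
              }

              /** Keep only the pendingCached entries each DN can realistically cache. */
              static void trimAll(Collection<Dn> datanodes) {
                  for (Dn dn : datanodes) {
                      long budget = dn.cacheCapacity - dn.cacheUsed;
                      for (Iterator<Long> it = dn.pendingCachedSizes.iterator(); it.hasNext(); ) {
                          long size = it.next();
                          if (size > budget) {
                              it.remove();     // cannot fit: drop from this DN's pending list
                          } else {
                              budget -= size;  // reserve budget for this pending block
                          }
                      }
                  }
              }
          }
          ```

          Looping by datanode keeps the work proportional to the number of DNs plus their pending entries, instead of scanning every pending block across all DNs.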

          xiaochen Xiao Chen added a comment -

          Thank you for the comment, and nicely summarizing the root cause Colin P. McCabe! You definitely summarized it better than I did.

          Rather than doing that, it would be simpler and more efficient to just loop over all datanodes and make sure that pendingCached only contained blocks that we could realistically hope to cache

          Could you explain this further? IIUC you're proposing to add the removal logic to the DN's thread, instead of CacheReplicationMonitor#rescanCachedBlockMap? I think there are two things we want to remove - the block from the DN's pendingCached list, and the DN from the cachedBlocks' pendingCached list in the cache manager.
          In the remove code, that's

                    datanode.getPendingCached().remove(cblock);    // remove from the DN
                    iter.remove();      // remove the DN from the list of pendingCached DNs of that block from the cache manager.
          

          I didn't find how to remove the latter in a DN context. Please advise. Thanks again!

          xiaochen Xiao Chen added a comment -

          I talked with Colin offline, and my comment above misinterpreted his intent. He just meant to implement a similar fix in a more optimized way. Sorry I misunderstood earlier.

          Patch 2 attached tries to remove the stuck pendingCached blocks by first going through the DNs, ignoring those that haven't reached the capacity watermark.
          The watermark is hardcoded - IMHO a configuration option would be overkill.

          One thing worth mentioning: due to the nature of the race, it is also possible that a block is in fact CACHED but not yet removed from PENDING_CACHED. If the DN is beyond the watermark, the added logic may remove it early. I don't think we need special handling for that, since the state is still correct; the removal just happens in the CRM instead of during cache reporting.
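
          As a rough illustration of the watermark gating (the constant below is an assumption; the patch's hardcoded value may differ), only DNs at or above the watermark would be considered for the extra rescan work:

          ```java
          // Watermark gating sketch; the fraction is a hypothetical placeholder.
          class WatermarkGate {
              static final double CACHE_USED_WATERMARK = 0.5;  // assumed fraction of capacity

              /** Only DNs at or above the watermark are worth rescanning for stuck blocks. */
              static boolean shouldRescan(long cacheCapacity, long cacheUsed) {
                  return cacheCapacity > 0
                      && cacheUsed >= (long) (cacheCapacity * CACHE_USED_WATERMARK);
              }
          }
          ```

          DNs well below the watermark cannot have blocks stuck by the capacity race, so skipping them keeps the added scan cost low.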

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 9s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
          +1 mvninstall 7m 14s trunk passed
          +1 compile 0m 44s trunk passed with JDK v1.8.0_72
          +1 compile 0m 45s trunk passed with JDK v1.7.0_95
          +1 checkstyle 0m 22s trunk passed
          +1 mvnsite 0m 55s trunk passed
          +1 mvneclipse 0m 13s trunk passed
          +1 findbugs 2m 2s trunk passed
          +1 javadoc 1m 12s trunk passed with JDK v1.8.0_72
          +1 javadoc 1m 54s trunk passed with JDK v1.7.0_95
          +1 mvninstall 0m 48s the patch passed
          +1 compile 0m 40s the patch passed with JDK v1.8.0_72
          +1 javac 0m 40s the patch passed
          +1 compile 0m 42s the patch passed with JDK v1.7.0_95
          +1 javac 0m 42s the patch passed
          -1 checkstyle 0m 20s hadoop-hdfs-project/hadoop-hdfs: patch generated 2 new + 59 unchanged - 0 fixed = 61 total (was 59)
          +1 mvnsite 0m 52s the patch passed
          +1 mvneclipse 0m 12s the patch passed
          +1 whitespace 0m 0s Patch has no whitespace issues.
          +1 findbugs 2m 12s the patch passed
          +1 javadoc 1m 8s the patch passed with JDK v1.8.0_72
          +1 javadoc 1m 55s the patch passed with JDK v1.7.0_95
          -1 unit 60m 3s hadoop-hdfs in the patch failed with JDK v1.8.0_72.
          -1 unit 54m 44s hadoop-hdfs in the patch failed with JDK v1.7.0_95.
          +1 asflicense 0m 21s Patch does not generate ASF License warnings.
          141m 41s



          Reason Tests
          JDK v1.8.0_72 Failed junit tests hadoop.hdfs.server.namenode.ha.TestHAAppend
          JDK v1.7.0_95 Failed junit tests hadoop.hdfs.qjournal.client.TestQuorumJournalManager



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:0ca8df7
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12788544/HDFS-9549.02.patch
          JIRA Issue HDFS-9549
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 74aa5b7ae0e6 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 8ab7658
          Default Java 1.7.0_95
          Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_72 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_95
          findbugs v3.0.0
          checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/14536/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/14536/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_72.txt
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/14536/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_95.txt
          unit test logs https://builds.apache.org/job/PreCommit-HDFS-Build/14536/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_72.txt https://builds.apache.org/job/PreCommit-HDFS-Build/14536/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_95.txt
          JDK v1.7.0_95 Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/14536/testReport/
          modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/14536/console
          Powered by Apache Yetus 0.2.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          xiaochen Xiao Chen added a comment -

          Patch 3 fixes checkstyle warnings.
          Colin P. McCabe, could you take another look? Thanks a lot!

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 17s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
          +1 mvninstall 7m 27s trunk passed
          +1 compile 0m 52s trunk passed with JDK v1.8.0_72
          +1 compile 0m 46s trunk passed with JDK v1.7.0_95
          +1 checkstyle 0m 22s trunk passed
          +1 mvnsite 0m 56s trunk passed
          +1 mvneclipse 0m 14s trunk passed
          +1 findbugs 1m 59s trunk passed
          +1 javadoc 1m 16s trunk passed with JDK v1.8.0_72
          +1 javadoc 2m 0s trunk passed with JDK v1.7.0_95
          +1 mvninstall 0m 50s the patch passed
          +1 compile 0m 52s the patch passed with JDK v1.8.0_72
          +1 javac 0m 52s the patch passed
          +1 compile 0m 43s the patch passed with JDK v1.7.0_95
          +1 javac 0m 43s the patch passed
          +1 checkstyle 0m 19s the patch passed
          +1 mvnsite 0m 55s the patch passed
          +1 mvneclipse 0m 13s the patch passed
          +1 whitespace 0m 0s Patch has no whitespace issues.
          +1 findbugs 2m 16s the patch passed
          +1 javadoc 1m 18s the patch passed with JDK v1.8.0_72
          +1 javadoc 1m 57s the patch passed with JDK v1.7.0_95
          -1 unit 80m 42s hadoop-hdfs in the patch failed with JDK v1.8.0_72.
          -1 unit 102m 22s hadoop-hdfs in the patch failed with JDK v1.7.0_95.
          +1 asflicense 0m 35s Patch does not generate ASF License warnings.
          211m 31s



          Reason Tests
          JDK v1.8.0_72 Failed junit tests hadoop.hdfs.server.namenode.ha.TestHAMetrics
            hadoop.hdfs.server.blockmanagement.TestBlockManager
            hadoop.hdfs.security.TestDelegationTokenForProxyUser
            hadoop.hdfs.TestFileAppend
            hadoop.hdfs.TestRollingUpgrade
            hadoop.hdfs.server.namenode.ha.TestDFSUpgradeWithHA
            hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistReplicaRecovery
          JDK v1.7.0_95 Failed junit tests hadoop.hdfs.server.namenode.ha.TestEditLogTailer
            hadoop.hdfs.TestPersistBlocks
            hadoop.hdfs.TestFileAppend
            hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure
            hadoop.hdfs.server.datanode.TestBlockReplacement
            hadoop.hdfs.server.datanode.TestDirectoryScanner
          JDK v1.7.0_95 Timed out junit tests org.apache.hadoop.hdfs.TestLeaseRecovery2



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:0ca8df7
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12788582/HDFS-9549.03.patch
          JIRA Issue HDFS-9549
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux aa759661926e 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 8ab7658
          Default Java 1.7.0_95
          Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_72 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_95
          findbugs v3.0.0
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/14539/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_72.txt
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/14539/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_95.txt
          unit test logs https://builds.apache.org/job/PreCommit-HDFS-Build/14539/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_72.txt https://builds.apache.org/job/PreCommit-HDFS-Build/14539/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_95.txt
          JDK v1.7.0_95 Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/14539/testReport/
          modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/14539/console
          Powered by Apache Yetus 0.2.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          cmccabe Colin P. McCabe added a comment -

          Thanks, Xiao Chen.

                if (dn.getCacheRemainingPercent() > watermark
                    || dn.getPendingCached().isEmpty()) {
                  continue;
                }
          

          I can see why you want to do this, but it isn't quite correct. We could have 10% of our DN's cache remaining, but a really big uncacheable block in its pending_cached list. I would say just start with the current cacheUsed that the DN has reported in its latest heartbeat and keep adding to it as you add pendingCached blocks. Drop any pendingCached blocks which would cause it to exceed cacheCapacity. Perhaps we can optimize this more later (like by keeping a running total of pending cached), but for now that should make it correct.
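The dry-run described above can be sketched as follows. This is an illustrative model only, not the actual CacheReplicationMonitor code: the method name, and representing blocks as plain sizes in bytes, are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

public class PendingCachedSketch {
  /**
   * Start from the cacheUsed the DN reported in its latest heartbeat and
   * keep a running total as pendingCached candidates are accepted. Any
   * candidate that would push the total past cacheCapacity is dropped.
   */
  static List<Long> selectPendingCached(long cacheUsed, long cacheCapacity,
                                        List<Long> candidateBlockSizes) {
    List<Long> accepted = new ArrayList<>();
    long projectedUsed = cacheUsed; // reported usage + accepted pending blocks
    for (long blockSize : candidateBlockSizes) {
      if (projectedUsed + blockSize > cacheCapacity) {
        continue; // would exceed capacity: drop this pendingCached block
      }
      projectedUsed += blockSize;
      accepted.add(blockSize);
    }
    return accepted;
  }
}
```

This handles the case called out above: even with 10% of cache remaining, a single block larger than the remaining capacity is dropped rather than left pending forever.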

          xiaochen Xiao Chen added a comment -

          Thank you Colin P. McCabe.
          You're right, I should not have made such an assumption. Patch 4 is attached; it just calculates the remaining capacity. I agree this should be sufficient for now, since we're only looping through the DNs.

          xiaochen Xiao Chen added a comment -

          Patch 4 seems to have never triggered Jenkins. Reattaching.

          xiaochen Xiao Chen added a comment -

          Hm, the latest run failed with what look like environment issues:

          Build step 'Execute shell' marked build as failure
          ERROR: Publisher 'Archive the artifacts' failed: no workspace for PreCommit-HDFS-Build #14557
          ERROR: H3 is offline; cannot locate jdk-1.8.0
          [description-setter] Description set: HDFS-9549
          ERROR: Publisher 'Publish JUnit test result report' failed: no workspace for PreCommit-HDFS-Build #14557
          Finished: FAILURE
          

          https://builds.apache.org/job/PreCommit-HDFS-Build/14557/
          Reattaching the same patch 4...

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 13s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
          +1 mvninstall 6m 36s trunk passed
          +1 compile 0m 39s trunk passed with JDK v1.8.0_72
          +1 compile 0m 40s trunk passed with JDK v1.7.0_95
          +1 checkstyle 0m 20s trunk passed
          +1 mvnsite 0m 50s trunk passed
          +1 mvneclipse 0m 14s trunk passed
          +1 findbugs 1m 55s trunk passed
          +1 javadoc 1m 6s trunk passed with JDK v1.8.0_72
          +1 javadoc 1m 45s trunk passed with JDK v1.7.0_95
          +1 mvninstall 0m 45s the patch passed
          +1 compile 0m 38s the patch passed with JDK v1.8.0_72
          +1 javac 0m 38s the patch passed
          +1 compile 0m 38s the patch passed with JDK v1.7.0_95
          +1 javac 0m 38s the patch passed
          +1 checkstyle 0m 19s the patch passed
          +1 mvnsite 0m 48s the patch passed
          +1 mvneclipse 0m 11s the patch passed
          +1 whitespace 0m 0s Patch has no whitespace issues.
          +1 findbugs 2m 5s the patch passed
          +1 javadoc 1m 0s the patch passed with JDK v1.8.0_72
          +1 javadoc 1m 47s the patch passed with JDK v1.7.0_95
          -1 unit 54m 1s hadoop-hdfs in the patch failed with JDK v1.8.0_72.
          +1 unit 51m 26s hadoop-hdfs in the patch passed with JDK v1.7.0_95.
          +1 asflicense 0m 22s Patch does not generate ASF License warnings.
          130m 14s



          Reason Tests
          JDK v1.8.0_72 Failed junit tests hadoop.hdfs.TestDatanodeRegistration



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:0ca8df7
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12789028/HDFS-9549.04.patch
          JIRA Issue HDFS-9549
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux c8b3abd20e66 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 3fab885
          Default Java 1.7.0_95
          Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_72 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_95
          findbugs v3.0.0
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/14562/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_72.txt
          unit test logs https://builds.apache.org/job/PreCommit-HDFS-Build/14562/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_72.txt
          JDK v1.7.0_95 Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/14562/testReport/
          modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/14562/console
          Powered by Apache Yetus 0.2.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          xiaochen Xiao Chen added a comment -

          The last Jenkins run succeeded; the failed test looks unrelated.
          Colin P. McCabe, may I ask you for another review? Thank you.

          cmccabe Colin P. McCabe added a comment -

          +1. Thanks, Xiao Chen.

          cmccabe Colin P. McCabe added a comment -

          Committed to 2.8, thanks!

          xiaochen Xiao Chen added a comment -

          Thank you so much Colin P. McCabe for the patient reviews / offline talks and the commit!

          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-trunk-Commit #9352 (See https://builds.apache.org/job/Hadoop-trunk-Commit/9352/)
          HDFS-9549. TestCacheDirectives#testExceedsCapacity is flaky (Xiao Chen) (cmccabe: rev 211c78c09073e5b34db309b49d8de939a7a812f5)

          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestCacheDirectives.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/CacheReplicationMonitor.java
          • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt

            People

            • Assignee:
              xiaochen Xiao Chen
              Reporter:
              jojochuang Wei-Chiu Chuang
            • Votes:
              0 Vote for this issue
              Watchers:
              9 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development