Uploaded image for project: 'Hadoop HDFS'
  1. Hadoop HDFS
  2. HDFS-9701

DN may deadlock when hot-swapping under load

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.8.0, 3.0.0-alpha1
    • Component/s: None
    • Labels:
      None
    • Target Version/s:
    • Hadoop Flags:
      Reviewed

      Description

      If the DN is under load (new blocks being written), a hot-swap task by hdfs dfsadmin -reconfig may cause a dead lock.

      1. HDFS-9701.01.patch
        17 kB
        Xiao Chen
      2. HDFS-9701.02.patch
        20 kB
        Xiao Chen
      3. HDFS-9701.03.patch
        15 kB
        Xiao Chen
      4. HDFS-9701.04.patch
        15 kB
        Xiao Chen
      5. HDFS-9701.05.patch
        15 kB
        Xiao Chen

        Issue Links

          Activity

          Hide
          xiaochen Xiao Chen added a comment -

          Most notable jstacks:
          Reconfigure task:

          "Reconfiguration Task" #459 daemon prio=5 os_prio=0 tid=0x00007fc6913a6000 nid=0x5219 waiting on condition [0x00007fc663cde000]
             java.lang.Thread.State: TIMED_WAITING (sleeping)
          	at java.lang.Thread.sleep(Native Method)
          	at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.closeAndWait(FsVolumeImpl.java:251)
          	at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.removeVolume(FsVolumeList.java:322)
          	at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.removeVolume(FsVolumeList.java:363)
          	at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.removeVolumes(FsDatasetImpl.java:472)
          	- locked <0x00000000d6057410> (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
          	at org.apache.hadoop.hdfs.server.datanode.DataNode.removeVolumes(DataNode.java:718)
          	- locked <0x00000000d55a5950> (a org.apache.hadoop.hdfs.server.datanode.DataNode)
          	at org.apache.hadoop.hdfs.server.datanode.DataNode.removeVolumes(DataNode.java:684)
          	at org.apache.hadoop.hdfs.server.datanode.DataNode.refreshVolumes(DataNode.java:648)
          	- locked <0x00000000d55a5950> (a org.apache.hadoop.hdfs.server.datanode.DataNode)
          	at org.apache.hadoop.hdfs.server.datanode.DataNode.reconfigurePropertyImpl(DataNode.java:485)
          	at org.apache.hadoop.conf.ReconfigurableBase$ReconfigurationThread.run(ReconfigurableBase.java:133)
          

          Being written thread:

          "PacketResponder: BP-284727513-10.64.40.36-1450767058747:blk_1073785044_44298, type=HAS_DOWNSTREAM_IN_PIPELINE" #462 daemon prio=5 os_prio=0 tid=0x00007fc67c5c8000 nid=0x5268 waiting for monitor entry [0x00007fc662ed2000]
             java.lang.Thread.State: BLOCKED (on object monitor)
          	at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.finalizeBlock(FsDatasetImpl.java:1487)
          	- waiting to lock <0x00000000d6057410> (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
          	at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.finalizeBlock(BlockReceiver.java:1300)
          	at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.run(BlockReceiver.java:1257)
          	at java.lang.Thread.run(Thread.java:745)
          

          The deadlock happens between a lock and a reference count waiting:

          1. in BlockReceiver$PacketResponder#finalizeBlock, reference is increased after claimReplicaHandler. (Code here)
          2. Reconfigure task locks on the FsDatasetImpl object
          3. Reconfigure task calls all the way into FsVolumeImpl#closeAndWait, infinite loop waiting on reference count (Code here)
          4. BlockReceiver$PacketResponder#finalizeBlock waits on the FsDatasetImpl object's lock in step #2. Oops.
          Show
          xiaochen Xiao Chen added a comment - Most notable jstacks: Reconfigure task: "Reconfiguration Task" #459 daemon prio=5 os_prio=0 tid=0x00007fc6913a6000 nid=0x5219 waiting on condition [0x00007fc663cde000] java.lang.Thread.State: TIMED_WAITING (sleeping) at java.lang.Thread.sleep(Native Method) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.closeAndWait(FsVolumeImpl.java:251) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.removeVolume(FsVolumeList.java:322) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.removeVolume(FsVolumeList.java:363) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.removeVolumes(FsDatasetImpl.java:472) - locked <0x00000000d6057410> (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl) at org.apache.hadoop.hdfs.server.datanode.DataNode.removeVolumes(DataNode.java:718) - locked <0x00000000d55a5950> (a org.apache.hadoop.hdfs.server.datanode.DataNode) at org.apache.hadoop.hdfs.server.datanode.DataNode.removeVolumes(DataNode.java:684) at org.apache.hadoop.hdfs.server.datanode.DataNode.refreshVolumes(DataNode.java:648) - locked <0x00000000d55a5950> (a org.apache.hadoop.hdfs.server.datanode.DataNode) at org.apache.hadoop.hdfs.server.datanode.DataNode.reconfigurePropertyImpl(DataNode.java:485) at org.apache.hadoop.conf.ReconfigurableBase$ReconfigurationThread.run(ReconfigurableBase.java:133) Being written thread: "PacketResponder: BP-284727513-10.64.40.36-1450767058747:blk_1073785044_44298, type=HAS_DOWNSTREAM_IN_PIPELINE" #462 daemon prio=5 os_prio=0 tid=0x00007fc67c5c8000 nid=0x5268 waiting for monitor entry [0x00007fc662ed2000] java.lang.Thread.State: BLOCKED (on object monitor) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.finalizeBlock(FsDatasetImpl.java:1487) - waiting to lock <0x00000000d6057410> (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.finalizeBlock(BlockReceiver.java:1300) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.run(BlockReceiver.java:1257) at java.lang.Thread.run(Thread.java:745) The deadlock happens between a lock and a reference count waiting: in BlockReceiver$PacketResponder#finalizeBlock , reference is increased after claimReplicaHandler . (Code here ) Reconfigure task locks on the FsDatasetImpl object Reconfigure task calls all the way into FsVolumeImpl#closeAndWait , infinite loop waiting on reference count (Code here ) BlockReceiver$PacketResponder#finalizeBlock waits on the FsDatasetImpl object's lock in step #2. Oops.
          Hide
          xiaochen Xiao Chen added a comment -

          Thanks a lot to Lei (Eddy) Xu for the offline discussion about the general ideas!

          To fix the problem, we have several ways:
          1. Don't do infinite wait in FsVolumeImpl#closeAndWait, wait outside of the lock scope
          2. Use finer-grained locks on the volumes in FsDatasetImpl.

          I think option 1 is better since the change is smaller, and the infinite wait inside seems a bit scary to me.
          Patch 1 attempts to solve the problem along option 1.

          • Moved the wait-for-close logic to outside of the FsDatasetImpl, into DataNode.
          • Had to add a new interface to FsDatasetSpi
          • Added methods along the call stack to allow the above
          • Added a new unit test in TestFsDatasetImpl that locks before the patch and passes after
          • Had to modify the TestFsVolumeList to accommodate the change
          • Added more info into the log in BlockReceiver which I found useful when root causing the problem.
          • Added a missing @Override in FsDatasetImpl
          Show
          xiaochen Xiao Chen added a comment - Thanks a lot to Lei (Eddy) Xu for the offline discussion about the general ideas! To fix the problem, we have several ways: 1. Don't do infinite wait in FsVolumeImpl#closeAndWait , wait outside of the lock scope 2. Use finer-grained locks on the volumes in FsDatasetImpl. I think option 1 is better since the change is smaller, and the infinite wait inside seems a bit scary to me. Patch 1 attempts to solve the problem along option 1. Moved the wait-for-close logic to outside of the FsDatasetImpl , into DataNode . Had to add a new interface to FsDatasetSpi Added methods along the call stack to allow the above Added a new unit test in TestFsDatasetImpl that locks before the patch and passes after Had to modify the TestFsVolumeList to accommodate the change Added more info into the log in BlockReceiver which I found useful when root causing the problem. Added a missing @Override in FsDatasetImpl
          Hide
          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 0s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 4 new or modified test files.
          +1 mvninstall 10m 49s trunk passed
          +1 compile 1m 9s trunk passed with JDK v1.8.0_66
          +1 compile 0m 56s trunk passed with JDK v1.7.0_91
          +1 checkstyle 0m 28s trunk passed
          +1 mvnsite 1m 13s trunk passed
          +1 mvneclipse 0m 19s trunk passed
          +1 findbugs 2m 34s trunk passed
          +1 javadoc 1m 40s trunk passed with JDK v1.8.0_66
          +1 javadoc 2m 39s trunk passed with JDK v1.7.0_91
          +1 mvninstall 1m 4s the patch passed
          +1 compile 1m 5s the patch passed with JDK v1.8.0_66
          +1 javac 1m 5s the patch passed
          +1 compile 0m 55s the patch passed with JDK v1.7.0_91
          +1 javac 0m 55s the patch passed
          -1 checkstyle 0m 27s hadoop-hdfs-project/hadoop-hdfs: patch generated 2 new + 335 unchanged - 1 fixed = 337 total (was 336)
          +1 mvnsite 1m 8s the patch passed
          +1 mvneclipse 0m 16s the patch passed
          +1 whitespace 0m 0s Patch has no whitespace issues.
          -1 findbugs 2m 49s hadoop-hdfs-project/hadoop-hdfs generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0)
          +1 javadoc 1m 38s the patch passed with JDK v1.8.0_66
          +1 javadoc 2m 33s the patch passed with JDK v1.7.0_91
          -1 unit 85m 18s hadoop-hdfs in the patch failed with JDK v1.8.0_66.
          -1 unit 76m 53s hadoop-hdfs in the patch failed with JDK v1.7.0_91.
          +1 asflicense 0m 29s Patch does not generate ASF License warnings.
          199m 57s



          Reason Tests
          FindBugs module:hadoop-hdfs-project/hadoop-hdfs
            org.apache.hadoop.hdfs.server.datanode.DataNode.removeVolumes(Set, boolean) calls Thread.sleep() with a lock held At DataNode.java:a lock held At DataNode.java:[line 805]
          JDK v1.8.0_66 Failed junit tests hadoop.hdfs.qjournal.client.TestQuorumJournalManager
            hadoop.hdfs.TestDFSClientRetries
            hadoop.hdfs.TestRecoverStripedFile
            hadoop.hdfs.server.datanode.fsdataset.impl.TestFsDatasetImpl
            hadoop.hdfs.server.datanode.TestDataNodeHotSwapVolumes
            hadoop.hdfs.server.datanode.TestBlockScanner
            hadoop.hdfs.server.datanode.TestDirectoryScanner
            hadoop.hdfs.security.TestDelegationTokenForProxyUser
          JDK v1.7.0_91 Failed junit tests hadoop.hdfs.server.datanode.fsdataset.impl.TestFsDatasetImpl
            hadoop.hdfs.server.blockmanagement.TestBlockManager
            hadoop.hdfs.server.datanode.TestDataNodeHotSwapVolumes
            hadoop.hdfs.server.datanode.TestBlockScanner
            hadoop.hdfs.server.datanode.TestDirectoryScanner



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:0ca8df7
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12784254/HDFS-9701.01.patch
          JIRA Issue HDFS-9701
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux edd0a2a534e4 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / d62b4a4
          Default Java 1.7.0_91
          Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_66 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_91
          findbugs v3.0.0
          checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/14238/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt
          findbugs https://builds.apache.org/job/PreCommit-HDFS-Build/14238/artifact/patchprocess/new-findbugs-hadoop-hdfs-project_hadoop-hdfs.html
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/14238/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_66.txt
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/14238/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_91.txt
          unit test logs https://builds.apache.org/job/PreCommit-HDFS-Build/14238/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_66.txt https://builds.apache.org/job/PreCommit-HDFS-Build/14238/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_91.txt
          JDK v1.7.0_91 Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/14238/testReport/
          modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
          Max memory used 76MB
          Powered by Apache Yetus 0.2.0-SNAPSHOT http://yetus.apache.org
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/14238/console

          This message was automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall Vote Subsystem Runtime Comment 0 reexec 0m 0s Docker mode activated. +1 @author 0m 0s The patch does not contain any @author tags. +1 test4tests 0m 0s The patch appears to include 4 new or modified test files. +1 mvninstall 10m 49s trunk passed +1 compile 1m 9s trunk passed with JDK v1.8.0_66 +1 compile 0m 56s trunk passed with JDK v1.7.0_91 +1 checkstyle 0m 28s trunk passed +1 mvnsite 1m 13s trunk passed +1 mvneclipse 0m 19s trunk passed +1 findbugs 2m 34s trunk passed +1 javadoc 1m 40s trunk passed with JDK v1.8.0_66 +1 javadoc 2m 39s trunk passed with JDK v1.7.0_91 +1 mvninstall 1m 4s the patch passed +1 compile 1m 5s the patch passed with JDK v1.8.0_66 +1 javac 1m 5s the patch passed +1 compile 0m 55s the patch passed with JDK v1.7.0_91 +1 javac 0m 55s the patch passed -1 checkstyle 0m 27s hadoop-hdfs-project/hadoop-hdfs: patch generated 2 new + 335 unchanged - 1 fixed = 337 total (was 336) +1 mvnsite 1m 8s the patch passed +1 mvneclipse 0m 16s the patch passed +1 whitespace 0m 0s Patch has no whitespace issues. -1 findbugs 2m 49s hadoop-hdfs-project/hadoop-hdfs generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) +1 javadoc 1m 38s the patch passed with JDK v1.8.0_66 +1 javadoc 2m 33s the patch passed with JDK v1.7.0_91 -1 unit 85m 18s hadoop-hdfs in the patch failed with JDK v1.8.0_66. -1 unit 76m 53s hadoop-hdfs in the patch failed with JDK v1.7.0_91. +1 asflicense 0m 29s Patch does not generate ASF License warnings. 199m 57s Reason Tests FindBugs module:hadoop-hdfs-project/hadoop-hdfs   org.apache.hadoop.hdfs.server.datanode.DataNode.removeVolumes(Set, boolean) calls Thread.sleep() with a lock held At DataNode.java:a lock held At DataNode.java: [line 805] JDK v1.8.0_66 Failed junit tests hadoop.hdfs.qjournal.client.TestQuorumJournalManager   hadoop.hdfs.TestDFSClientRetries   hadoop.hdfs.TestRecoverStripedFile   hadoop.hdfs.server.datanode.fsdataset.impl.TestFsDatasetImpl   hadoop.hdfs.server.datanode.TestDataNodeHotSwapVolumes   hadoop.hdfs.server.datanode.TestBlockScanner   hadoop.hdfs.server.datanode.TestDirectoryScanner   hadoop.hdfs.security.TestDelegationTokenForProxyUser JDK v1.7.0_91 Failed junit tests hadoop.hdfs.server.datanode.fsdataset.impl.TestFsDatasetImpl   hadoop.hdfs.server.blockmanagement.TestBlockManager   hadoop.hdfs.server.datanode.TestDataNodeHotSwapVolumes   hadoop.hdfs.server.datanode.TestBlockScanner   hadoop.hdfs.server.datanode.TestDirectoryScanner Subsystem Report/Notes Docker Image:yetus/hadoop:0ca8df7 JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12784254/HDFS-9701.01.patch JIRA Issue HDFS-9701 Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle uname Linux edd0a2a534e4 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux Build tool maven Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh git revision trunk / d62b4a4 Default Java 1.7.0_91 Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_66 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_91 findbugs v3.0.0 checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/14238/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt findbugs https://builds.apache.org/job/PreCommit-HDFS-Build/14238/artifact/patchprocess/new-findbugs-hadoop-hdfs-project_hadoop-hdfs.html unit https://builds.apache.org/job/PreCommit-HDFS-Build/14238/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_66.txt unit https://builds.apache.org/job/PreCommit-HDFS-Build/14238/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_91.txt unit test logs https://builds.apache.org/job/PreCommit-HDFS-Build/14238/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_66.txt https://builds.apache.org/job/PreCommit-HDFS-Build/14238/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_91.txt JDK v1.7.0_91 Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/14238/testReport/ modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs Max memory used 76MB Powered by Apache Yetus 0.2.0-SNAPSHOT http://yetus.apache.org Console output https://builds.apache.org/job/PreCommit-HDFS-Build/14238/console This message was automatically generated.
          Hide
          xiaochen Xiao Chen added a comment -

          Patch 2 fixes checkstyle, findbugs issues, and added some javadocs.
          For failed tests, TestFsDatasetImpl and TestDataNodeHotSwapVolumes are related, others seems not.

          • TestFsDatasetImpl: original test missing cleanup. Added.
          • TestDataNodeHotSwapVolumes: IIUC, we should hflush first in order for the BlockReceiver to hold a ref count. Then we can verify that block reference is not removed because block not finalized, even if a reconfig task is launched. For this reason, I moved the barrier to fix the test. Lei (Eddy) Xu please correct me if I'm wrong.
          Show
          xiaochen Xiao Chen added a comment - Patch 2 fixes checkstyle, findbugs issues, and added some javadocs. For failed tests, TestFsDatasetImpl and TestDataNodeHotSwapVolumes are related, others seems not. TestFsDatasetImpl : original test missing cleanup. Added. TestDataNodeHotSwapVolumes : IIUC, we should hflush first in order for the BlockReceiver to hold a ref count. Then we can verify that block reference is not removed because block not finalized, even if a reconfig task is launched. For this reason, I moved the barrier to fix the test. Lei (Eddy) Xu please correct me if I'm wrong.
          Hide
          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 0s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 5 new or modified test files.
          +1 mvninstall 10m 55s trunk passed
          +1 compile 1m 26s trunk passed with JDK v1.8.0_66
          +1 compile 1m 3s trunk passed with JDK v1.7.0_91
          +1 checkstyle 0m 33s trunk passed
          +1 mvnsite 1m 17s trunk passed
          +1 mvneclipse 0m 18s trunk passed
          +1 findbugs 2m 37s trunk passed
          +1 javadoc 1m 45s trunk passed with JDK v1.8.0_66
          +1 javadoc 2m 44s trunk passed with JDK v1.7.0_91
          +1 mvninstall 1m 10s the patch passed
          +1 compile 1m 21s the patch passed with JDK v1.8.0_66
          +1 javac 1m 21s the patch passed
          +1 compile 0m 59s the patch passed with JDK v1.7.0_91
          +1 javac 0m 59s the patch passed
          +1 checkstyle 0m 31s hadoop-hdfs-project/hadoop-hdfs: patch generated 0 new + 335 unchanged - 1 fixed = 335 total (was 336)
          +1 mvnsite 1m 13s the patch passed
          +1 mvneclipse 0m 16s the patch passed
          +1 whitespace 0m 0s Patch has no whitespace issues.
          +1 findbugs 2m 52s the patch passed
          +1 javadoc 1m 46s the patch passed with JDK v1.8.0_66
          +1 javadoc 2m 40s the patch passed with JDK v1.7.0_91
          -1 unit 76m 34s hadoop-hdfs in the patch failed with JDK v1.8.0_66.
          -1 unit 65m 8s hadoop-hdfs in the patch failed with JDK v1.7.0_91.
          -1 asflicense 0m 19s Patch generated 1 ASF License warnings.
          180m 57s



          Reason Tests
          JDK v1.8.0_66 Failed junit tests hadoop.hdfs.qjournal.TestSecureNNWithQJM
            hadoop.hdfs.server.namenode.ha.TestEditLogTailer
            hadoop.hdfs.security.TestDelegationTokenForProxyUser
            hadoop.hdfs.TestFileAppend
            hadoop.hdfs.TestDFSUpgradeFromImage
            hadoop.hdfs.server.datanode.TestBlockScanner
          JDK v1.8.0_66 Timed out junit tests org.apache.hadoop.hdfs.TestDFSStripedOutputStreamWithFailure150
            org.apache.hadoop.hdfs.TestWriteReadStripedFile
            org.apache.hadoop.fs.TestWebHdfsFileContextMainOperations
            org.apache.hadoop.fs.permission.TestStickyBit
          JDK v1.7.0_91 Failed junit tests hadoop.hdfs.server.balancer.TestBalancer
            hadoop.hdfs.server.blockmanagement.TestPendingReplication
            hadoop.hdfs.server.blockmanagement.TestReplicationPolicy
            hadoop.hdfs.TestRecoverStripedFile
            hadoop.hdfs.server.namenode.TestNamenodeCapacityReport



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:0ca8df7
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12784365/HDFS-9701.02.patch
          JIRA Issue HDFS-9701
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux a702216ae3f5 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 45c763a
          Default Java 1.7.0_91
          Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_66 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_91
          findbugs v3.0.0
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/14243/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_66.txt
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/14243/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_91.txt
          unit test logs https://builds.apache.org/job/PreCommit-HDFS-Build/14243/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_66.txt https://builds.apache.org/job/PreCommit-HDFS-Build/14243/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_91.txt
          JDK v1.7.0_91 Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/14243/testReport/
          asflicense https://builds.apache.org/job/PreCommit-HDFS-Build/14243/artifact/patchprocess/patch-asflicense-problems.txt
          modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
          Max memory used 77MB
          Powered by Apache Yetus 0.2.0-SNAPSHOT http://yetus.apache.org
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/14243/console

          This message was automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall Vote Subsystem Runtime Comment 0 reexec 0m 0s Docker mode activated. +1 @author 0m 0s The patch does not contain any @author tags. +1 test4tests 0m 0s The patch appears to include 5 new or modified test files. +1 mvninstall 10m 55s trunk passed +1 compile 1m 26s trunk passed with JDK v1.8.0_66 +1 compile 1m 3s trunk passed with JDK v1.7.0_91 +1 checkstyle 0m 33s trunk passed +1 mvnsite 1m 17s trunk passed +1 mvneclipse 0m 18s trunk passed +1 findbugs 2m 37s trunk passed +1 javadoc 1m 45s trunk passed with JDK v1.8.0_66 +1 javadoc 2m 44s trunk passed with JDK v1.7.0_91 +1 mvninstall 1m 10s the patch passed +1 compile 1m 21s the patch passed with JDK v1.8.0_66 +1 javac 1m 21s the patch passed +1 compile 0m 59s the patch passed with JDK v1.7.0_91 +1 javac 0m 59s the patch passed +1 checkstyle 0m 31s hadoop-hdfs-project/hadoop-hdfs: patch generated 0 new + 335 unchanged - 1 fixed = 335 total (was 336) +1 mvnsite 1m 13s the patch passed +1 mvneclipse 0m 16s the patch passed +1 whitespace 0m 0s Patch has no whitespace issues. +1 findbugs 2m 52s the patch passed +1 javadoc 1m 46s the patch passed with JDK v1.8.0_66 +1 javadoc 2m 40s the patch passed with JDK v1.7.0_91 -1 unit 76m 34s hadoop-hdfs in the patch failed with JDK v1.8.0_66. -1 unit 65m 8s hadoop-hdfs in the patch failed with JDK v1.7.0_91. -1 asflicense 0m 19s Patch generated 1 ASF License warnings. 180m 57s Reason Tests JDK v1.8.0_66 Failed junit tests hadoop.hdfs.qjournal.TestSecureNNWithQJM   hadoop.hdfs.server.namenode.ha.TestEditLogTailer   hadoop.hdfs.security.TestDelegationTokenForProxyUser   hadoop.hdfs.TestFileAppend   hadoop.hdfs.TestDFSUpgradeFromImage   hadoop.hdfs.server.datanode.TestBlockScanner JDK v1.8.0_66 Timed out junit tests org.apache.hadoop.hdfs.TestDFSStripedOutputStreamWithFailure150   org.apache.hadoop.hdfs.TestWriteReadStripedFile   org.apache.hadoop.fs.TestWebHdfsFileContextMainOperations   org.apache.hadoop.fs.permission.TestStickyBit JDK v1.7.0_91 Failed junit tests hadoop.hdfs.server.balancer.TestBalancer   hadoop.hdfs.server.blockmanagement.TestPendingReplication   hadoop.hdfs.server.blockmanagement.TestReplicationPolicy   hadoop.hdfs.TestRecoverStripedFile   hadoop.hdfs.server.namenode.TestNamenodeCapacityReport Subsystem Report/Notes Docker Image:yetus/hadoop:0ca8df7 JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12784365/HDFS-9701.02.patch JIRA Issue HDFS-9701 Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle uname Linux a702216ae3f5 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux Build tool maven Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh git revision trunk / 45c763a Default Java 1.7.0_91 Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_66 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_91 findbugs v3.0.0 unit https://builds.apache.org/job/PreCommit-HDFS-Build/14243/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_66.txt unit https://builds.apache.org/job/PreCommit-HDFS-Build/14243/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_91.txt unit test logs https://builds.apache.org/job/PreCommit-HDFS-Build/14243/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_66.txt https://builds.apache.org/job/PreCommit-HDFS-Build/14243/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_91.txt JDK v1.7.0_91 Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/14243/testReport/ asflicense https://builds.apache.org/job/PreCommit-HDFS-Build/14243/artifact/patchprocess/patch-asflicense-problems.txt modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs Max memory used 77MB Powered by Apache Yetus 0.2.0-SNAPSHOT http://yetus.apache.org Console output https://builds.apache.org/job/PreCommit-HDFS-Build/14243/console This message was automatically generated.
          Hide
          xiaochen Xiao Chen added a comment -

          The findbugs warning helped me thinking of a cleaner way to fix this.
          Patch 3 still goes with option 1 above, but instead of moving the close-and-wait logic all the way out to DataNode, I just moved it to FsDatasetImpl, and use wait instead of Thread.sleep to prevent locking on the FsDatasetImpl object. Please review, thanks.

          Show
          xiaochen Xiao Chen added a comment - The findbugs warning helped me thinking of a cleaner way to fix this. Patch 3 still goes with option 1 above, but instead of moving the close-and-wait logic all the way out to DataNode , I just moved it to FsDatasetImpl , and use wait instead of Thread.sleep to prevent locking on the FsDatasetImpl object. Please review, thanks.
          Hide
          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 0s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 3 new or modified test files.
          +1 mvninstall 7m 36s trunk passed
          +1 compile 0m 39s trunk passed with JDK v1.8.0_66
          +1 compile 0m 42s trunk passed with JDK v1.7.0_91
          +1 checkstyle 0m 22s trunk passed
          +1 mvnsite 0m 51s trunk passed
          +1 mvneclipse 0m 14s trunk passed
          +1 findbugs 1m 55s trunk passed
          +1 javadoc 1m 6s trunk passed with JDK v1.8.0_66
          +1 javadoc 1m 46s trunk passed with JDK v1.7.0_91
          +1 mvninstall 0m 46s the patch passed
          +1 compile 0m 39s the patch passed with JDK v1.8.0_66
          +1 javac 0m 39s the patch passed
          +1 compile 0m 38s the patch passed with JDK v1.7.0_91
          +1 javac 0m 38s the patch passed
          -1 checkstyle 0m 19s hadoop-hdfs-project/hadoop-hdfs: patch generated 1 new + 134 unchanged - 1 fixed = 135 total (was 135)
          +1 mvnsite 0m 48s the patch passed
          +1 mvneclipse 0m 11s the patch passed
          +1 whitespace 0m 0s Patch has no whitespace issues.
          +1 findbugs 2m 4s the patch passed
          +1 javadoc 1m 3s the patch passed with JDK v1.8.0_66
          +1 javadoc 1m 42s the patch passed with JDK v1.7.0_91
          -1 unit 51m 42s hadoop-hdfs in the patch failed with JDK v1.8.0_66.
          -1 unit 50m 22s hadoop-hdfs in the patch failed with JDK v1.7.0_91.
          +1 asflicense 0m 20s Patch does not generate ASF License warnings.
          128m 6s



          Reason Tests
          JDK v1.8.0_66 Failed junit tests hadoop.hdfs.server.namenode.TestCacheDirectives
            hadoop.hdfs.server.datanode.TestBlockScanner
            hadoop.hdfs.TestRollingUpgrade
          JDK v1.7.0_91 Failed junit tests hadoop.hdfs.TestRecoverStripedFile



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:0ca8df7
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12784453/HDFS-9701.03.patch
          JIRA Issue HDFS-9701
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 759cc5ff23fc 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / d0d7c22
          Default Java 1.7.0_91
          Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_66 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_91
          findbugs v3.0.0
          checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/14247/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/14247/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_66.txt
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/14247/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_91.txt
          unit test logs https://builds.apache.org/job/PreCommit-HDFS-Build/14247/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_66.txt https://builds.apache.org/job/PreCommit-HDFS-Build/14247/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_91.txt
          JDK v1.7.0_91 Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/14247/testReport/
          modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
          Max memory used 77MB
          Powered by Apache Yetus 0.2.0-SNAPSHOT http://yetus.apache.org
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/14247/console

          This message was automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall Vote Subsystem Runtime Comment 0 reexec 0m 0s Docker mode activated. +1 @author 0m 0s The patch does not contain any @author tags. +1 test4tests 0m 0s The patch appears to include 3 new or modified test files. +1 mvninstall 7m 36s trunk passed +1 compile 0m 39s trunk passed with JDK v1.8.0_66 +1 compile 0m 42s trunk passed with JDK v1.7.0_91 +1 checkstyle 0m 22s trunk passed +1 mvnsite 0m 51s trunk passed +1 mvneclipse 0m 14s trunk passed +1 findbugs 1m 55s trunk passed +1 javadoc 1m 6s trunk passed with JDK v1.8.0_66 +1 javadoc 1m 46s trunk passed with JDK v1.7.0_91 +1 mvninstall 0m 46s the patch passed +1 compile 0m 39s the patch passed with JDK v1.8.0_66 +1 javac 0m 39s the patch passed +1 compile 0m 38s the patch passed with JDK v1.7.0_91 +1 javac 0m 38s the patch passed -1 checkstyle 0m 19s hadoop-hdfs-project/hadoop-hdfs: patch generated 1 new + 134 unchanged - 1 fixed = 135 total (was 135) +1 mvnsite 0m 48s the patch passed +1 mvneclipse 0m 11s the patch passed +1 whitespace 0m 0s Patch has no whitespace issues. +1 findbugs 2m 4s the patch passed +1 javadoc 1m 3s the patch passed with JDK v1.8.0_66 +1 javadoc 1m 42s the patch passed with JDK v1.7.0_91 -1 unit 51m 42s hadoop-hdfs in the patch failed with JDK v1.8.0_66. -1 unit 50m 22s hadoop-hdfs in the patch failed with JDK v1.7.0_91. +1 asflicense 0m 20s Patch does not generate ASF License warnings. 128m 6s Reason Tests JDK v1.8.0_66 Failed junit tests hadoop.hdfs.server.namenode.TestCacheDirectives   hadoop.hdfs.server.datanode.TestBlockScanner   hadoop.hdfs.TestRollingUpgrade JDK v1.7.0_91 Failed junit tests hadoop.hdfs.TestRecoverStripedFile Subsystem Report/Notes Docker Image:yetus/hadoop:0ca8df7 JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12784453/HDFS-9701.03.patch JIRA Issue HDFS-9701 Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle uname Linux 759cc5ff23fc 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux Build tool maven Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh git revision trunk / d0d7c22 Default Java 1.7.0_91 Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_66 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_91 findbugs v3.0.0 checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/14247/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt unit https://builds.apache.org/job/PreCommit-HDFS-Build/14247/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_66.txt unit https://builds.apache.org/job/PreCommit-HDFS-Build/14247/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_91.txt unit test logs https://builds.apache.org/job/PreCommit-HDFS-Build/14247/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_66.txt https://builds.apache.org/job/PreCommit-HDFS-Build/14247/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_91.txt JDK v1.7.0_91 Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/14247/testReport/ modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs Max memory used 77MB Powered by Apache Yetus 0.2.0-SNAPSHOT http://yetus.apache.org Console output https://builds.apache.org/job/PreCommit-HDFS-Build/14247/console This message was automatically generated.
          Hide
          eddyxu Lei (Eddy) Xu added a comment -

          Hi, Xiao Chen

          Thanks a lot for digging into this bug. The patch looks good in general.

          A few minor issues:

          • Can you move TestDatanodeImpl#testRemoveVolumeBeingWritten to TestDataNodeHotSwapVolumes?
            Also you dont need two try here:
          try {
                    try (ReplicaHandler replica =
                        dataset.createRbw(StorageType.DEFAULT, eb, false)) {
          
          • Could you verify that during removing a volume, block report can be sent in the test.
          • In FsDatasetImpl#removeVolume, {{LOG.info("All volumes are removed"); }} is not accurate. Only one volume is removed at a time, and the removal operation is not completed (i.e., ReplicaInfo are stored in DN's memory). Would you mind to rephrase that?
          • Realized that FsVolumeList#checkDirs and FsDatasetImpl#checkDirs share the same "waiting for checkVolumeRemoved()" code, would you mind to put them into one function?

          Thanks!

          Show
          eddyxu Lei (Eddy) Xu added a comment - Hi, Xiao Chen Thanks a lot for digging into this bug. The patch looks good in general. A few minor issues: Can you move TestDatanodeImpl#testRemoveVolumeBeingWritten to TestDataNodeHotSwapVolumes ? Also you dont need two try here: try { try (ReplicaHandler replica = dataset.createRbw(StorageType.DEFAULT, eb, false )) { Could you verify that during removing a volume, block report can be sent in the test. In FsDatasetImpl#removeVolume , {{LOG.info("All volumes are removed"); }} is not accurate. Only one volume is removed at a time, and the removal operation is not completed (i.e., ReplicaInfo are stored in DN's memory). Would you mind to rephrase that? Realized that FsVolumeList#checkDirs and FsDatasetImpl#checkDirs share the same "waiting for checkVolumeRemoved()" code, would you mind to put them into one function? Thanks!
          Hide
          xiaochen Xiao Chen added a comment -

          Thanks Lei (Eddy) Xu so much for the review!
          All your comments above are addressed in patch 4, except:

          Can you move TestDatanodeImpl#testRemoveVolumeBeingWritten to TestDataNodeHotSwapVolumes?

          I put it in TestFsDatasetImpl because we need to call methods (e.g. createRbw, finalizeBlock) directly on FsDatasetImpl to control the timing of the 'holding reference but not finalized yet' scenario. Putting it into TestDataNodeHotSwapVolumes will be harder to test. I understand your concern that this is a hot swap related test, but it's also on FsDatasetImpl. I'll try moving it if you insist.

          Could you verify that during removing a volume, block report can be sent in the test.

          I added a block report call in patch 4, though not strictly controlled it to be in the middle of removing - I'm not sure how to let it run exactly when the remove is in waitVolumeRemoved step.... please advice.

          Show
          xiaochen Xiao Chen added a comment - Thanks Lei (Eddy) Xu so much for the review! All your comments above are addressed in patch 4, except: Can you move TestDatanodeImpl#testRemoveVolumeBeingWritten to TestDataNodeHotSwapVolumes? I put it in TestFsDatasetImpl because we need to call methods (e.g. createRbw , finalizeBlock ) directly on FsDatasetImpl to control the timing of the 'holding reference but not finalized yet' scenario. Putting it into TestDataNodeHotSwapVolumes will be harder to test. I understand your concern that this is a hot swap related test, but it's also on FsDatasetImpl. I'll try moving it if you insist. Could you verify that during removing a volume, block report can be sent in the test. I added a block report call in patch 4, though not strictly controlled it to be in the middle of removing - I'm not sure how to let it run exactly when the remove is in waitVolumeRemoved step.... please advice.
          Hide
          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 0s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 2 new or modified test files.
          +1 mvninstall 7m 34s trunk passed
          +1 compile 0m 39s trunk passed with JDK v1.8.0_66
          +1 compile 0m 41s trunk passed with JDK v1.7.0_91
          +1 checkstyle 0m 22s trunk passed
          +1 mvnsite 0m 51s trunk passed
          +1 mvneclipse 0m 13s trunk passed
          +1 findbugs 1m 54s trunk passed
          +1 javadoc 1m 7s trunk passed with JDK v1.8.0_66
          +1 javadoc 1m 45s trunk passed with JDK v1.7.0_91
          +1 mvninstall 0m 46s the patch passed
          +1 compile 0m 38s the patch passed with JDK v1.8.0_66
          +1 javac 0m 38s the patch passed
          +1 compile 0m 40s the patch passed with JDK v1.7.0_91
          +1 javac 0m 40s the patch passed
          +1 checkstyle 0m 20s hadoop-hdfs-project/hadoop-hdfs: patch generated 0 new + 134 unchanged - 1 fixed = 134 total (was 135)
          +1 mvnsite 0m 48s the patch passed
          +1 mvneclipse 0m 11s the patch passed
          +1 whitespace 0m 0s Patch has no whitespace issues.
          +1 findbugs 2m 7s the patch passed
          +1 javadoc 1m 2s the patch passed with JDK v1.8.0_66
          +1 javadoc 1m 44s the patch passed with JDK v1.7.0_91
          -1 unit 52m 44s hadoop-hdfs in the patch failed with JDK v1.8.0_66.
          +1 unit 49m 59s hadoop-hdfs in the patch passed with JDK v1.7.0_91.
          +1 asflicense 0m 19s Patch does not generate ASF License warnings.
          128m 45s



          Reason Tests
          JDK v1.8.0_66 Failed junit tests hadoop.hdfs.server.datanode.TestDataNodeMetrics



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:0ca8df7
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12784542/HDFS-9701.04.patch
          JIRA Issue HDFS-9701
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 5842b7933a9b 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / d323639
          Default Java 1.7.0_91
          Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_66 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_91
          findbugs v3.0.0
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/14253/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_66.txt
          unit test logs https://builds.apache.org/job/PreCommit-HDFS-Build/14253/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_66.txt
          JDK v1.7.0_91 Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/14253/testReport/
          modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
          Max memory used 77MB
          Powered by Apache Yetus 0.2.0-SNAPSHOT http://yetus.apache.org
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/14253/console

          This message was automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall Vote Subsystem Runtime Comment 0 reexec 0m 0s Docker mode activated. +1 @author 0m 0s The patch does not contain any @author tags. +1 test4tests 0m 0s The patch appears to include 2 new or modified test files. +1 mvninstall 7m 34s trunk passed +1 compile 0m 39s trunk passed with JDK v1.8.0_66 +1 compile 0m 41s trunk passed with JDK v1.7.0_91 +1 checkstyle 0m 22s trunk passed +1 mvnsite 0m 51s trunk passed +1 mvneclipse 0m 13s trunk passed +1 findbugs 1m 54s trunk passed +1 javadoc 1m 7s trunk passed with JDK v1.8.0_66 +1 javadoc 1m 45s trunk passed with JDK v1.7.0_91 +1 mvninstall 0m 46s the patch passed +1 compile 0m 38s the patch passed with JDK v1.8.0_66 +1 javac 0m 38s the patch passed +1 compile 0m 40s the patch passed with JDK v1.7.0_91 +1 javac 0m 40s the patch passed +1 checkstyle 0m 20s hadoop-hdfs-project/hadoop-hdfs: patch generated 0 new + 134 unchanged - 1 fixed = 134 total (was 135) +1 mvnsite 0m 48s the patch passed +1 mvneclipse 0m 11s the patch passed +1 whitespace 0m 0s Patch has no whitespace issues. +1 findbugs 2m 7s the patch passed +1 javadoc 1m 2s the patch passed with JDK v1.8.0_66 +1 javadoc 1m 44s the patch passed with JDK v1.7.0_91 -1 unit 52m 44s hadoop-hdfs in the patch failed with JDK v1.8.0_66. +1 unit 49m 59s hadoop-hdfs in the patch passed with JDK v1.7.0_91. +1 asflicense 0m 19s Patch does not generate ASF License warnings. 128m 45s Reason Tests JDK v1.8.0_66 Failed junit tests hadoop.hdfs.server.datanode.TestDataNodeMetrics Subsystem Report/Notes Docker Image:yetus/hadoop:0ca8df7 JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12784542/HDFS-9701.04.patch JIRA Issue HDFS-9701 Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle uname Linux 5842b7933a9b 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux Build tool maven Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh git revision trunk / d323639 Default Java 1.7.0_91 Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_66 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_91 findbugs v3.0.0 unit https://builds.apache.org/job/PreCommit-HDFS-Build/14253/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_66.txt unit test logs https://builds.apache.org/job/PreCommit-HDFS-Build/14253/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_66.txt JDK v1.7.0_91 Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/14253/testReport/ modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs Max memory used 77MB Powered by Apache Yetus 0.2.0-SNAPSHOT http://yetus.apache.org Console output https://builds.apache.org/job/PreCommit-HDFS-Build/14253/console This message was automatically generated.
          Hide
          vinayrpet Vinayakumar B added a comment -

          One issue I am seeing here is in FSVolumeList#checkDirs()

             private final VolumeChoosingPolicy<FsVolumeImpl> blockChooser;
          @@ -257,10 +260,33 @@ public void run() {
                       + " failure volumes.");
                 }
           
          +      waitVolumeRemoved(5000, this);
                 return failedVols;
               }
             }

          I believe, this call will get, IllegalMonitorStateException.
          checkDirsMutex should be passed instead of this

          Show
          vinayrpet Vinayakumar B added a comment - One issue I am seeing here is in FSVolumeList#checkDirs() private final VolumeChoosingPolicy<FsVolumeImpl> blockChooser; @@ -257,10 +260,33 @@ public void run() { + " failure volumes." ); } + waitVolumeRemoved(5000, this ); return failedVols; } } I believe, this call will get, IllegalMonitorStateException . checkDirsMutex should be passed instead of this
          Hide
          xiaochen Xiao Chen added a comment -

          Ah Good catch Vinayakumar B! And thanks for reviewing. I added the second parameter to the method solely for that reason, yet I didn't pass in the checkDirsMutex....
          Patch 5 fixes this.

          Show
          xiaochen Xiao Chen added a comment - Ah Good catch Vinayakumar B ! And thanks for reviewing. I added the second parameter to the method solely for that reason, yet I didn't pass in the checkDirsMutex .... Patch 5 fixes this.
          Hide
          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 0s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 2 new or modified test files.
          +1 mvninstall 7m 55s trunk passed
          +1 compile 0m 42s trunk passed with JDK v1.8.0_66
          +1 compile 0m 43s trunk passed with JDK v1.7.0_91
          +1 checkstyle 0m 23s trunk passed
          +1 mvnsite 0m 54s trunk passed
          +1 mvneclipse 0m 13s trunk passed
          +1 findbugs 2m 1s trunk passed
          +1 javadoc 1m 6s trunk passed with JDK v1.8.0_66
          +1 javadoc 1m 51s trunk passed with JDK v1.7.0_91
          +1 mvninstall 0m 48s the patch passed
          +1 compile 0m 40s the patch passed with JDK v1.8.0_66
          +1 javac 0m 40s the patch passed
          +1 compile 0m 41s the patch passed with JDK v1.7.0_91
          +1 javac 0m 41s the patch passed
          +1 checkstyle 0m 21s hadoop-hdfs-project/hadoop-hdfs: patch generated 0 new + 134 unchanged - 1 fixed = 134 total (was 135)
          +1 mvnsite 0m 51s the patch passed
          +1 mvneclipse 0m 12s the patch passed
          +1 whitespace 0m 0s Patch has no whitespace issues.
          +1 findbugs 2m 16s the patch passed
          +1 javadoc 1m 8s the patch passed with JDK v1.8.0_66
          +1 javadoc 1m 50s the patch passed with JDK v1.7.0_91
          -1 unit 66m 12s hadoop-hdfs in the patch failed with JDK v1.8.0_66.
          -1 unit 61m 49s hadoop-hdfs in the patch failed with JDK v1.7.0_91.
          +1 asflicense 0m 24s Patch does not generate ASF License warnings.
          155m 34s



          Reason Tests
          JDK v1.8.0_66 Failed junit tests hadoop.hdfs.server.datanode.TestBlockScanner
            hadoop.hdfs.server.blockmanagement.TestRBWBlockInvalidation
            hadoop.fs.viewfs.TestViewFileSystemHdfs
          JDK v1.7.0_91 Failed junit tests hadoop.hdfs.server.datanode.TestBlockScanner
            hadoop.hdfs.shortcircuit.TestShortCircuitCache



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:0ca8df7
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12784678/HDFS-9701.05.patch
          JIRA Issue HDFS-9701
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux a2b6bce1d0de 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 97056c3
          Default Java 1.7.0_91
          Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_66 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_91
          findbugs v3.0.0
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/14260/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_66.txt
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/14260/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_91.txt
          unit test logs https://builds.apache.org/job/PreCommit-HDFS-Build/14260/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_66.txt https://builds.apache.org/job/PreCommit-HDFS-Build/14260/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_91.txt
          JDK v1.7.0_91 Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/14260/testReport/
          modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
          Max memory used 76MB
          Powered by Apache Yetus 0.2.0-SNAPSHOT http://yetus.apache.org
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/14260/console

          This message was automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall Vote Subsystem Runtime Comment 0 reexec 0m 0s Docker mode activated. +1 @author 0m 0s The patch does not contain any @author tags. +1 test4tests 0m 0s The patch appears to include 2 new or modified test files. +1 mvninstall 7m 55s trunk passed +1 compile 0m 42s trunk passed with JDK v1.8.0_66 +1 compile 0m 43s trunk passed with JDK v1.7.0_91 +1 checkstyle 0m 23s trunk passed +1 mvnsite 0m 54s trunk passed +1 mvneclipse 0m 13s trunk passed +1 findbugs 2m 1s trunk passed +1 javadoc 1m 6s trunk passed with JDK v1.8.0_66 +1 javadoc 1m 51s trunk passed with JDK v1.7.0_91 +1 mvninstall 0m 48s the patch passed +1 compile 0m 40s the patch passed with JDK v1.8.0_66 +1 javac 0m 40s the patch passed +1 compile 0m 41s the patch passed with JDK v1.7.0_91 +1 javac 0m 41s the patch passed +1 checkstyle 0m 21s hadoop-hdfs-project/hadoop-hdfs: patch generated 0 new + 134 unchanged - 1 fixed = 134 total (was 135) +1 mvnsite 0m 51s the patch passed +1 mvneclipse 0m 12s the patch passed +1 whitespace 0m 0s Patch has no whitespace issues. +1 findbugs 2m 16s the patch passed +1 javadoc 1m 8s the patch passed with JDK v1.8.0_66 +1 javadoc 1m 50s the patch passed with JDK v1.7.0_91 -1 unit 66m 12s hadoop-hdfs in the patch failed with JDK v1.8.0_66. -1 unit 61m 49s hadoop-hdfs in the patch failed with JDK v1.7.0_91. +1 asflicense 0m 24s Patch does not generate ASF License warnings. 155m 34s Reason Tests JDK v1.8.0_66 Failed junit tests hadoop.hdfs.server.datanode.TestBlockScanner   hadoop.hdfs.server.blockmanagement.TestRBWBlockInvalidation   hadoop.fs.viewfs.TestViewFileSystemHdfs JDK v1.7.0_91 Failed junit tests hadoop.hdfs.server.datanode.TestBlockScanner   hadoop.hdfs.shortcircuit.TestShortCircuitCache Subsystem Report/Notes Docker Image:yetus/hadoop:0ca8df7 JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12784678/HDFS-9701.05.patch JIRA Issue HDFS-9701 Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle uname Linux a2b6bce1d0de 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux Build tool maven Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh git revision trunk / 97056c3 Default Java 1.7.0_91 Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_66 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_91 findbugs v3.0.0 unit https://builds.apache.org/job/PreCommit-HDFS-Build/14260/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_66.txt unit https://builds.apache.org/job/PreCommit-HDFS-Build/14260/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_91.txt unit test logs https://builds.apache.org/job/PreCommit-HDFS-Build/14260/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_66.txt https://builds.apache.org/job/PreCommit-HDFS-Build/14260/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_91.txt JDK v1.7.0_91 Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/14260/testReport/ modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs Max memory used 76MB Powered by Apache Yetus 0.2.0-SNAPSHOT http://yetus.apache.org Console output https://builds.apache.org/job/PreCommit-HDFS-Build/14260/console This message was automatically generated.
          Hide
          xiaochen Xiao Chen added a comment -

          Failed tests seem unrelated and are known to be flaky (e.g. HDFS-9466)

          Show
          xiaochen Xiao Chen added a comment - Failed tests seem unrelated and are known to be flaky (e.g. HDFS-9466 )
          Hide
          xiaobingo Xiaobing Zhou added a comment -

          Thanks Xiao Chen for the work. Could you explain why the step#2 is true?

          Reconfigure task locks on the FsDatasetImpl object
          

          It looks like Reconfigure task only locks on a member lock ReconfigurableBase#reconfigLock.

          Show
          xiaobingo Xiaobing Zhou added a comment - Thanks Xiao Chen for the work. Could you explain why the step#2 is true? Reconfigure task locks on the FsDatasetImpl object It looks like Reconfigure task only locks on a member lock ReconfigurableBase#reconfigLock.
          Hide
          xiaochen Xiao Chen added a comment -

          Thanks Xiaobing Zhou for looking at this. We actually have multiple locks along the reconfig stack.
          In step 2, the locking I was referring to is inside FsDatasetImpl. Specifically this line. Please see my first comment above for the stack trace.

          Show
          xiaochen Xiao Chen added a comment - Thanks Xiaobing Zhou for looking at this. We actually have multiple locks along the reconfig stack. In step 2, the locking I was referring to is inside FsDatasetImpl. Specifically this line . Please see my first comment above for the stack trace.
          Hide
          xiaobingo Xiaobing Zhou added a comment -

          Thanks, that lock is indeed in the call stack of reconfig. The patch looks good to me, +1.

          Show
          xiaobingo Xiaobing Zhou added a comment - Thanks, that lock is indeed in the call stack of reconfig. The patch looks good to me, +1.
          Hide
          eddyxu Lei (Eddy) Xu added a comment -

          +1. Good work, Xiao Chen. Thanks a lot!

          Committed to branch-2.8, branch-2 and trunk.

          Show
          eddyxu Lei (Eddy) Xu added a comment - +1. Good work, Xiao Chen . Thanks a lot! Committed to branch-2.8, branch-2 and trunk.
          Hide
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-trunk-Commit #9217 (See https://builds.apache.org/job/Hadoop-trunk-Commit/9217/)
          HDFS-9701. DN may deadlock when hot-swapping under load. (Xiao Chen via (lei: rev e50aa53eed3d0ff1bc8fe60381524bb3bbe53bc1)

          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsVolumeImpl.java
          • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsVolumeList.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/TestFsVolumeList.java
          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/TestFsDatasetImpl.java
          Show
          hudson Hudson added a comment - FAILURE: Integrated in Hadoop-trunk-Commit #9217 (See https://builds.apache.org/job/Hadoop-trunk-Commit/9217/ ) HDFS-9701 . DN may deadlock when hot-swapping under load. (Xiao Chen via (lei: rev e50aa53eed3d0ff1bc8fe60381524bb3bbe53bc1) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsVolumeImpl.java hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsVolumeList.java hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/TestFsVolumeList.java hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/TestFsDatasetImpl.java
          Hide
          xiaochen Xiao Chen added a comment -

          Thanks for the review and commit Eddy, and thanks Vinay and Xiaobing for the review!

          Show
          xiaochen Xiao Chen added a comment - Thanks for the review and commit Eddy, and thanks Vinay and Xiaobing for the review!

            People

            • Assignee:
              xiaochen Xiao Chen
              Reporter:
              xiaochen Xiao Chen
            • Votes:
              0 Vote for this issue
              Watchers:
              12 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development