Uploaded image for project: 'Hadoop HDFS'
  1. Hadoop HDFS
  2. HDFS-9701

DN may deadlock when hot-swapping under load

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • None
    • 2.8.0, 3.0.0-alpha1
    • None
    • None
    • Reviewed

    Description

      If the DN is under load (new blocks being written), a hot-swap task by hdfs dfsadmin -reconfig may cause a dead lock.

      Attachments

        1. HDFS-9701.01.patch
          17 kB
          Xiao Chen
        2. HDFS-9701.02.patch
          20 kB
          Xiao Chen
        3. HDFS-9701.03.patch
          15 kB
          Xiao Chen
        4. HDFS-9701.04.patch
          15 kB
          Xiao Chen
        5. HDFS-9701.05.patch
          15 kB
          Xiao Chen

        Issue Links

          Activity

            xiaochen Xiao Chen added a comment -

            Most notable jstacks:
            Reconfigure task:

            "Reconfiguration Task" #459 daemon prio=5 os_prio=0 tid=0x00007fc6913a6000 nid=0x5219 waiting on condition [0x00007fc663cde000]
               java.lang.Thread.State: TIMED_WAITING (sleeping)
            	at java.lang.Thread.sleep(Native Method)
            	at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.closeAndWait(FsVolumeImpl.java:251)
            	at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.removeVolume(FsVolumeList.java:322)
            	at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.removeVolume(FsVolumeList.java:363)
            	at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.removeVolumes(FsDatasetImpl.java:472)
            	- locked <0x00000000d6057410> (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
            	at org.apache.hadoop.hdfs.server.datanode.DataNode.removeVolumes(DataNode.java:718)
            	- locked <0x00000000d55a5950> (a org.apache.hadoop.hdfs.server.datanode.DataNode)
            	at org.apache.hadoop.hdfs.server.datanode.DataNode.removeVolumes(DataNode.java:684)
            	at org.apache.hadoop.hdfs.server.datanode.DataNode.refreshVolumes(DataNode.java:648)
            	- locked <0x00000000d55a5950> (a org.apache.hadoop.hdfs.server.datanode.DataNode)
            	at org.apache.hadoop.hdfs.server.datanode.DataNode.reconfigurePropertyImpl(DataNode.java:485)
            	at org.apache.hadoop.conf.ReconfigurableBase$ReconfigurationThread.run(ReconfigurableBase.java:133)
            

            Being written thread:

            "PacketResponder: BP-284727513-10.64.40.36-1450767058747:blk_1073785044_44298, type=HAS_DOWNSTREAM_IN_PIPELINE" #462 daemon prio=5 os_prio=0 tid=0x00007fc67c5c8000 nid=0x5268 waiting for monitor entry [0x00007fc662ed2000]
               java.lang.Thread.State: BLOCKED (on object monitor)
            	at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.finalizeBlock(FsDatasetImpl.java:1487)
            	- waiting to lock <0x00000000d6057410> (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
            	at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.finalizeBlock(BlockReceiver.java:1300)
            	at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.run(BlockReceiver.java:1257)
            	at java.lang.Thread.run(Thread.java:745)
            

            The deadlock happens between a lock and a reference count waiting:

            1. in BlockReceiver$PacketResponder#finalizeBlock, reference is increased after claimReplicaHandler. (Code here)
            2. Reconfigure task locks on the FsDatasetImpl object
            3. Reconfigure task calls all the way into FsVolumeImpl#closeAndWait, infinite loop waiting on reference count (Code here)
            4. BlockReceiver$PacketResponder#finalizeBlock waits on the FsDatasetImpl object's lock in step #2. Oops.
            xiaochen Xiao Chen added a comment - Most notable jstacks: Reconfigure task: "Reconfiguration Task" #459 daemon prio=5 os_prio=0 tid=0x00007fc6913a6000 nid=0x5219 waiting on condition [0x00007fc663cde000] java.lang.Thread.State: TIMED_WAITING (sleeping) at java.lang.Thread.sleep(Native Method) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.closeAndWait(FsVolumeImpl.java:251) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.removeVolume(FsVolumeList.java:322) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.removeVolume(FsVolumeList.java:363) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.removeVolumes(FsDatasetImpl.java:472) - locked <0x00000000d6057410> (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl) at org.apache.hadoop.hdfs.server.datanode.DataNode.removeVolumes(DataNode.java:718) - locked <0x00000000d55a5950> (a org.apache.hadoop.hdfs.server.datanode.DataNode) at org.apache.hadoop.hdfs.server.datanode.DataNode.removeVolumes(DataNode.java:684) at org.apache.hadoop.hdfs.server.datanode.DataNode.refreshVolumes(DataNode.java:648) - locked <0x00000000d55a5950> (a org.apache.hadoop.hdfs.server.datanode.DataNode) at org.apache.hadoop.hdfs.server.datanode.DataNode.reconfigurePropertyImpl(DataNode.java:485) at org.apache.hadoop.conf.ReconfigurableBase$ReconfigurationThread.run(ReconfigurableBase.java:133) Being written thread: "PacketResponder: BP-284727513-10.64.40.36-1450767058747:blk_1073785044_44298, type=HAS_DOWNSTREAM_IN_PIPELINE" #462 daemon prio=5 os_prio=0 tid=0x00007fc67c5c8000 nid=0x5268 waiting for monitor entry [0x00007fc662ed2000] java.lang.Thread.State: BLOCKED (on object monitor) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.finalizeBlock(FsDatasetImpl.java:1487) - waiting to lock <0x00000000d6057410> (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.finalizeBlock(BlockReceiver.java:1300) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.run(BlockReceiver.java:1257) at java.lang.Thread.run(Thread.java:745) The deadlock happens between a lock and a reference count waiting: in BlockReceiver$PacketResponder#finalizeBlock , reference is increased after claimReplicaHandler . (Code here ) Reconfigure task locks on the FsDatasetImpl object Reconfigure task calls all the way into FsVolumeImpl#closeAndWait , infinite loop waiting on reference count (Code here ) BlockReceiver$PacketResponder#finalizeBlock waits on the FsDatasetImpl object's lock in step #2. Oops.
            xiaochen Xiao Chen added a comment -

            Thanks a lot to eddyxu for the offline discussion about the general ideas!

            To fix the problem, we have several ways:
            1. Don't do infinite wait in FsVolumeImpl#closeAndWait, wait outside of the lock scope
            2. Use finer-grained locks on the volumes in FsDatasetImpl.

            I think option 1 is better since the change is smaller, and the infinite wait inside seems a bit scary to me.
            Patch 1 attempts to solve the problem along option 1.

            • Moved the wait-for-close logic to outside of the FsDatasetImpl, into DataNode.
            • Had to add a new interface to FsDatasetSpi
            • Added methods along the call stack to allow the above
            • Added a new unit test in TestFsDatasetImpl that locks before the patch and passes after
            • Had to modify the TestFsVolumeList to accommodate the change
            • Added more info into the log in BlockReceiver which I found useful when root causing the problem.
            • Added a missing @Override in FsDatasetImpl
            xiaochen Xiao Chen added a comment - Thanks a lot to eddyxu for the offline discussion about the general ideas! To fix the problem, we have several ways: 1. Don't do infinite wait in FsVolumeImpl#closeAndWait , wait outside of the lock scope 2. Use finer-grained locks on the volumes in FsDatasetImpl. I think option 1 is better since the change is smaller, and the infinite wait inside seems a bit scary to me. Patch 1 attempts to solve the problem along option 1. Moved the wait-for-close logic to outside of the FsDatasetImpl , into DataNode . Had to add a new interface to FsDatasetSpi Added methods along the call stack to allow the above Added a new unit test in TestFsDatasetImpl that locks before the patch and passes after Had to modify the TestFsVolumeList to accommodate the change Added more info into the log in BlockReceiver which I found useful when root causing the problem. Added a missing @Override in FsDatasetImpl
            hadoopqa Hadoop QA added a comment -
            -1 overall



            Vote Subsystem Runtime Comment
            0 reexec 0m 0s Docker mode activated.
            +1 @author 0m 0s The patch does not contain any @author tags.
            +1 test4tests 0m 0s The patch appears to include 4 new or modified test files.
            +1 mvninstall 10m 49s trunk passed
            +1 compile 1m 9s trunk passed with JDK v1.8.0_66
            +1 compile 0m 56s trunk passed with JDK v1.7.0_91
            +1 checkstyle 0m 28s trunk passed
            +1 mvnsite 1m 13s trunk passed
            +1 mvneclipse 0m 19s trunk passed
            +1 findbugs 2m 34s trunk passed
            +1 javadoc 1m 40s trunk passed with JDK v1.8.0_66
            +1 javadoc 2m 39s trunk passed with JDK v1.7.0_91
            +1 mvninstall 1m 4s the patch passed
            +1 compile 1m 5s the patch passed with JDK v1.8.0_66
            +1 javac 1m 5s the patch passed
            +1 compile 0m 55s the patch passed with JDK v1.7.0_91
            +1 javac 0m 55s the patch passed
            -1 checkstyle 0m 27s hadoop-hdfs-project/hadoop-hdfs: patch generated 2 new + 335 unchanged - 1 fixed = 337 total (was 336)
            +1 mvnsite 1m 8s the patch passed
            +1 mvneclipse 0m 16s the patch passed
            +1 whitespace 0m 0s Patch has no whitespace issues.
            -1 findbugs 2m 49s hadoop-hdfs-project/hadoop-hdfs generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0)
            +1 javadoc 1m 38s the patch passed with JDK v1.8.0_66
            +1 javadoc 2m 33s the patch passed with JDK v1.7.0_91
            -1 unit 85m 18s hadoop-hdfs in the patch failed with JDK v1.8.0_66.
            -1 unit 76m 53s hadoop-hdfs in the patch failed with JDK v1.7.0_91.
            +1 asflicense 0m 29s Patch does not generate ASF License warnings.
            199m 57s



            Reason Tests
            FindBugs module:hadoop-hdfs-project/hadoop-hdfs
              org.apache.hadoop.hdfs.server.datanode.DataNode.removeVolumes(Set, boolean) calls Thread.sleep() with a lock held At DataNode.java:a lock held At DataNode.java:[line 805]
            JDK v1.8.0_66 Failed junit tests hadoop.hdfs.qjournal.client.TestQuorumJournalManager
              hadoop.hdfs.TestDFSClientRetries
              hadoop.hdfs.TestRecoverStripedFile
              hadoop.hdfs.server.datanode.fsdataset.impl.TestFsDatasetImpl
              hadoop.hdfs.server.datanode.TestDataNodeHotSwapVolumes
              hadoop.hdfs.server.datanode.TestBlockScanner
              hadoop.hdfs.server.datanode.TestDirectoryScanner
              hadoop.hdfs.security.TestDelegationTokenForProxyUser
            JDK v1.7.0_91 Failed junit tests hadoop.hdfs.server.datanode.fsdataset.impl.TestFsDatasetImpl
              hadoop.hdfs.server.blockmanagement.TestBlockManager
              hadoop.hdfs.server.datanode.TestDataNodeHotSwapVolumes
              hadoop.hdfs.server.datanode.TestBlockScanner
              hadoop.hdfs.server.datanode.TestDirectoryScanner



            Subsystem Report/Notes
            Docker Image:yetus/hadoop:0ca8df7
            JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12784254/HDFS-9701.01.patch
            JIRA Issue HDFS-9701
            Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
            uname Linux edd0a2a534e4 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
            Build tool maven
            Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
            git revision trunk / d62b4a4
            Default Java 1.7.0_91
            Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_66 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_91
            findbugs v3.0.0
            checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/14238/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt
            findbugs https://builds.apache.org/job/PreCommit-HDFS-Build/14238/artifact/patchprocess/new-findbugs-hadoop-hdfs-project_hadoop-hdfs.html
            unit https://builds.apache.org/job/PreCommit-HDFS-Build/14238/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_66.txt
            unit https://builds.apache.org/job/PreCommit-HDFS-Build/14238/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_91.txt
            unit test logs https://builds.apache.org/job/PreCommit-HDFS-Build/14238/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_66.txt https://builds.apache.org/job/PreCommit-HDFS-Build/14238/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_91.txt
            JDK v1.7.0_91 Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/14238/testReport/
            modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
            Max memory used 76MB
            Powered by Apache Yetus 0.2.0-SNAPSHOT http://yetus.apache.org
            Console output https://builds.apache.org/job/PreCommit-HDFS-Build/14238/console

            This message was automatically generated.

            hadoopqa Hadoop QA added a comment - -1 overall Vote Subsystem Runtime Comment 0 reexec 0m 0s Docker mode activated. +1 @author 0m 0s The patch does not contain any @author tags. +1 test4tests 0m 0s The patch appears to include 4 new or modified test files. +1 mvninstall 10m 49s trunk passed +1 compile 1m 9s trunk passed with JDK v1.8.0_66 +1 compile 0m 56s trunk passed with JDK v1.7.0_91 +1 checkstyle 0m 28s trunk passed +1 mvnsite 1m 13s trunk passed +1 mvneclipse 0m 19s trunk passed +1 findbugs 2m 34s trunk passed +1 javadoc 1m 40s trunk passed with JDK v1.8.0_66 +1 javadoc 2m 39s trunk passed with JDK v1.7.0_91 +1 mvninstall 1m 4s the patch passed +1 compile 1m 5s the patch passed with JDK v1.8.0_66 +1 javac 1m 5s the patch passed +1 compile 0m 55s the patch passed with JDK v1.7.0_91 +1 javac 0m 55s the patch passed -1 checkstyle 0m 27s hadoop-hdfs-project/hadoop-hdfs: patch generated 2 new + 335 unchanged - 1 fixed = 337 total (was 336) +1 mvnsite 1m 8s the patch passed +1 mvneclipse 0m 16s the patch passed +1 whitespace 0m 0s Patch has no whitespace issues. -1 findbugs 2m 49s hadoop-hdfs-project/hadoop-hdfs generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) +1 javadoc 1m 38s the patch passed with JDK v1.8.0_66 +1 javadoc 2m 33s the patch passed with JDK v1.7.0_91 -1 unit 85m 18s hadoop-hdfs in the patch failed with JDK v1.8.0_66. -1 unit 76m 53s hadoop-hdfs in the patch failed with JDK v1.7.0_91. +1 asflicense 0m 29s Patch does not generate ASF License warnings. 199m 57s Reason Tests FindBugs module:hadoop-hdfs-project/hadoop-hdfs   org.apache.hadoop.hdfs.server.datanode.DataNode.removeVolumes(Set, boolean) calls Thread.sleep() with a lock held At DataNode.java:a lock held At DataNode.java: [line 805] JDK v1.8.0_66 Failed junit tests hadoop.hdfs.qjournal.client.TestQuorumJournalManager   hadoop.hdfs.TestDFSClientRetries   hadoop.hdfs.TestRecoverStripedFile   hadoop.hdfs.server.datanode.fsdataset.impl.TestFsDatasetImpl   hadoop.hdfs.server.datanode.TestDataNodeHotSwapVolumes   hadoop.hdfs.server.datanode.TestBlockScanner   hadoop.hdfs.server.datanode.TestDirectoryScanner   hadoop.hdfs.security.TestDelegationTokenForProxyUser JDK v1.7.0_91 Failed junit tests hadoop.hdfs.server.datanode.fsdataset.impl.TestFsDatasetImpl   hadoop.hdfs.server.blockmanagement.TestBlockManager   hadoop.hdfs.server.datanode.TestDataNodeHotSwapVolumes   hadoop.hdfs.server.datanode.TestBlockScanner   hadoop.hdfs.server.datanode.TestDirectoryScanner Subsystem Report/Notes Docker Image:yetus/hadoop:0ca8df7 JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12784254/HDFS-9701.01.patch JIRA Issue HDFS-9701 Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle uname Linux edd0a2a534e4 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux Build tool maven Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh git revision trunk / d62b4a4 Default Java 1.7.0_91 Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_66 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_91 findbugs v3.0.0 checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/14238/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt findbugs https://builds.apache.org/job/PreCommit-HDFS-Build/14238/artifact/patchprocess/new-findbugs-hadoop-hdfs-project_hadoop-hdfs.html unit https://builds.apache.org/job/PreCommit-HDFS-Build/14238/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_66.txt unit https://builds.apache.org/job/PreCommit-HDFS-Build/14238/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_91.txt unit test logs https://builds.apache.org/job/PreCommit-HDFS-Build/14238/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_66.txt https://builds.apache.org/job/PreCommit-HDFS-Build/14238/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_91.txt JDK v1.7.0_91 Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/14238/testReport/ modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs Max memory used 76MB Powered by Apache Yetus 0.2.0-SNAPSHOT http://yetus.apache.org Console output https://builds.apache.org/job/PreCommit-HDFS-Build/14238/console This message was automatically generated.
            xiaochen Xiao Chen added a comment -

            Patch 2 fixes checkstyle, findbugs issues, and added some javadocs.
            For failed tests, TestFsDatasetImpl and TestDataNodeHotSwapVolumes are related, others seems not.

            • TestFsDatasetImpl: original test missing cleanup. Added.
            • TestDataNodeHotSwapVolumes: IIUC, we should hflush first in order for the BlockReceiver to hold a ref count. Then we can verify that block reference is not removed because block not finalized, even if a reconfig task is launched. For this reason, I moved the barrier to fix the test. eddyxu please correct me if I'm wrong.
            xiaochen Xiao Chen added a comment - Patch 2 fixes checkstyle, findbugs issues, and added some javadocs. For failed tests, TestFsDatasetImpl and TestDataNodeHotSwapVolumes are related, others seems not. TestFsDatasetImpl : original test missing cleanup. Added. TestDataNodeHotSwapVolumes : IIUC, we should hflush first in order for the BlockReceiver to hold a ref count. Then we can verify that block reference is not removed because block not finalized, even if a reconfig task is launched. For this reason, I moved the barrier to fix the test. eddyxu please correct me if I'm wrong.
            hadoopqa Hadoop QA added a comment -
            -1 overall



            Vote Subsystem Runtime Comment
            0 reexec 0m 0s Docker mode activated.
            +1 @author 0m 0s The patch does not contain any @author tags.
            +1 test4tests 0m 0s The patch appears to include 5 new or modified test files.
            +1 mvninstall 10m 55s trunk passed
            +1 compile 1m 26s trunk passed with JDK v1.8.0_66
            +1 compile 1m 3s trunk passed with JDK v1.7.0_91
            +1 checkstyle 0m 33s trunk passed
            +1 mvnsite 1m 17s trunk passed
            +1 mvneclipse 0m 18s trunk passed
            +1 findbugs 2m 37s trunk passed
            +1 javadoc 1m 45s trunk passed with JDK v1.8.0_66
            +1 javadoc 2m 44s trunk passed with JDK v1.7.0_91
            +1 mvninstall 1m 10s the patch passed
            +1 compile 1m 21s the patch passed with JDK v1.8.0_66
            +1 javac 1m 21s the patch passed
            +1 compile 0m 59s the patch passed with JDK v1.7.0_91
            +1 javac 0m 59s the patch passed
            +1 checkstyle 0m 31s hadoop-hdfs-project/hadoop-hdfs: patch generated 0 new + 335 unchanged - 1 fixed = 335 total (was 336)
            +1 mvnsite 1m 13s the patch passed
            +1 mvneclipse 0m 16s the patch passed
            +1 whitespace 0m 0s Patch has no whitespace issues.
            +1 findbugs 2m 52s the patch passed
            +1 javadoc 1m 46s the patch passed with JDK v1.8.0_66
            +1 javadoc 2m 40s the patch passed with JDK v1.7.0_91
            -1 unit 76m 34s hadoop-hdfs in the patch failed with JDK v1.8.0_66.
            -1 unit 65m 8s hadoop-hdfs in the patch failed with JDK v1.7.0_91.
            -1 asflicense 0m 19s Patch generated 1 ASF License warnings.
            180m 57s



            Reason Tests
            JDK v1.8.0_66 Failed junit tests hadoop.hdfs.qjournal.TestSecureNNWithQJM
              hadoop.hdfs.server.namenode.ha.TestEditLogTailer
              hadoop.hdfs.security.TestDelegationTokenForProxyUser
              hadoop.hdfs.TestFileAppend
              hadoop.hdfs.TestDFSUpgradeFromImage
              hadoop.hdfs.server.datanode.TestBlockScanner
            JDK v1.8.0_66 Timed out junit tests org.apache.hadoop.hdfs.TestDFSStripedOutputStreamWithFailure150
              org.apache.hadoop.hdfs.TestWriteReadStripedFile
              org.apache.hadoop.fs.TestWebHdfsFileContextMainOperations
              org.apache.hadoop.fs.permission.TestStickyBit
            JDK v1.7.0_91 Failed junit tests hadoop.hdfs.server.balancer.TestBalancer
              hadoop.hdfs.server.blockmanagement.TestPendingReplication
              hadoop.hdfs.server.blockmanagement.TestReplicationPolicy
              hadoop.hdfs.TestRecoverStripedFile
              hadoop.hdfs.server.namenode.TestNamenodeCapacityReport



            Subsystem Report/Notes
            Docker Image:yetus/hadoop:0ca8df7
            JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12784365/HDFS-9701.02.patch
            JIRA Issue HDFS-9701
            Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
            uname Linux a702216ae3f5 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
            Build tool maven
            Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
            git revision trunk / 45c763a
            Default Java 1.7.0_91
            Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_66 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_91
            findbugs v3.0.0
            unit https://builds.apache.org/job/PreCommit-HDFS-Build/14243/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_66.txt
            unit https://builds.apache.org/job/PreCommit-HDFS-Build/14243/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_91.txt
            unit test logs https://builds.apache.org/job/PreCommit-HDFS-Build/14243/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_66.txt https://builds.apache.org/job/PreCommit-HDFS-Build/14243/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_91.txt
            JDK v1.7.0_91 Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/14243/testReport/
            asflicense https://builds.apache.org/job/PreCommit-HDFS-Build/14243/artifact/patchprocess/patch-asflicense-problems.txt
            modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
            Max memory used 77MB
            Powered by Apache Yetus 0.2.0-SNAPSHOT http://yetus.apache.org
            Console output https://builds.apache.org/job/PreCommit-HDFS-Build/14243/console

            This message was automatically generated.

            hadoopqa Hadoop QA added a comment - -1 overall Vote Subsystem Runtime Comment 0 reexec 0m 0s Docker mode activated. +1 @author 0m 0s The patch does not contain any @author tags. +1 test4tests 0m 0s The patch appears to include 5 new or modified test files. +1 mvninstall 10m 55s trunk passed +1 compile 1m 26s trunk passed with JDK v1.8.0_66 +1 compile 1m 3s trunk passed with JDK v1.7.0_91 +1 checkstyle 0m 33s trunk passed +1 mvnsite 1m 17s trunk passed +1 mvneclipse 0m 18s trunk passed +1 findbugs 2m 37s trunk passed +1 javadoc 1m 45s trunk passed with JDK v1.8.0_66 +1 javadoc 2m 44s trunk passed with JDK v1.7.0_91 +1 mvninstall 1m 10s the patch passed +1 compile 1m 21s the patch passed with JDK v1.8.0_66 +1 javac 1m 21s the patch passed +1 compile 0m 59s the patch passed with JDK v1.7.0_91 +1 javac 0m 59s the patch passed +1 checkstyle 0m 31s hadoop-hdfs-project/hadoop-hdfs: patch generated 0 new + 335 unchanged - 1 fixed = 335 total (was 336) +1 mvnsite 1m 13s the patch passed +1 mvneclipse 0m 16s the patch passed +1 whitespace 0m 0s Patch has no whitespace issues. +1 findbugs 2m 52s the patch passed +1 javadoc 1m 46s the patch passed with JDK v1.8.0_66 +1 javadoc 2m 40s the patch passed with JDK v1.7.0_91 -1 unit 76m 34s hadoop-hdfs in the patch failed with JDK v1.8.0_66. -1 unit 65m 8s hadoop-hdfs in the patch failed with JDK v1.7.0_91. -1 asflicense 0m 19s Patch generated 1 ASF License warnings. 180m 57s Reason Tests JDK v1.8.0_66 Failed junit tests hadoop.hdfs.qjournal.TestSecureNNWithQJM   hadoop.hdfs.server.namenode.ha.TestEditLogTailer   hadoop.hdfs.security.TestDelegationTokenForProxyUser   hadoop.hdfs.TestFileAppend   hadoop.hdfs.TestDFSUpgradeFromImage   hadoop.hdfs.server.datanode.TestBlockScanner JDK v1.8.0_66 Timed out junit tests org.apache.hadoop.hdfs.TestDFSStripedOutputStreamWithFailure150   org.apache.hadoop.hdfs.TestWriteReadStripedFile   org.apache.hadoop.fs.TestWebHdfsFileContextMainOperations   org.apache.hadoop.fs.permission.TestStickyBit JDK v1.7.0_91 Failed junit tests hadoop.hdfs.server.balancer.TestBalancer   hadoop.hdfs.server.blockmanagement.TestPendingReplication   hadoop.hdfs.server.blockmanagement.TestReplicationPolicy   hadoop.hdfs.TestRecoverStripedFile   hadoop.hdfs.server.namenode.TestNamenodeCapacityReport Subsystem Report/Notes Docker Image:yetus/hadoop:0ca8df7 JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12784365/HDFS-9701.02.patch JIRA Issue HDFS-9701 Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle uname Linux a702216ae3f5 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux Build tool maven Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh git revision trunk / 45c763a Default Java 1.7.0_91 Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_66 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_91 findbugs v3.0.0 unit https://builds.apache.org/job/PreCommit-HDFS-Build/14243/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_66.txt unit https://builds.apache.org/job/PreCommit-HDFS-Build/14243/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_91.txt unit test logs https://builds.apache.org/job/PreCommit-HDFS-Build/14243/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_66.txt https://builds.apache.org/job/PreCommit-HDFS-Build/14243/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_91.txt JDK v1.7.0_91 Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/14243/testReport/ asflicense https://builds.apache.org/job/PreCommit-HDFS-Build/14243/artifact/patchprocess/patch-asflicense-problems.txt modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs Max memory used 77MB Powered by Apache Yetus 0.2.0-SNAPSHOT http://yetus.apache.org Console output https://builds.apache.org/job/PreCommit-HDFS-Build/14243/console This message was automatically generated.
            xiaochen Xiao Chen added a comment -

            The findbugs warning helped me thinking of a cleaner way to fix this.
            Patch 3 still goes with option 1 above, but instead of moving the close-and-wait logic all the way out to DataNode, I just moved it to FsDatasetImpl, and use wait instead of Thread.sleep to prevent locking on the FsDatasetImpl object. Please review, thanks.

            xiaochen Xiao Chen added a comment - The findbugs warning helped me thinking of a cleaner way to fix this. Patch 3 still goes with option 1 above, but instead of moving the close-and-wait logic all the way out to DataNode , I just moved it to FsDatasetImpl , and use wait instead of Thread.sleep to prevent locking on the FsDatasetImpl object. Please review, thanks.
            hadoopqa Hadoop QA added a comment -
            -1 overall



            Vote Subsystem Runtime Comment
            0 reexec 0m 0s Docker mode activated.
            +1 @author 0m 0s The patch does not contain any @author tags.
            +1 test4tests 0m 0s The patch appears to include 3 new or modified test files.
            +1 mvninstall 7m 36s trunk passed
            +1 compile 0m 39s trunk passed with JDK v1.8.0_66
            +1 compile 0m 42s trunk passed with JDK v1.7.0_91
            +1 checkstyle 0m 22s trunk passed
            +1 mvnsite 0m 51s trunk passed
            +1 mvneclipse 0m 14s trunk passed
            +1 findbugs 1m 55s trunk passed
            +1 javadoc 1m 6s trunk passed with JDK v1.8.0_66
            +1 javadoc 1m 46s trunk passed with JDK v1.7.0_91
            +1 mvninstall 0m 46s the patch passed
            +1 compile 0m 39s the patch passed with JDK v1.8.0_66
            +1 javac 0m 39s the patch passed
            +1 compile 0m 38s the patch passed with JDK v1.7.0_91
            +1 javac 0m 38s the patch passed
            -1 checkstyle 0m 19s hadoop-hdfs-project/hadoop-hdfs: patch generated 1 new + 134 unchanged - 1 fixed = 135 total (was 135)
            +1 mvnsite 0m 48s the patch passed
            +1 mvneclipse 0m 11s the patch passed
            +1 whitespace 0m 0s Patch has no whitespace issues.
            +1 findbugs 2m 4s the patch passed
            +1 javadoc 1m 3s the patch passed with JDK v1.8.0_66
            +1 javadoc 1m 42s the patch passed with JDK v1.7.0_91
            -1 unit 51m 42s hadoop-hdfs in the patch failed with JDK v1.8.0_66.
            -1 unit 50m 22s hadoop-hdfs in the patch failed with JDK v1.7.0_91.
            +1 asflicense 0m 20s Patch does not generate ASF License warnings.
            128m 6s



            Reason Tests
            JDK v1.8.0_66 Failed junit tests hadoop.hdfs.server.namenode.TestCacheDirectives
              hadoop.hdfs.server.datanode.TestBlockScanner
              hadoop.hdfs.TestRollingUpgrade
            JDK v1.7.0_91 Failed junit tests hadoop.hdfs.TestRecoverStripedFile



            Subsystem Report/Notes
            Docker Image:yetus/hadoop:0ca8df7
            JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12784453/HDFS-9701.03.patch
            JIRA Issue HDFS-9701
            Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
            uname Linux 759cc5ff23fc 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
            Build tool maven
            Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
            git revision trunk / d0d7c22
            Default Java 1.7.0_91
            Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_66 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_91
            findbugs v3.0.0
            checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/14247/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt
            unit https://builds.apache.org/job/PreCommit-HDFS-Build/14247/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_66.txt
            unit https://builds.apache.org/job/PreCommit-HDFS-Build/14247/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_91.txt
            unit test logs https://builds.apache.org/job/PreCommit-HDFS-Build/14247/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_66.txt https://builds.apache.org/job/PreCommit-HDFS-Build/14247/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_91.txt
            JDK v1.7.0_91 Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/14247/testReport/
            modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
            Max memory used 77MB
            Powered by Apache Yetus 0.2.0-SNAPSHOT http://yetus.apache.org
            Console output https://builds.apache.org/job/PreCommit-HDFS-Build/14247/console

            This message was automatically generated.

            hadoopqa Hadoop QA added a comment - -1 overall Vote Subsystem Runtime Comment 0 reexec 0m 0s Docker mode activated. +1 @author 0m 0s The patch does not contain any @author tags. +1 test4tests 0m 0s The patch appears to include 3 new or modified test files. +1 mvninstall 7m 36s trunk passed +1 compile 0m 39s trunk passed with JDK v1.8.0_66 +1 compile 0m 42s trunk passed with JDK v1.7.0_91 +1 checkstyle 0m 22s trunk passed +1 mvnsite 0m 51s trunk passed +1 mvneclipse 0m 14s trunk passed +1 findbugs 1m 55s trunk passed +1 javadoc 1m 6s trunk passed with JDK v1.8.0_66 +1 javadoc 1m 46s trunk passed with JDK v1.7.0_91 +1 mvninstall 0m 46s the patch passed +1 compile 0m 39s the patch passed with JDK v1.8.0_66 +1 javac 0m 39s the patch passed +1 compile 0m 38s the patch passed with JDK v1.7.0_91 +1 javac 0m 38s the patch passed -1 checkstyle 0m 19s hadoop-hdfs-project/hadoop-hdfs: patch generated 1 new + 134 unchanged - 1 fixed = 135 total (was 135) +1 mvnsite 0m 48s the patch passed +1 mvneclipse 0m 11s the patch passed +1 whitespace 0m 0s Patch has no whitespace issues. +1 findbugs 2m 4s the patch passed +1 javadoc 1m 3s the patch passed with JDK v1.8.0_66 +1 javadoc 1m 42s the patch passed with JDK v1.7.0_91 -1 unit 51m 42s hadoop-hdfs in the patch failed with JDK v1.8.0_66. -1 unit 50m 22s hadoop-hdfs in the patch failed with JDK v1.7.0_91. +1 asflicense 0m 20s Patch does not generate ASF License warnings. 128m 6s Reason Tests JDK v1.8.0_66 Failed junit tests hadoop.hdfs.server.namenode.TestCacheDirectives   hadoop.hdfs.server.datanode.TestBlockScanner   hadoop.hdfs.TestRollingUpgrade JDK v1.7.0_91 Failed junit tests hadoop.hdfs.TestRecoverStripedFile Subsystem Report/Notes Docker Image:yetus/hadoop:0ca8df7 JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12784453/HDFS-9701.03.patch JIRA Issue HDFS-9701 Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle uname Linux 759cc5ff23fc 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux Build tool maven Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh git revision trunk / d0d7c22 Default Java 1.7.0_91 Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_66 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_91 findbugs v3.0.0 checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/14247/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt unit https://builds.apache.org/job/PreCommit-HDFS-Build/14247/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_66.txt unit https://builds.apache.org/job/PreCommit-HDFS-Build/14247/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_91.txt unit test logs https://builds.apache.org/job/PreCommit-HDFS-Build/14247/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_66.txt https://builds.apache.org/job/PreCommit-HDFS-Build/14247/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_91.txt JDK v1.7.0_91 Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/14247/testReport/ modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs Max memory used 77MB Powered by Apache Yetus 0.2.0-SNAPSHOT http://yetus.apache.org Console output https://builds.apache.org/job/PreCommit-HDFS-Build/14247/console This message was automatically generated.
            eddyxu Lei (Eddy) Xu added a comment -

            Hi, xiaochen

            Thanks a lot for digging into this bug. The patch looks good in general.

            A few minor issues:

            • Can you move TestDatanodeImpl#testRemoveVolumeBeingWritten to TestDataNodeHotSwapVolumes?
              Also you dont need two try here:
            try {
                      try (ReplicaHandler replica =
                          dataset.createRbw(StorageType.DEFAULT, eb, false)) {
            
            • Could you verify that during removing a volume, block report can be sent in the test.
            • In FsDatasetImpl#removeVolume, {{LOG.info("All volumes are removed"); }} is not accurate. Only one volume is removed at a time, and the removal operation is not completed (i.e., ReplicaInfo are stored in DN's memory). Would you mind to rephrase that?
            • Realized that FsVolumeList#checkDirs and FsDatasetImpl#checkDirs share the same "waiting for checkVolumeRemoved()" code, would you mind to put them into one function?

            Thanks!

            eddyxu Lei (Eddy) Xu added a comment - Hi, xiaochen Thanks a lot for digging into this bug. The patch looks good in general. A few minor issues: Can you move TestDatanodeImpl#testRemoveVolumeBeingWritten to TestDataNodeHotSwapVolumes ? Also you dont need two try here: try { try (ReplicaHandler replica = dataset.createRbw(StorageType.DEFAULT, eb, false )) { Could you verify that during removing a volume, block report can be sent in the test. In FsDatasetImpl#removeVolume , {{LOG.info("All volumes are removed"); }} is not accurate. Only one volume is removed at a time, and the removal operation is not completed (i.e., ReplicaInfo are stored in DN's memory). Would you mind to rephrase that? Realized that FsVolumeList#checkDirs and FsDatasetImpl#checkDirs share the same "waiting for checkVolumeRemoved()" code, would you mind to put them into one function? Thanks!
            xiaochen Xiao Chen added a comment -

            Thanks eddyxu so much for the review!
            All your comments above are addressed in patch 4, except:

            Can you move TestDatanodeImpl#testRemoveVolumeBeingWritten to TestDataNodeHotSwapVolumes?

            I put it in TestFsDatasetImpl because we need to call methods (e.g. createRbw, finalizeBlock) directly on FsDatasetImpl to control the timing of the 'holding reference but not finalized yet' scenario. Putting it into TestDataNodeHotSwapVolumes will be harder to test. I understand your concern that this is a hot swap related test, but it's also on FsDatasetImpl. I'll try moving it if you insist.

            Could you verify that during removing a volume, block report can be sent in the test.

            I added a block report call in patch 4, though not strictly controlled it to be in the middle of removing - I'm not sure how to let it run exactly when the remove is in waitVolumeRemoved step.... please advice.

            xiaochen Xiao Chen added a comment - Thanks eddyxu so much for the review! All your comments above are addressed in patch 4, except: Can you move TestDatanodeImpl#testRemoveVolumeBeingWritten to TestDataNodeHotSwapVolumes? I put it in TestFsDatasetImpl because we need to call methods (e.g. createRbw , finalizeBlock ) directly on FsDatasetImpl to control the timing of the 'holding reference but not finalized yet' scenario. Putting it into TestDataNodeHotSwapVolumes will be harder to test. I understand your concern that this is a hot swap related test, but it's also on FsDatasetImpl. I'll try moving it if you insist. Could you verify that during removing a volume, block report can be sent in the test. I added a block report call in patch 4, though not strictly controlled it to be in the middle of removing - I'm not sure how to let it run exactly when the remove is in waitVolumeRemoved step.... please advice.
            hadoopqa Hadoop QA added a comment -
            -1 overall



            Vote Subsystem Runtime Comment
            0 reexec 0m 0s Docker mode activated.
            +1 @author 0m 0s The patch does not contain any @author tags.
            +1 test4tests 0m 0s The patch appears to include 2 new or modified test files.
            +1 mvninstall 7m 34s trunk passed
            +1 compile 0m 39s trunk passed with JDK v1.8.0_66
            +1 compile 0m 41s trunk passed with JDK v1.7.0_91
            +1 checkstyle 0m 22s trunk passed
            +1 mvnsite 0m 51s trunk passed
            +1 mvneclipse 0m 13s trunk passed
            +1 findbugs 1m 54s trunk passed
            +1 javadoc 1m 7s trunk passed with JDK v1.8.0_66
            +1 javadoc 1m 45s trunk passed with JDK v1.7.0_91
            +1 mvninstall 0m 46s the patch passed
            +1 compile 0m 38s the patch passed with JDK v1.8.0_66
            +1 javac 0m 38s the patch passed
            +1 compile 0m 40s the patch passed with JDK v1.7.0_91
            +1 javac 0m 40s the patch passed
            +1 checkstyle 0m 20s hadoop-hdfs-project/hadoop-hdfs: patch generated 0 new + 134 unchanged - 1 fixed = 134 total (was 135)
            +1 mvnsite 0m 48s the patch passed
            +1 mvneclipse 0m 11s the patch passed
            +1 whitespace 0m 0s Patch has no whitespace issues.
            +1 findbugs 2m 7s the patch passed
            +1 javadoc 1m 2s the patch passed with JDK v1.8.0_66
            +1 javadoc 1m 44s the patch passed with JDK v1.7.0_91
            -1 unit 52m 44s hadoop-hdfs in the patch failed with JDK v1.8.0_66.
            +1 unit 49m 59s hadoop-hdfs in the patch passed with JDK v1.7.0_91.
            +1 asflicense 0m 19s Patch does not generate ASF License warnings.
            128m 45s



            Reason Tests
            JDK v1.8.0_66 Failed junit tests hadoop.hdfs.server.datanode.TestDataNodeMetrics



            Subsystem Report/Notes
            Docker Image:yetus/hadoop:0ca8df7
            JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12784542/HDFS-9701.04.patch
            JIRA Issue HDFS-9701
            Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
            uname Linux 5842b7933a9b 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
            Build tool maven
            Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
            git revision trunk / d323639
            Default Java 1.7.0_91
            Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_66 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_91
            findbugs v3.0.0
            unit https://builds.apache.org/job/PreCommit-HDFS-Build/14253/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_66.txt
            unit test logs https://builds.apache.org/job/PreCommit-HDFS-Build/14253/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_66.txt
            JDK v1.7.0_91 Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/14253/testReport/
            modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
            Max memory used 77MB
            Powered by Apache Yetus 0.2.0-SNAPSHOT http://yetus.apache.org
            Console output https://builds.apache.org/job/PreCommit-HDFS-Build/14253/console

            This message was automatically generated.

            hadoopqa Hadoop QA added a comment - -1 overall Vote Subsystem Runtime Comment 0 reexec 0m 0s Docker mode activated. +1 @author 0m 0s The patch does not contain any @author tags. +1 test4tests 0m 0s The patch appears to include 2 new or modified test files. +1 mvninstall 7m 34s trunk passed +1 compile 0m 39s trunk passed with JDK v1.8.0_66 +1 compile 0m 41s trunk passed with JDK v1.7.0_91 +1 checkstyle 0m 22s trunk passed +1 mvnsite 0m 51s trunk passed +1 mvneclipse 0m 13s trunk passed +1 findbugs 1m 54s trunk passed +1 javadoc 1m 7s trunk passed with JDK v1.8.0_66 +1 javadoc 1m 45s trunk passed with JDK v1.7.0_91 +1 mvninstall 0m 46s the patch passed +1 compile 0m 38s the patch passed with JDK v1.8.0_66 +1 javac 0m 38s the patch passed +1 compile 0m 40s the patch passed with JDK v1.7.0_91 +1 javac 0m 40s the patch passed +1 checkstyle 0m 20s hadoop-hdfs-project/hadoop-hdfs: patch generated 0 new + 134 unchanged - 1 fixed = 134 total (was 135) +1 mvnsite 0m 48s the patch passed +1 mvneclipse 0m 11s the patch passed +1 whitespace 0m 0s Patch has no whitespace issues. +1 findbugs 2m 7s the patch passed +1 javadoc 1m 2s the patch passed with JDK v1.8.0_66 +1 javadoc 1m 44s the patch passed with JDK v1.7.0_91 -1 unit 52m 44s hadoop-hdfs in the patch failed with JDK v1.8.0_66. +1 unit 49m 59s hadoop-hdfs in the patch passed with JDK v1.7.0_91. +1 asflicense 0m 19s Patch does not generate ASF License warnings. 128m 45s Reason Tests JDK v1.8.0_66 Failed junit tests hadoop.hdfs.server.datanode.TestDataNodeMetrics Subsystem Report/Notes Docker Image:yetus/hadoop:0ca8df7 JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12784542/HDFS-9701.04.patch JIRA Issue HDFS-9701 Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle uname Linux 5842b7933a9b 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux Build tool maven Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh git revision trunk / d323639 Default Java 1.7.0_91 Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_66 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_91 findbugs v3.0.0 unit https://builds.apache.org/job/PreCommit-HDFS-Build/14253/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_66.txt unit test logs https://builds.apache.org/job/PreCommit-HDFS-Build/14253/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_66.txt JDK v1.7.0_91 Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/14253/testReport/ modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs Max memory used 77MB Powered by Apache Yetus 0.2.0-SNAPSHOT http://yetus.apache.org Console output https://builds.apache.org/job/PreCommit-HDFS-Build/14253/console This message was automatically generated.
            vinayakumarb Vinayakumar B added a comment -

            One issue I am seeing here is in FSVolumeList#checkDirs()

               private final VolumeChoosingPolicy<FsVolumeImpl> blockChooser;
            @@ -257,10 +260,33 @@ public void run() {
                         + " failure volumes.");
                   }
             
            +      waitVolumeRemoved(5000, this);
                   return failedVols;
                 }
               }

            I believe, this call will get, IllegalMonitorStateException.
            checkDirsMutex should be passed instead of this

            vinayakumarb Vinayakumar B added a comment - One issue I am seeing here is in FSVolumeList#checkDirs() private final VolumeChoosingPolicy<FsVolumeImpl> blockChooser; @@ -257,10 +260,33 @@ public void run() { + " failure volumes." ); } + waitVolumeRemoved(5000, this ); return failedVols; } } I believe, this call will get, IllegalMonitorStateException . checkDirsMutex should be passed instead of this
            xiaochen Xiao Chen added a comment -

            Ah Good catch vinayrpet! And thanks for reviewing. I added the second parameter to the method solely for that reason, yet I didn't pass in the checkDirsMutex....
            Patch 5 fixes this.

            xiaochen Xiao Chen added a comment - Ah Good catch vinayrpet ! And thanks for reviewing. I added the second parameter to the method solely for that reason, yet I didn't pass in the checkDirsMutex .... Patch 5 fixes this.
            hadoopqa Hadoop QA added a comment -
            -1 overall



            Vote Subsystem Runtime Comment
            0 reexec 0m 0s Docker mode activated.
            +1 @author 0m 0s The patch does not contain any @author tags.
            +1 test4tests 0m 0s The patch appears to include 2 new or modified test files.
            +1 mvninstall 7m 55s trunk passed
            +1 compile 0m 42s trunk passed with JDK v1.8.0_66
            +1 compile 0m 43s trunk passed with JDK v1.7.0_91
            +1 checkstyle 0m 23s trunk passed
            +1 mvnsite 0m 54s trunk passed
            +1 mvneclipse 0m 13s trunk passed
            +1 findbugs 2m 1s trunk passed
            +1 javadoc 1m 6s trunk passed with JDK v1.8.0_66
            +1 javadoc 1m 51s trunk passed with JDK v1.7.0_91
            +1 mvninstall 0m 48s the patch passed
            +1 compile 0m 40s the patch passed with JDK v1.8.0_66
            +1 javac 0m 40s the patch passed
            +1 compile 0m 41s the patch passed with JDK v1.7.0_91
            +1 javac 0m 41s the patch passed
            +1 checkstyle 0m 21s hadoop-hdfs-project/hadoop-hdfs: patch generated 0 new + 134 unchanged - 1 fixed = 134 total (was 135)
            +1 mvnsite 0m 51s the patch passed
            +1 mvneclipse 0m 12s the patch passed
            +1 whitespace 0m 0s Patch has no whitespace issues.
            +1 findbugs 2m 16s the patch passed
            +1 javadoc 1m 8s the patch passed with JDK v1.8.0_66
            +1 javadoc 1m 50s the patch passed with JDK v1.7.0_91
            -1 unit 66m 12s hadoop-hdfs in the patch failed with JDK v1.8.0_66.
            -1 unit 61m 49s hadoop-hdfs in the patch failed with JDK v1.7.0_91.
            +1 asflicense 0m 24s Patch does not generate ASF License warnings.
            155m 34s



            Reason Tests
            JDK v1.8.0_66 Failed junit tests hadoop.hdfs.server.datanode.TestBlockScanner
              hadoop.hdfs.server.blockmanagement.TestRBWBlockInvalidation
              hadoop.fs.viewfs.TestViewFileSystemHdfs
            JDK v1.7.0_91 Failed junit tests hadoop.hdfs.server.datanode.TestBlockScanner
              hadoop.hdfs.shortcircuit.TestShortCircuitCache



            Subsystem Report/Notes
            Docker Image:yetus/hadoop:0ca8df7
            JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12784678/HDFS-9701.05.patch
            JIRA Issue HDFS-9701
            Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
            uname Linux a2b6bce1d0de 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
            Build tool maven
            Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
            git revision trunk / 97056c3
            Default Java 1.7.0_91
            Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_66 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_91
            findbugs v3.0.0
            unit https://builds.apache.org/job/PreCommit-HDFS-Build/14260/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_66.txt
            unit https://builds.apache.org/job/PreCommit-HDFS-Build/14260/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_91.txt
            unit test logs https://builds.apache.org/job/PreCommit-HDFS-Build/14260/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_66.txt https://builds.apache.org/job/PreCommit-HDFS-Build/14260/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_91.txt
            JDK v1.7.0_91 Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/14260/testReport/
            modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
            Max memory used 76MB
            Powered by Apache Yetus 0.2.0-SNAPSHOT http://yetus.apache.org
            Console output https://builds.apache.org/job/PreCommit-HDFS-Build/14260/console

            This message was automatically generated.

            hadoopqa Hadoop QA added a comment - -1 overall Vote Subsystem Runtime Comment 0 reexec 0m 0s Docker mode activated. +1 @author 0m 0s The patch does not contain any @author tags. +1 test4tests 0m 0s The patch appears to include 2 new or modified test files. +1 mvninstall 7m 55s trunk passed +1 compile 0m 42s trunk passed with JDK v1.8.0_66 +1 compile 0m 43s trunk passed with JDK v1.7.0_91 +1 checkstyle 0m 23s trunk passed +1 mvnsite 0m 54s trunk passed +1 mvneclipse 0m 13s trunk passed +1 findbugs 2m 1s trunk passed +1 javadoc 1m 6s trunk passed with JDK v1.8.0_66 +1 javadoc 1m 51s trunk passed with JDK v1.7.0_91 +1 mvninstall 0m 48s the patch passed +1 compile 0m 40s the patch passed with JDK v1.8.0_66 +1 javac 0m 40s the patch passed +1 compile 0m 41s the patch passed with JDK v1.7.0_91 +1 javac 0m 41s the patch passed +1 checkstyle 0m 21s hadoop-hdfs-project/hadoop-hdfs: patch generated 0 new + 134 unchanged - 1 fixed = 134 total (was 135) +1 mvnsite 0m 51s the patch passed +1 mvneclipse 0m 12s the patch passed +1 whitespace 0m 0s Patch has no whitespace issues. +1 findbugs 2m 16s the patch passed +1 javadoc 1m 8s the patch passed with JDK v1.8.0_66 +1 javadoc 1m 50s the patch passed with JDK v1.7.0_91 -1 unit 66m 12s hadoop-hdfs in the patch failed with JDK v1.8.0_66. -1 unit 61m 49s hadoop-hdfs in the patch failed with JDK v1.7.0_91. +1 asflicense 0m 24s Patch does not generate ASF License warnings. 155m 34s Reason Tests JDK v1.8.0_66 Failed junit tests hadoop.hdfs.server.datanode.TestBlockScanner   hadoop.hdfs.server.blockmanagement.TestRBWBlockInvalidation   hadoop.fs.viewfs.TestViewFileSystemHdfs JDK v1.7.0_91 Failed junit tests hadoop.hdfs.server.datanode.TestBlockScanner   hadoop.hdfs.shortcircuit.TestShortCircuitCache Subsystem Report/Notes Docker Image:yetus/hadoop:0ca8df7 JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12784678/HDFS-9701.05.patch JIRA Issue HDFS-9701 Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle uname Linux a2b6bce1d0de 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux Build tool maven Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh git revision trunk / 97056c3 Default Java 1.7.0_91 Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_66 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_91 findbugs v3.0.0 unit https://builds.apache.org/job/PreCommit-HDFS-Build/14260/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_66.txt unit https://builds.apache.org/job/PreCommit-HDFS-Build/14260/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_91.txt unit test logs https://builds.apache.org/job/PreCommit-HDFS-Build/14260/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_66.txt https://builds.apache.org/job/PreCommit-HDFS-Build/14260/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_91.txt JDK v1.7.0_91 Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/14260/testReport/ modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs Max memory used 76MB Powered by Apache Yetus 0.2.0-SNAPSHOT http://yetus.apache.org Console output https://builds.apache.org/job/PreCommit-HDFS-Build/14260/console This message was automatically generated.
            xiaochen Xiao Chen added a comment -

            Failed tests seem unrelated and are known to be flaky (e.g. HDFS-9466)

            xiaochen Xiao Chen added a comment - Failed tests seem unrelated and are known to be flaky (e.g. HDFS-9466 )
            xiaobingo Xiaobing Zhou added a comment -

            Thanks xiaochen for the work. Could you explain why the step#2 is true?

            Reconfigure task locks on the FsDatasetImpl object
            

            It looks like Reconfigure task only locks on a member lock ReconfigurableBase#reconfigLock.

            xiaobingo Xiaobing Zhou added a comment - Thanks xiaochen for the work. Could you explain why the step#2 is true? Reconfigure task locks on the FsDatasetImpl object It looks like Reconfigure task only locks on a member lock ReconfigurableBase#reconfigLock.
            xiaochen Xiao Chen added a comment -

            Thanks xiaobingo for looking at this. We actually have multiple locks along the reconfig stack.
            In step 2, the locking I was referring to is inside FsDatasetImpl. Specifically this line. Please see my first comment above for the stack trace.

            xiaochen Xiao Chen added a comment - Thanks xiaobingo for looking at this. We actually have multiple locks along the reconfig stack. In step 2, the locking I was referring to is inside FsDatasetImpl. Specifically this line . Please see my first comment above for the stack trace.
            xiaobingo Xiaobing Zhou added a comment -

            Thanks, that lock is indeed in the call stack of reconfig. The patch looks good to me, +1.

            xiaobingo Xiaobing Zhou added a comment - Thanks, that lock is indeed in the call stack of reconfig. The patch looks good to me, +1.
            eddyxu Lei (Eddy) Xu added a comment -

            +1. Good work, xiaochen. Thanks a lot!

            Committed to branch-2.8, branch-2 and trunk.

            eddyxu Lei (Eddy) Xu added a comment - +1. Good work, xiaochen . Thanks a lot! Committed to branch-2.8, branch-2 and trunk.
            hudson Hudson added a comment -

            FAILURE: Integrated in Hadoop-trunk-Commit #9217 (See https://builds.apache.org/job/Hadoop-trunk-Commit/9217/)
            HDFS-9701. DN may deadlock when hot-swapping under load. (Xiao Chen via (lei: rev e50aa53eed3d0ff1bc8fe60381524bb3bbe53bc1)

            • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java
            • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsVolumeImpl.java
            • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
            • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsVolumeList.java
            • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/TestFsVolumeList.java
            • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/TestFsDatasetImpl.java
            hudson Hudson added a comment - FAILURE: Integrated in Hadoop-trunk-Commit #9217 (See https://builds.apache.org/job/Hadoop-trunk-Commit/9217/ ) HDFS-9701 . DN may deadlock when hot-swapping under load. (Xiao Chen via (lei: rev e50aa53eed3d0ff1bc8fe60381524bb3bbe53bc1) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsVolumeImpl.java hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsVolumeList.java hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/TestFsVolumeList.java hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/TestFsDatasetImpl.java
            xiaochen Xiao Chen added a comment -

            Thanks for the review and commit Eddy, and thanks Vinay and Xiaobing for the review!

            xiaochen Xiao Chen added a comment - Thanks for the review and commit Eddy, and thanks Vinay and Xiaobing for the review!

            People

              xiaochen Xiao Chen
              xiaochen Xiao Chen
              Votes:
              0 Vote for this issue
              Watchers:
              12 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: