  1. Hadoop HDFS
  2. HDFS-9500

datanodesSoftwareVersions map may be counted incorrectly during rolling upgrade

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.7.1, 2.6.2
    • Fix Version/s: 2.8.0, 2.7.4, 3.0.0-alpha2
    • Component/s: None
    • Labels:
      None
    • Target Version/s:
    • Hadoop Flags:
      Reviewed

      Description

      During a rolling upgrade, the namenode's web UI overview reports that the cluster contains datanodes of two versions, for example x nodes on 2.6.0 and y nodes on 2.6.2. However, sometimes when I stop a datanode running the old version and start it on the new version, the namenode increments the count for the new version but does not decrement the count for the old version, so the total x+y becomes larger than the actual number of datanodes. Even after all datanodes have been upgraded, the overview still reports several datanodes on the old version, and I must run hdfs dfsadmin -refreshNodes to clear this message.

      I think this issue is caused by DatanodeManager.registerDatanode. If nodeS, still recorded with the old version, is not alive because it has been shut down, it does not pass shouldCountVersion, so the count for the old version is not decremented. But shouldCountVersion only checks the heartbeat status and isAlive at that moment. If the namenode has not yet removed this node (removal is what would decrement the version map) and the node restarts on the new version, the decrementVersionCount call that belongs to this node's old version is never executed.
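
      The sequence described above can be reproduced with a toy model. This is only a sketch under stated assumptions: the Node class, the reRegister method, and the demo are hypothetical stand-ins written for illustration, not the actual DatanodeManager code. Only the names shouldCountVersion and decrementVersionCount come from the report; incrementVersionCount is assumed as the counterpart.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of the stale count described in the report; not the real DatanodeManager. */
class StaleVersionCountSketch {
  /** Hypothetical stand-in for a registered datanode. */
  static class Node {
    String version;
    boolean alive;
    Node(String version, boolean alive) {
      this.version = version;
      this.alive = alive;
    }
  }

  final Map<String, Integer> versionCounts = new HashMap<>();

  /** Only live nodes with a known version are counted. */
  boolean shouldCountVersion(Node n) {
    return n.version != null && n.alive;
  }

  void incrementVersionCount(String v) {
    versionCounts.merge(v, 1, Integer::sum);
  }

  void decrementVersionCount(String v) {
    Integer c = versionCounts.get(v);
    if (c == null) {
      return;
    }
    if (c <= 1) {
      versionCounts.remove(v);
    } else {
      versionCounts.put(v, c - 1);
    }
  }

  /** Assumed re-registration flow: the decrement is skipped when the node is not alive. */
  void reRegister(Node n, String newVersion) {
    if (shouldCountVersion(n)) {
      decrementVersionCount(n.version);
    }
    n.version = newVersion;
    n.alive = true;
    incrementVersionCount(n.version);
  }

  /** One node is stopped on 2.6.0 and comes back on 2.6.2 before it is removed. */
  static Map<String, Integer> demo() {
    StaleVersionCountSketch m = new StaleVersionCountSketch();
    Node n = new Node("2.6.0", true);
    m.incrementVersionCount(n.version);
    n.alive = false; // datanode stopped; the namenode has not removed it yet
    m.reRegister(n, "2.6.2");
    return m.versionCounts;
  }

  public static void main(String[] args) {
    System.out.println(demo());
  }
}
```

      In this model the map ends up with one entry for 2.6.0 and one for 2.6.2 although only a single node exists, which matches the x+y overcount in the description.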

      So the simplest fix is to always recount the version map in registerDatanode, since recounting is not a heavy operation.
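
      A minimal sketch of the recounting idea, assuming a simplified node list: the Node class and the countSoftwareVersions helper below are hypothetical illustrations and do not reproduce the attached patches. The point is that rebuilding the map from the current node state cannot leave a stale entry behind, whatever order the increments and decrements would have happened in.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of recounting versions from scratch; not the committed patch. */
class RecountVersionsSketch {
  /** Hypothetical stand-in for a registered datanode. */
  static class Node {
    final String version;
    final boolean alive;
    Node(String version, boolean alive) {
      this.version = version;
      this.alive = alive;
    }
  }

  /** Rebuilds the version map from the current node list, counting only live nodes. */
  static Map<String, Integer> countSoftwareVersions(List<Node> nodes) {
    Map<String, Integer> counts = new HashMap<>();
    for (Node n : nodes) {
      if (n.version != null && n.alive) {
        counts.merge(n.version, 1, Integer::sum);
      }
    }
    return counts;
  }

  /** Three nodes after the upgrade: all are alive and report 2.6.2. */
  static Map<String, Integer> demo() {
    List<Node> nodes = Arrays.asList(
        new Node("2.6.2", true),
        new Node("2.6.2", true),
        new Node("2.6.2", true));
    return countSoftwareVersions(nodes);
  }

  public static void main(String[] args) {
    System.out.println(demo());
  }
}
```

      The cost is one pass over the registered nodes per registration, which is consistent with the remark that this is not a heavy operation.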

      1. 9500-v1.patch
        2 kB
        Phil Yang
      2. HDFS-9500.000.patch
        3 kB
        Erik Krogen
      3. HDFS-9500.001.patch
        4 kB
        Erik Krogen
      4. HDFS-9500.002.patch
        4 kB
        Erik Krogen

        Activity

        hudson Hudson added a comment -

        SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #10712 (See https://builds.apache.org/job/Hadoop-trunk-Commit/10712/)
        HDFS-9500. Fix software version counts for DataNodes during rolling (shv: rev f3ac1f41b8fa82a0ac87a207d7afa2061d90a9bd)

        • (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java
        • (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestDatanodeManager.java
        shv Konstantin Shvachko added a comment -

        I just committed this to trunk, branch-2, -2.8 and -2.7.
        Thank you, Erik.

        shv Konstantin Shvachko added a comment -

        Very strange. The build page on Jenkins says
        Test Result (no failures)
        Guess something went wrong with Jenkins reporting.

        hadoopqa Hadoop QA added a comment -
        -1 overall



        Vote  Subsystem   Runtime  Comment
        0     reexec      0m 20s   Docker mode activated.
        +1    @author     0m 0s    The patch does not contain any @author tags.
        +1    test4tests  0m 0s    The patch appears to include 1 new or modified test files.
        +1    mvninstall  8m 40s   trunk passed
        +1    compile     0m 58s   trunk passed
        +1    checkstyle  0m 35s   trunk passed
        +1    mvnsite     1m 8s    trunk passed
        +1    mvneclipse  0m 14s   trunk passed
        +1    findbugs    2m 8s    trunk passed
        +1    javadoc     0m 49s   trunk passed
        +1    mvninstall  1m 0s    the patch passed
        +1    compile     0m 56s   the patch passed
        +1    javac       0m 56s   the patch passed
        +1    checkstyle  0m 30s   the patch passed
        +1    mvnsite     1m 5s    the patch passed
        +1    mvneclipse  0m 12s   the patch passed
        +1    whitespace  0m 0s    The patch has no whitespace issues.
        +1    findbugs    2m 9s    the patch passed
        +1    javadoc     0m 42s   the patch passed
        -1    unit        42m 26s  hadoop-hdfs in the patch failed.
        +1    asflicense  0m 21s   The patch does not generate ASF License warnings.
                          65m 37s



        Reason Tests
        Timed out junit tests org.apache.hadoop.hdfs.TestLeaseRecovery2
          org.apache.hadoop.hdfs.TestFileChecksum
          org.apache.hadoop.hdfs.TestParallelShortCircuitLegacyRead
          org.apache.hadoop.hdfs.server.namenode.TestStartup
          org.apache.hadoop.hdfs.TestWriteRead
          org.apache.hadoop.hdfs.TestDFSStripedOutputStreamWithFailure
          org.apache.hadoop.fs.viewfs.TestViewFileSystemHdfs
          org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager
          org.apache.hadoop.hdfs.TestFileCreationClient
          org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting
          org.apache.hadoop.hdfs.server.datanode.TestNNHandlesBlockReportPerStorage
          org.apache.hadoop.hdfs.server.datanode.TestDataNodeMetrics
          org.apache.hadoop.hdfs.server.datanode.TestNNHandlesCombinedBlockReport
          org.apache.hadoop.hdfs.server.datanode.TestDataNodeHotSwapVolumes
          org.apache.hadoop.hdfs.server.namenode.ha.TestStandbyCheckpoints
          org.apache.hadoop.hdfs.server.datanode.TestDataNodeRollingUpgrade
          org.apache.hadoop.hdfs.TestMaintenanceState
          org.apache.hadoop.hdfs.server.blockmanagement.TestReplicationPolicy
          org.apache.hadoop.hdfs.server.namenode.TestFileContextAcl
          org.apache.hadoop.fs.TestEnhancedByteBufferAccess
          org.apache.hadoop.hdfs.server.namenode.TestLargeDirectoryDelete
          org.apache.hadoop.hdfs.TestDFSClientFailover
          org.apache.hadoop.hdfs.TestSetrepIncreasing
          org.apache.hadoop.hdfs.TestClientProtocolForPipelineRecovery
          org.apache.hadoop.hdfs.TestDFSStripedOutputStreamWithFailure180
          org.apache.hadoop.hdfs.server.datanode.TestBlockReplacement
          org.apache.hadoop.hdfs.server.namenode.TestNameNodeRecovery
          org.apache.hadoop.hdfs.TestDFSStripedOutputStreamWithFailure160
          org.apache.hadoop.hdfs.TestAclsEndToEnd
          org.apache.hadoop.hdfs.server.namenode.TestNamenodeCapacityReport
          org.apache.hadoop.hdfs.server.namenode.ha.TestBootstrapStandbyWithQJM
          org.apache.hadoop.hdfs.TestFileCreation
          org.apache.hadoop.hdfs.server.datanode.TestReadOnlySharedStorage
          org.apache.hadoop.hdfs.TestReplication
          org.apache.hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot
          org.apache.hadoop.hdfs.server.namenode.TestINodeFile
          org.apache.hadoop.hdfs.server.namenode.TestEditLog
          org.apache.hadoop.hdfs.server.namenode.TestNameNodeAcl
          org.apache.hadoop.hdfs.server.datanode.TestDataNodeLifeline
          org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyWriter
          org.apache.hadoop.hdfs.server.namenode.ha.TestEditLogTailer
          org.apache.hadoop.hdfs.TestSafeModeWithStripedFile
          org.apache.hadoop.hdfs.TestDFSStripedOutputStreamWithFailure130
          org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestFsDatasetImpl
          org.apache.hadoop.hdfs.TestReconstructStripedFile
          org.apache.hadoop.hdfs.TestWriteReadStripedFile
          org.apache.hadoop.hdfs.server.datanode.TestLargeBlockReport
          org.apache.hadoop.hdfs.server.namenode.TestBackupNode
          org.apache.hadoop.hdfs.server.namenode.TestFSImage
          org.apache.hadoop.hdfs.TestDFSClientRetries
          org.apache.hadoop.hdfs.server.namenode.TestQuotaByStorageType
          org.apache.hadoop.fs.viewfs.TestViewFileSystemAtHdfsRoot
          org.apache.hadoop.hdfs.TestDFSStorageStateRecovery
          org.apache.hadoop.hdfs.TestFileAppend3
          org.apache.hadoop.hdfs.TestHDFSFileSystemContract
          org.apache.hadoop.hdfs.TestFileAppend2
          org.apache.hadoop.hdfs.server.namenode.ha.TestStandbyIsHot
          org.apache.hadoop.hdfs.qjournal.client.TestQJMWithFaults
          org.apache.hadoop.hdfs.TestDFSInotifyEventInputStream
          org.apache.hadoop.hdfs.server.namenode.TestFSImageWithXAttr
          org.apache.hadoop.hdfs.TestDFSPermission
          org.apache.hadoop.hdfs.TestDecommission
          org.apache.hadoop.hdfs.server.datanode.TestBatchIbr
          org.apache.hadoop.hdfs.server.diskbalancer.command.TestDiskBalancerCommand
          org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives
          org.apache.hadoop.hdfs.server.namenode.TestAddOverReplicatedStripedBlocks
          org.apache.hadoop.hdfs.server.namenode.TestFSImageWithSnapshot
          org.apache.hadoop.hdfs.security.TestDelegationToken
          org.apache.hadoop.hdfs.server.blockmanagement.TestUnderReplicatedBlocks
          org.apache.hadoop.hdfs.server.namenode.snapshot.TestSnapshotBlocksMap
          org.apache.hadoop.hdfs.TestDFSStripedOutputStream
          org.apache.hadoop.hdfs.client.impl.TestBlockReaderLocal
          org.apache.hadoop.hdfs.server.balancer.TestBalancerWithEncryptedTransfer
          org.apache.hadoop.hdfs.server.diskbalancer.TestDiskBalancer
          org.apache.hadoop.hdfs.server.blockmanagement.TestBlocksWithNotEnoughRacks
          org.apache.hadoop.hdfs.server.blockmanagement.TestOverReplicatedBlocks
          org.apache.hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFSStriped
          org.apache.hadoop.hdfs.server.namenode.snapshot.TestSnapshot
          org.apache.hadoop.hdfs.server.namenode.snapshot.TestRenameWithSnapshots
          org.apache.hadoop.hdfs.server.namenode.TestFsck
          org.apache.hadoop.hdfs.server.namenode.ha.TestFailoverWithBlockTokensEnabled
          org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencing
          org.apache.hadoop.hdfs.TestCrcCorruption
          org.apache.hadoop.hdfs.server.balancer.TestBalancerWithSaslDataTransfer
          org.apache.hadoop.hdfs.server.blockmanagement.TestBlockStatsMXBean
          org.apache.hadoop.hdfs.TestLeaseRecovery
          org.apache.hadoop.hdfs.server.blockmanagement.TestPendingInvalidateBlock
          org.apache.hadoop.hdfs.server.namenode.TestReconstructStripedBlocks
          org.apache.hadoop.hdfs.server.namenode.ha.TestFailureToReadEdits
          org.apache.hadoop.hdfs.server.namenode.snapshot.TestNestedSnapshots
          org.apache.hadoop.hdfs.server.namenode.TestFileTruncate
          org.apache.hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes
          org.apache.hadoop.hdfs.server.namenode.TestDiskspaceQuotaUpdate
          org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestScrLazyPersistFiles
          org.apache.hadoop.hdfs.server.namenode.TestListCorruptFileBlocks
          org.apache.hadoop.hdfs.server.namenode.TestDeadDatanode
          org.apache.hadoop.hdfs.server.mover.TestStorageMover
          org.apache.hadoop.hdfs.server.namenode.TestProcessCorruptBlocks
          org.apache.hadoop.hdfs.server.balancer.TestBalancer
          org.apache.hadoop.hdfs.TestDatanodeReport
          org.apache.hadoop.hdfs.TestReadStripedFileWithDecoding
          org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistFiles
          org.apache.hadoop.hdfs.TestAppendSnapshotTruncate
          org.apache.hadoop.hdfs.server.datanode.TestDataNodeMXBean
          org.apache.hadoop.hdfs.server.namenode.TestCheckpoint
          org.apache.hadoop.hdfs.server.namenode.TestNameNodeMXBean
          org.apache.hadoop.hdfs.server.namenode.TestAddStripedBlocks
          org.apache.hadoop.hdfs.TestReadStripedFileWithMissingBlocks
          org.apache.hadoop.hdfs.TestDataStream
          org.apache.hadoop.hdfs.server.namenode.TestNameEditsConfigs
          org.apache.hadoop.hdfs.server.datanode.TestDataNodeMultipleRegistrations
          org.apache.hadoop.hdfs.server.mover.TestMover
          org.apache.hadoop.hdfs.TestDFSStripedOutputStreamWithFailure090
          org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureToleration
          org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestSpaceReservation
          org.apache.hadoop.hdfs.server.namenode.TestNamenodeRetryCache
          org.apache.hadoop.hdfs.server.namenode.TestNestedEncryptionZones
          org.apache.hadoop.hdfs.TestDFSStripedOutputStreamWithFailure080
          org.apache.hadoop.hdfs.TestDFSStripedOutputStreamWithFailure070
          org.apache.hadoop.hdfs.server.diskbalancer.TestDiskBalancerRPC
          org.apache.hadoop.hdfs.TestRenameWhileOpen
          org.apache.hadoop.hdfs.server.datanode.TestFsDatasetCache
          org.apache.hadoop.hdfs.server.blockmanagement.TestReconstructStripedBlocksWithRackAwareness
          org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencingWithReplication
          org.apache.hadoop.hdfs.server.namenode.TestFSEditLogLoader
          org.apache.hadoop.metrics2.sink.TestRollingFileSystemSinkWithHdfs
          org.apache.hadoop.hdfs.TestFileConcurrentReader
          org.apache.hadoop.hdfs.server.datanode.TestBPOfferService
          org.apache.hadoop.hdfs.server.blockmanagement.TestPendingReconstruction
          org.apache.hadoop.hdfs.server.datanode.TestDirectoryScanner
          org.apache.hadoop.hdfs.server.namenode.TestFSImageWithAcl
          org.apache.hadoop.hdfs.server.namenode.ha.TestHAStateTransitions
          org.apache.hadoop.hdfs.shortcircuit.TestShortCircuitLocalRead
          org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistReplicaPlacement
          org.apache.hadoop.hdfs.server.namenode.TestHDFSConcat
          org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistReplicaRecovery
          org.apache.hadoop.hdfs.server.namenode.TestDeleteRace
          org.apache.hadoop.hdfs.TestDecommissionWithStriped
          org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA
          org.apache.hadoop.hdfs.server.namenode.ha.TestBootstrapStandby
          org.apache.hadoop.hdfs.server.namenode.ha.TestStandbyInProgressTail
          org.apache.hadoop.hdfs.server.namenode.ha.TestDFSUpgradeWithHA
          org.apache.hadoop.hdfs.server.namenode.TestEditLogRace
          org.apache.hadoop.hdfs.server.namenode.TestAddBlock
          org.apache.hadoop.hdfs.server.datanode.TestBlockScanner
          org.apache.hadoop.hdfs.server.namenode.snapshot.TestSnapshotDeletion
          org.apache.hadoop.hdfs.server.namenode.TestNameNodeMetadataConsistency
          org.apache.hadoop.hdfs.TestDistributedFileSystem
          org.apache.hadoop.fs.TestSWebHdfsFileContextMainOperations
          org.apache.hadoop.hdfs.TestHFlush
          org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover
          org.apache.hadoop.fs.TestSymlinkHdfsFileSystem
          org.apache.hadoop.hdfs.server.namenode.TestAuditLogs
          org.apache.hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFS
          org.apache.hadoop.hdfs.server.namenode.TestHostsFiles
          org.apache.hadoop.hdfs.TestGetBlocks
          org.apache.hadoop.hdfs.server.namenode.ha.TestHASafeMode
          org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup
          org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure
          org.apache.hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics
          org.apache.hadoop.cli.TestHDFSCLI



        Subsystem       Report/Notes
        Docker          Image:yetus/hadoop:9560f25
        JIRA Issue      HDFS-9500
        JIRA Patch URL  https://issues.apache.org/jira/secure/attachment/12835385/HDFS-9500.002.patch
        Optional Tests  asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
        uname           Linux f77703842fb8 3.13.0-92-generic #139-Ubuntu SMP Tue Jun 28 20:42:26 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
        Build tool      maven
        Personality     /testptch/hadoop/patchprocess/precommit/personality/provided.sh
        git revision    trunk / f209e93
        Default Java    1.8.0_101
        findbugs        v3.0.0
        unit            https://builds.apache.org/job/PreCommit-HDFS-Build/17294/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
        Test Results    https://builds.apache.org/job/PreCommit-HDFS-Build/17294/testReport/
        modules         C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
        Console output  https://builds.apache.org/job/PreCommit-HDFS-Build/17294/console
        Powered by      Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org

        This message was automatically generated.

        shv Konstantin Shvachko added a comment -

        +1. Looks good.
        Will commit in a bit.

        hadoopqa Hadoop QA added a comment -
        -1 overall



        Vote  Subsystem   Runtime  Comment
        0     reexec      0m 21s   Docker mode activated.
        +1    @author     0m 0s    The patch does not contain any @author tags.
        +1    test4tests  0m 0s    The patch appears to include 1 new or modified test files.
        +1    mvninstall  8m 32s   trunk passed
        +1    compile     1m 2s    trunk passed
        +1    checkstyle  0m 31s   trunk passed
        +1    mvnsite     1m 8s    trunk passed
        +1    mvneclipse  0m 14s   trunk passed
        +1    findbugs    2m 4s    trunk passed
        +1    javadoc     0m 45s   trunk passed
        +1    mvninstall  1m 1s    the patch passed
        +1    compile     1m 0s    the patch passed
        +1    javac       1m 0s    the patch passed
        +1    checkstyle  0m 26s   the patch passed
        +1    mvnsite     0m 51s   the patch passed
        +1    mvneclipse  0m 10s   the patch passed
        +1    whitespace  0m 0s    The patch has no whitespace issues.
        +1    findbugs    1m 52s   the patch passed
        +1    javadoc     0m 40s   the patch passed
        -1    unit        61m 56s  hadoop-hdfs in the patch failed.
        +1    asflicense  0m 17s   The patch does not generate ASF License warnings.
                          84m 6s



        Reason Tests
        Failed junit tests hadoop.hdfs.server.namenode.TestNameNodeMetadataConsistency
        Timed out junit tests org.apache.hadoop.hdfs.server.namenode.TestLargeDirectoryDelete
          org.apache.hadoop.hdfs.TestDFSUpgradeFromImage
          org.apache.hadoop.hdfs.TestInjectionForSimulatedStorage
          org.apache.hadoop.hdfs.server.namenode.TestNameNodeReconfigure
          org.apache.hadoop.hdfs.TestLeaseRecovery
          org.apache.hadoop.hdfs.server.namenode.TestFileTruncate
          org.apache.hadoop.hdfs.TestRestartDFS
          org.apache.hadoop.hdfs.server.namenode.TestDeleteRace
          org.apache.hadoop.hdfs.TestLeaseRecoveryStriped
          org.apache.hadoop.hdfs.server.namenode.TestAuditLogs



        Subsystem       Report/Notes
        Docker          Image:yetus/hadoop:9560f25
        JIRA Issue      HDFS-9500
        JIRA Patch URL  https://issues.apache.org/jira/secure/attachment/12835378/HDFS-9500.001.patch
        Optional Tests  asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
        uname           Linux bbf763f21909 3.13.0-96-generic #143-Ubuntu SMP Mon Aug 29 20:15:20 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
        Build tool      maven
        Personality     /testptch/hadoop/patchprocess/precommit/personality/provided.sh
        git revision    trunk / 1f8490a
        Default Java    1.8.0_101
        findbugs        v3.0.0
        unit            https://builds.apache.org/job/PreCommit-HDFS-Build/17293/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
        Test Results    https://builds.apache.org/job/PreCommit-HDFS-Build/17293/testReport/
        modules         C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
        Console output  https://builds.apache.org/job/PreCommit-HDFS-Build/17293/console
        Powered by      Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org

        This message was automatically generated.

        xkrogen Erik Krogen added a comment -

        Good catch, Ravi Prakash! Attaching v002 patch. Thanks!

        raviprak Ravi Prakash added a comment -

        Thanks for your patch Erik and your review Konst!

        // Check isAlive too because right after removeDatanode(),
        // isDatanodeDead() is still true

        This comment is not valid anymore after your changes. Could you please fix this too?

        xkrogen Erik Krogen added a comment - - edited

        Konstantin Shvachko, thanks for the review. I have added Javadocs. For DatanodeRegistration I had copied the style of testNumVersionsReportedCorrect below but you're right that it can be done with just the constructor. Attaching v001 patch with these changes.

        shv Konstantin Shvachko added a comment -

        Good find, Erik. I agree we should decrement the version count whenever the DN is alive. The heartbeat interval expiration doesn't matter here because the node will be marked alive by that same method. It should also work for the full version recount in countSoftwareVersions(). Two nits:

        1. Could you please add Javadoc
          • to shouldCountVersion() saying we count versions for all alive nodes
          • and to the new test explaining its purpose
        2. You do not need to use Mockito for setting fields in DatanodeRegistration. Can't you just use a constructor?
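
        The counting rule discussed above ("count versions for all alive nodes") can be illustrated with a small self-contained model. This is only a sketch, not the DatanodeManager code: the class VersionCountSketch, its Node type, and the register/markDead methods are hypothetical stand-ins for the registration and removal paths.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of per-version datanode counting.
final class VersionCountSketch {
  /** Hypothetical stand-in for a datanode descriptor. */
  static final class Node {
    String version;
    boolean isAlive;
  }

  private final Map<String, Node> nodesById = new HashMap<>();
  private final Map<String, Integer> versionCounts = new HashMap<>();

  /** Versions are counted for all alive nodes, regardless of heartbeat age. */
  private static boolean shouldCountVersion(Node n) {
    return n.version != null && n.isAlive;
  }

  private void increment(String version) {
    versionCounts.merge(version, 1, Integer::sum);
  }

  private void decrement(String version) {
    // Returning null from the remapping function removes the entry.
    versionCounts.computeIfPresent(version, (k, c) -> c > 1 ? c - 1 : null);
  }

  /** Models removal of a node: its version stops being counted. */
  synchronized void markDead(String id) {
    Node n = nodesById.get(id);
    if (n != null && shouldCountVersion(n)) {
      decrement(n.version);
    }
    if (n != null) {
      n.isAlive = false;
    }
  }

  /** Models (re-)registration: drop the old count if it was counted, then add the new one. */
  synchronized void register(String id, String version) {
    Node n = nodesById.get(id);
    if (n != null && shouldCountVersion(n)) {
      decrement(n.version);
    }
    if (n == null) {
      n = new Node();
      nodesById.put(id, n);
    }
    n.version = version;
    n.isAlive = true;
    increment(version);
  }

  synchronized Map<String, Integer> counts() {
    return new HashMap<>(versionCounts);
  }
}
```

        In this model a node that re-registers with a new version while still marked alive has its old version decremented exactly once, and a node that was already removed is not decremented a second time, so the per-version totals always add up to the number of alive nodes.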
        xkrogen Erik Krogen added a comment -

        The TestDiskspaceQuotaUpdate failure is unrelated and documented in HDFS-10921. The other 5 tests all pass locally.

        hadoopqa Hadoop QA added a comment -
        -1 overall



        Vote Subsystem Runtime Comment
        0 reexec 0m 17s Docker mode activated.
        +1 @author 0m 0s The patch does not contain any @author tags.
        +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
        +1 mvninstall 8m 32s trunk passed
        +1 compile 0m 57s trunk passed
        +1 checkstyle 0m 32s trunk passed
        +1 mvnsite 1m 3s trunk passed
        +1 mvneclipse 0m 14s trunk passed
        +1 findbugs 1m 52s trunk passed
        +1 javadoc 0m 41s trunk passed
        +1 mvninstall 0m 45s the patch passed
        +1 compile 0m 41s the patch passed
        +1 javac 0m 41s the patch passed
        +1 checkstyle 0m 23s the patch passed
        +1 mvnsite 0m 48s the patch passed
        +1 mvneclipse 0m 9s the patch passed
        +1 whitespace 0m 0s The patch has no whitespace issues.
        +1 findbugs 1m 48s the patch passed
        +1 javadoc 0m 36s the patch passed
        -1 unit 43m 7s hadoop-hdfs in the patch failed.
        +1 asflicense 0m 20s The patch does not generate ASF License warnings.
        63m 57s



        Reason Tests
        Failed junit tests hadoop.hdfs.server.namenode.TestDiskspaceQuotaUpdate
        Timed out junit tests org.apache.hadoop.hdfs.server.datanode.TestDataNodeRollingUpgrade
          org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyWriter
          org.apache.hadoop.tracing.TestTracing
          org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestSpaceReservation
          org.apache.hadoop.hdfs.server.datanode.TestFsDatasetCache



        Subsystem Report/Notes
        Docker Image:yetus/hadoop:9560f25
        JIRA Issue HDFS-9500
        JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12835171/HDFS-9500.000.patch
        Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
        uname Linux a1d58e835aba 3.13.0-92-generic #139-Ubuntu SMP Tue Jun 28 20:42:26 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
        Build tool maven
        Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
        git revision trunk / 9a8a386
        Default Java 1.8.0_101
        findbugs v3.0.0
        unit https://builds.apache.org/job/PreCommit-HDFS-Build/17278/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
        Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/17278/testReport/
        modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
        Console output https://builds.apache.org/job/PreCommit-HDFS-Build/17278/console
        Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org

        This message was automatically generated.

        xkrogen Erik Krogen added a comment - - edited

        Attaching a patch with a unit test to reproduce the issue deterministically on trunk. It also contains a fix for the problem by changing the condition for shouldCountVersion as described in my previous comment. As far as I can tell the node.isAlive() condition check is sufficient but I would appreciate review to confirm. Konstantin Shvachko?

        xkrogen Erik Krogen added a comment -

        Assigning this to myself since Phil does not seem to be actively working on it anymore.

        I can (intermittently) reproduce this test failure on branch-2.7 if I increase the number of iterations on TestDatanodeManager.testNumVersionsReportedCorrect to 5000. I found that for the node whose version should have been decremented, shouldDecrementVersion() returned false because isDatanodeDead() was true (but isAlive was also true).

        It seems this situation could arise if the time since the last heartbeat from the node was above the threshold for considering it dead, but the HeartbeatManager had not yet marked it as dead. I am open to suggestions about this. Would just checking DatanodeDescriptor.isAlive be sufficient here instead of the check on both isAlive and isDatanodeDead()?
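
        To make the window concrete, here is a minimal sketch of the state described above: the heartbeat is older than the expiry threshold, but the isAlive flag has not been cleared yet. The Node type, the EXPIRE_INTERVAL_MS value, and both method names are assumptions made for illustration; they model the two conditions being compared rather than reproduce the real code.

```java
// Hypothetical, simplified model of the two candidate conditions.
final class StaleHeartbeatSketch {
  /** Hypothetical stand-in for DatanodeDescriptor. */
  static final class Node {
    final String softwareVersion;
    final boolean isAlive;       // cleared only when the heartbeat monitor runs
    final long lastHeartbeatMs;  // time of the most recent heartbeat

    Node(String softwareVersion, boolean isAlive, long lastHeartbeatMs) {
      this.softwareVersion = softwareVersion;
      this.isAlive = isAlive;
      this.lastHeartbeatMs = lastHeartbeatMs;
    }
  }

  // Assumed expiry threshold, chosen only for this example.
  static final long EXPIRE_INTERVAL_MS = 10_000L;

  /** True once the last heartbeat is older than the expiry threshold. */
  static boolean isDatanodeDead(Node n, long nowMs) {
    return n.lastHeartbeatMs < nowMs - EXPIRE_INTERVAL_MS;
  }

  /** Condition that checks both the flag and the heartbeat age. */
  static boolean shouldCountVersionBothChecks(Node n, long nowMs) {
    return n.softwareVersion != null && n.isAlive && !isDatanodeDead(n, nowMs);
  }

  /** Proposed condition: rely on the isAlive flag alone. */
  static boolean shouldCountVersionAliveOnly(Node n) {
    return n.softwareVersion != null && n.isAlive;
  }

  public static void main(String[] args) {
    long now = 100_000L;
    // Heartbeat just past the threshold, flag not yet cleared.
    Node stale = new Node("2.6.0", true, now - EXPIRE_INTERVAL_MS - 1);
    System.out.println(shouldCountVersionBothChecks(stale, now)); // prints "false"
    System.out.println(shouldCountVersionAliveOnly(stale));       // prints "true"
  }
}
```

        With the combined check, a node in this state is treated as not counted, so a re-registration would skip the decrement even though the node's old version is still in the map. The flag-only check keeps the decision consistent with whether the node was actually removed.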

        shv Konstantin Shvachko added a comment -

        Sounds like the condition for decrementing the old version is not accurate. I see this on trunk and other versions.
        Phil Yang do you still plan to work on it?

        sjlee0 Sangjin Lee added a comment -

        Moving this issue to 2.6.6. Please move back if you feel otherwise.

        hadoopqa Hadoop QA added a comment -
        -1 overall



        Vote Subsystem Runtime Comment
        0 reexec 0m 0s Docker mode activated.
        -1 patch 0m 5s HDFS-9500 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help.



        Subsystem Report/Notes
        JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12775497/9500-v1.patch
        JIRA Issue HDFS-9500
        Console output https://builds.apache.org/job/PreCommit-HDFS-Build/16465/console
        Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org

        This message was automatically generated.

        vinodkv Vinod Kumar Vavilapalli added a comment -

        2.7.3 is under release process, changing target-version to 2.7.4.

        raviprak Ravi Prakash added a comment -

        I see HDFS-9371 did away with the finer grained locking we had implemented in incrementVersionCount and decrementVersionCount (earlier we were synchronizing on datanodeMap, and now we synchronize on the entire DatanodeManager).

        I think this issue is caused by DatanodeManager.registerDatanode. If nodeS in old version is not alive because of shutting down, it will not pass shouldCountVersion, so the number of old version won't be decreased. But this method only judges the status of heartbeat and isAlive at that moment, if namenode has not removed this node which will decrease the version map and this node restarts in the new version, the decrementVersionCount belongs to this node will never be executed.

        Thanks for the analysis Phil Yang! Could you please help me understand it? Which version of Hadoop did you experience this on? How do you update the version of the DNs? Do you let a long time pass between bringing down the DN in the old version and then bringing back a DN with the new version?
        What state is the Datanode in when its old version is not decremented?

        Wouldn't https://github.com/apache/hadoop/blob/branch-2.6.2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java#L520 decrement the version count?

        Kihwal Lee Are you seeing this too?

        hadoopqa Hadoop QA added a comment -
        -1 overall



        Vote Subsystem Runtime Comment
        0 reexec 0m 0s Docker mode activated.
        -1 patch 0m 5s HDFS-9500 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help.



        Subsystem Report/Notes
        JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12775497/9500-v1.patch
        JIRA Issue HDFS-9500
        Console output https://builds.apache.org/job/PreCommit-HDFS-Build/14633/console
        Powered by Apache Yetus 0.2.0-SNAPSHOT http://yetus.apache.org

        This message was automatically generated.

        raviprak Ravi Prakash added a comment -

        Thanks for reporting the issue Phil and for bringing it to my attention Kihwal! Sorry for the late reply (my inbox ate mail)

        We wanted to avoid recounting the version map for every registration because, as Kihwal points out, it could be an expensive operation for big clusters (and you have to do it for every registration / dead node). I'd be in favor of fixing the problem without having to count all the versions every time, although if it's too onerous, I'm fine with Phil's proposal as well. I'll look through the code to see if there's an easy fix.
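
        For comparison, a full recount looks roughly like the sketch below, which is why its cost grows with cluster size: every call walks every known node. The RecountSketch class and its Node type are hypothetical and only model the idea of rebuilding the map from scratch.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of rebuilding the version map from scratch.
final class RecountSketch {
  /** Hypothetical stand-in for a datanode descriptor. */
  static final class Node {
    final String version;
    final boolean isAlive;

    Node(String version, boolean isAlive) {
      this.version = version;
      this.isAlive = isAlive;
    }
  }

  /** Visits every node once, so each call is linear in the number of nodes. */
  static Map<String, Integer> recount(Collection<Node> nodes) {
    Map<String, Integer> counts = new HashMap<>();
    for (Node n : nodes) {
      if (n.version != null && n.isAlive) {
        counts.merge(n.version, 1, Integer::sum);
      }
    }
    return counts;
  }
}
```

        If this ran on every registration, bringing up a cluster of N datanodes would perform on the order of N recounts of up to N nodes each, whereas an incremental increment/decrement touches only the registering node's entries.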

        djp Junping Du added a comment -

        Move all non-critical pending issues out of 2.6.4 into 2.6.5.

        kihwal Kihwal Lee added a comment -

        No doubt this op will be hot during start-up of a several thousand node cluster.
        What do you think Ravi Prakash?

        djp Junping Du added a comment -

        So the simplest way to fix this is that we always recounting the version map in registerDatanode since it is not a heavy operation.

        Kihwal Lee, do we have any concerns about this solution when the cluster is really large?

        kihwal Kihwal Lee added a comment -

        2.7.2 is waiting for a couple of blockers. Targeting 2.7.3.

        djp Junping Du added a comment -

        Moving non-blocker/non-critical issues out of 2.6.3 into 2.6.4.

        hadoopqa Hadoop QA added a comment -
        -1 overall



        Vote Subsystem Runtime Comment
        0 reexec 0m 0s Docker mode activated.
        +1 @author 0m 0s The patch does not contain any @author tags.
        -1 test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
        +1 mvninstall 8m 22s trunk passed
        +1 compile 0m 44s trunk passed with JDK v1.8.0_66
        +1 compile 0m 45s trunk passed with JDK v1.7.0_85
        +1 checkstyle 0m 17s trunk passed
        +1 mvnsite 0m 57s trunk passed
        +1 mvneclipse 0m 14s trunk passed
        +1 findbugs 2m 5s trunk passed
        +1 javadoc 1m 9s trunk passed with JDK v1.8.0_66
        +1 javadoc 1m 52s trunk passed with JDK v1.7.0_85
        +1 mvninstall 0m 52s the patch passed
        +1 compile 0m 45s the patch passed with JDK v1.8.0_66
        +1 javac 0m 45s the patch passed
        +1 compile 0m 45s the patch passed with JDK v1.7.0_85
        +1 javac 0m 45s the patch passed
        +1 checkstyle 0m 17s the patch passed
        +1 mvnsite 0m 57s the patch passed
        +1 mvneclipse 0m 14s the patch passed
        +1 whitespace 0m 0s Patch has no whitespace issues.
        +1 findbugs 2m 12s the patch passed
        +1 javadoc 1m 12s the patch passed with JDK v1.8.0_66
        +1 javadoc 1m 51s the patch passed with JDK v1.7.0_85
        -1 unit 56m 26s hadoop-hdfs in the patch failed with JDK v1.8.0_66.
        +1 unit 53m 30s hadoop-hdfs in the patch passed with JDK v1.7.0_85.
        -1 asflicense 0m 23s Patch generated 59 ASF License warnings.
        138m 44s



        Reason Tests
        JDK v1.8.0_66 Failed junit tests hadoop.hdfs.TestDatanodeRegistration
          hadoop.hdfs.server.namenode.ha.TestSeveralNameNodes
          hadoop.hdfs.TestFileAppend



        Subsystem Report/Notes
        Docker Image:yetus/hadoop:0ca8df7
        JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12775497/9500-v1.patch
        JIRA Issue HDFS-9500
        Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
        uname Linux c6f8a79943f4 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
        Build tool maven
        Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
        git revision trunk / 3857fed
        findbugs v3.0.0
        unit https://builds.apache.org/job/PreCommit-HDFS-Build/13748/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_66.txt
        unit test logs https://builds.apache.org/job/PreCommit-HDFS-Build/13748/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_66.txt
        JDK v1.7.0_85 Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/13748/testReport/
        asflicense https://builds.apache.org/job/PreCommit-HDFS-Build/13748/artifact/patchprocess/patch-asflicense-problems.txt
        modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
        Max memory used 76MB
        Powered by Apache Yetus http://yetus.apache.org
        Console output https://builds.apache.org/job/PreCommit-HDFS-Build/13748/console

        This message was automatically generated.


          People

          • Assignee:
            xkrogen Erik Krogen
            Reporter:
            yangzhe1991 Phil Yang
          • Votes:
            0
            Watchers:
            12
