  1. Hadoop HDFS
  2. HDFS-9516

truncate file fails with data dirs on multiple disks

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.7.1
    • Fix Version/s: 2.8.0, 2.7.3, 3.0.0-alpha1
    • Component/s: datanode
    • Labels:
      None
    • Target Version/s:
    • Hadoop Flags:
      Reviewed

      Description

      FileSystem.truncate returns false (no exception) but the file is never closed and not writable after this.

      This appears to be caused by copy-on-truncate, which is used because the system is in the upgrade state. In this case a rename between devices is attempted.
      See the attached log and repro code.
      It probably also affects truncating a snapshotted file, since copy-on-truncate is used there as well.
      It may affect not only truncate but any block recovery.

      I think the problem is in updateReplicaUnderRecovery:

      ReplicaBeingWritten newReplicaInfo = new ReplicaBeingWritten(
                  newBlockId, recoveryId, rur.getVolume(), blockFile.getParentFile(),
                  newlength);
      

      blockFile is created by copyReplicaWithNewBlockIdAndGS, which is allowed to choose any volume, so rur.getVolume() is not necessarily where the block is located.

      1. HDFS-9516_3.patch
        3 kB
        Plamen Jeliazkov
      2. HDFS-9516_2.patch
        2 kB
        Plamen Jeliazkov
      3. HDFS-9516_1.patch
        2 kB
        Plamen Jeliazkov
      4. HDFS-9516_testFailures.patch
        1 kB
        Plamen Jeliazkov
      5. truncate.dn.log
        3 kB
        Bogdan Raducanu
      6. Main.java
        2 kB
        Bogdan Raducanu

          Activity

          vinodkv Vinod Kumar Vavilapalli added a comment -

          Closing the JIRA as part of 2.7.3 release.

          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-trunk-Commit #8983 (See https://builds.apache.org/job/Hadoop-trunk-Commit/8983/)
          Update CHANGES.txt to move HDFS-9516 to 2.7.3 section. (shv: rev d90625e03871639769be032060c9c6173f919fe8)

          • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          shv Konstantin Shvachko added a comment -

          Committed to branch-2.8 and branch-2.7.

          vinodkv Vinod Kumar Vavilapalli added a comment -

          Konstantin Shvachko, unfortunately this came in too late for 2.7.2. That said, I don’t see any reason why this shouldn’t be in 2.8.0 and 2.7.3. Setting the target-versions accordingly on JIRA.

          If you agree, appreciate backport help to those branches (branch-2.8.0, branch-2.7).

          hudson Hudson added a comment -

          SUCCESS: Integrated in Hadoop-Hdfs-trunk-Java8 #694 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/694/)
          HDFS-9516. Truncate file fails with data dirs on multiple disks. (shv: rev 96d307e1e320eafb470faf7bd47af3341c399d55)

          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java
          • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-trunk-Commit #8968 (See https://builds.apache.org/job/Hadoop-trunk-Commit/8968/)
          HDFS-9516. Truncate file fails with data dirs on multiple disks. (shv: rev 96d307e1e320eafb470faf7bd47af3341c399d55)

          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java
          • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          shv Konstantin Shvachko added a comment -

          I just committed this to trunk and branch-2. Thank you Plamen.

          shv Konstantin Shvachko added a comment -

          +1 on the latest patch.
          The Jenkins report is quite confusing. I don't know if there is any value in running it. Still:

          • No new tests, because existing tests should fail due to the new assert statement.
          • Checkstyle issues seem to be the same: 123.
          • Failed tests are reported incorrectly in the jira. Ran 8 failed tests locally, no problems.
          • No new files in the patch, so ASF warnings are not related.

          Will commit shortly.
          Also should we target it for any of the upcoming releases? Seems like a critical bug.

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 0s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          -1 test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
          +1 mvninstall 19m 48s trunk passed
          +1 compile 2m 49s trunk passed with JDK v1.8.0_66
          +1 compile 1m 57s trunk passed with JDK v1.7.0_91
          +1 checkstyle 0m 38s trunk passed
          +1 mvnsite 2m 25s trunk passed
          +1 mvneclipse 0m 31s trunk passed
          +1 findbugs 4m 49s trunk passed
          +1 javadoc 3m 3s trunk passed with JDK v1.8.0_66
          +1 javadoc 4m 32s trunk passed with JDK v1.7.0_91
          -1 mvninstall 2m 16s hadoop-hdfs in the patch failed.
          +1 compile 2m 38s the patch passed with JDK v1.8.0_66
          +1 javac 2m 38s the patch passed
          +1 compile 1m 52s the patch passed with JDK v1.7.0_91
          +1 javac 1m 52s the patch passed
          -1 checkstyle 0m 40s Patch generated 1 new checkstyle issues in hadoop-hdfs-project/hadoop-hdfs (total was 123, now 123).
          +1 mvnsite 2m 20s the patch passed
          +1 mvneclipse 0m 32s the patch passed
          +1 whitespace 0m 0s Patch has no whitespace issues.
          +1 findbugs 4m 58s the patch passed
          +1 javadoc 2m 48s the patch passed with JDK v1.8.0_66
          +1 javadoc 4m 29s the patch passed with JDK v1.7.0_91
          -1 unit 181m 43s hadoop-hdfs in the patch failed with JDK v1.8.0_66.
          -1 unit 162m 3s hadoop-hdfs in the patch failed with JDK v1.7.0_91.
          -1 asflicense 0m 51s Patch generated 56 ASF License warnings.
          415m 40s



          Reason Tests
          JDK v1.8.0_66 Failed junit tests hadoop.hdfs.TestDFSUpgradeFromImage
            hadoop.hdfs.server.datanode.TestBlockScanner
            hadoop.hdfs.server.namenode.ha.TestPipelinesFailover
            hadoop.hdfs.TestDFSUpgrade
            hadoop.hdfs.server.namenode.ha.TestEditLogTailer
            hadoop.hdfs.TestPersistBlocks
            hadoop.hdfs.TestDataTransferKeepalive
            hadoop.hdfs.security.TestDelegationTokenForProxyUser
            hadoop.hdfs.server.namenode.TestFsck
            hadoop.hdfs.TestLocalDFS
            hadoop.hdfs.TestDFSStripedOutputStreamWithFailure090
            hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure
            hadoop.hdfs.server.namenode.ha.TestStandbyCheckpoints
            hadoop.hdfs.server.datanode.TestDataNodeHotSwapVolumes
            hadoop.hdfs.tools.TestDFSAdminWithHA
            hadoop.hdfs.server.namenode.TestMetaSave
            hadoop.hdfs.server.namenode.TestSecurityTokenEditLog
            hadoop.hdfs.server.datanode.fsdataset.impl.TestSpaceReservation
            hadoop.hdfs.server.datanode.TestBlockReplacement
            hadoop.hdfs.server.namenode.ha.TestSeveralNameNodes
            hadoop.hdfs.server.namenode.ha.TestFailureToReadEdits
            hadoop.hdfs.TestDFSStripedOutputStreamWithFailure080
            hadoop.hdfs.server.namenode.ha.TestHAAppend
            hadoop.fs.TestSymlinkHdfsFileContext
            hadoop.hdfs.server.namenode.ha.TestDFSUpgradeWithHA
            hadoop.hdfs.server.namenode.TestDecommissioningStatus
            hadoop.hdfs.qjournal.TestSecureNNWithQJM
            hadoop.hdfs.server.namenode.TestNamenodeCapacityReport
            hadoop.hdfs.TestEncryptionZones
            hadoop.hdfs.server.blockmanagement.TestReplicationPolicy
            hadoop.hdfs.server.blockmanagement.TestComputeInvalidateWork
            hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot
            hadoop.hdfs.server.datanode.TestDirectoryScanner
          JDK v1.7.0_91 Failed junit tests hadoop.hdfs.web.TestWebHdfsTimeouts
            hadoop.hdfs.server.datanode.TestBlockScanner
            hadoop.hdfs.server.namenode.TestNameNodeMetadataConsistency
            hadoop.hdfs.server.namenode.ha.TestPipelinesFailover
            hadoop.hdfs.server.namenode.ha.TestEditLogTailer
            hadoop.hdfs.TestPersistBlocks
            hadoop.hdfs.server.namenode.TestSecureNameNode
            hadoop.hdfs.shortcircuit.TestShortCircuitCache
            hadoop.hdfs.TestRecoverStripedFile
            hadoop.hdfs.TestDataTransferKeepalive
            hadoop.hdfs.security.TestDelegationTokenForProxyUser
            hadoop.hdfs.TestLocalDFS
            hadoop.hdfs.TestDFSStripedOutputStreamWithFailure160
            hadoop.hdfs.server.namenode.TestNameNodeMXBean
            hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure
            hadoop.hdfs.TestDFSStripedOutputStreamWithFailure020
            hadoop.hdfs.TestSafeMode
            hadoop.hdfs.server.datanode.TestBlockReplacement
            hadoop.hdfs.server.namenode.TestFileLimit
            hadoop.hdfs.server.namenode.ha.TestHAAppend
            hadoop.hdfs.server.namenode.TestDecommissioningStatus
            hadoop.hdfs.qjournal.TestSecureNNWithQJM
            hadoop.hdfs.server.namenode.TestFileTruncate
            hadoop.hdfs.TestEncryptionZones
            hadoop.hdfs.protocol.datatransfer.sasl.TestSaslDataTransfer
            hadoop.hdfs.server.datanode.TestDirectoryScanner
            hadoop.hdfs.server.datanode.fsdataset.impl.TestDatanodeRestart



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:0ca8df7
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12777607/HDFS-9516_3.patch
          JIRA Issue HDFS-9516
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux bc508ec6ad75 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 915cd6c
          findbugs v3.0.0
          mvninstall https://builds.apache.org/job/PreCommit-HDFS-Build/13872/artifact/patchprocess/patch-mvninstall-hadoop-hdfs-project_hadoop-hdfs.txt
          checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/13872/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/13872/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_66.txt
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/13872/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_91.txt
          unit test logs https://builds.apache.org/job/PreCommit-HDFS-Build/13872/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_66.txt https://builds.apache.org/job/PreCommit-HDFS-Build/13872/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_91.txt
          JDK v1.7.0_91 Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/13872/testReport/
          asflicense https://builds.apache.org/job/PreCommit-HDFS-Build/13872/artifact/patchprocess/patch-asflicense-problems.txt
          modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
          Max memory used 76MB
          Powered by Apache Yetus 0.1.0 http://yetus.apache.org
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/13872/console

          This message was automatically generated.

          zero45 Plamen Jeliazkov added a comment -

          If you ignore the 2nd code block change in my "_3" patch and only add the assert change, you will notice the same set of test failures as in my previous comments; all alluding to the issue that Bogdan Raducanu saw.

          zero45 Plamen Jeliazkov added a comment -

          Attaching "_3" patch which contains an assert that Konstantin Shvachko wished for.

          This time the assert checks that the volume base path is the starting path of the block file, to ensure that no move of the block file across volumes is attempted when it is finalized.
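The check described above can be sketched as follows. This is only an illustration of a "base path is a prefix of the block file path" test, not the code from the "_3" patch: the class name, the helper isOnVolume, and the data directory paths are all made up for the example, and the real patch may compare paths differently.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical sketch; not the code from the HDFS-9516 patch. */
final class VolumeAssertSketch {

  /**
   * Returns true if blockFile lies under volumeBasePath. Path.startsWith
   * compares whole name elements, so /data/1 is not treated as a prefix
   * of /data/10.
   */
  static boolean isOnVolume(Path volumeBasePath, Path blockFile) {
    Path base = volumeBasePath.toAbsolutePath().normalize();
    Path file = blockFile.toAbsolutePath().normalize();
    return file.startsWith(base);
  }

  public static void main(String[] args) {
    // Made-up data directories on two different disks.
    Path vol1 = Paths.get("/data/1/dfs/dn");
    Path vol2 = Paths.get("/data/2/dfs/dn");
    Path blockFile = vol2.resolve("current/rbw/blk_1001");

    // The block file was written under vol2, so a replica that records
    // vol1 as its volume would fail this check.
    System.out.println("on vol1: " + isOnVolume(vol1, blockFile));
    System.out.println("on vol2: " + isOnVolume(vol2, blockFile));
  }
}
```

Comparing path elements rather than raw strings is a design choice of this sketch: it avoids a false match between sibling directories whose names share a prefix.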

          shv Konstantin Shvachko added a comment -

          Plamen, the fix looks good.

          • I think you should restate the assert. Not in copyReplicaWithNewBlockIdAndGS() though, where it would look trivial. Placing it somewhere where you have both the old and the new ReplicaInfo would protect from future errors like this. One option is to assert just before calling finalizeReplica(), or you can find a better place.
          • Removing the try block seems all right to me. In the current version a new volume can be picked, which requires incrementing the reference for it. With your patch it is going to be the same volume as rur, which should already have a proper reference count.
          zero45 Plamen Jeliazkov added a comment -

          Attaching second patch (_2) with same fix but the assert statement taken out.

          Please note I have also removed the try block; please let me know if I should have left that back in. It does not seem it was needed anymore when I took a look though.

          zero45 Plamen Jeliazkov added a comment -

          Attaching first patch (_1) with proposed fix. I've left the assert in place. All unit tests pass locally.

          zero45 Plamen Jeliazkov added a comment -

          Attaching patch which highlights Bogdan Raducanu's issue. If you run the TestFileTruncate tests with 'HDFS-9516_testFailures.patch' applied you will see 6 failures:

          1. testUpgradeAndRestart
          2. testSnapshotWithAppendTruncate
          3. testCopyOnTruncateWithDataNodesRestart
          4. testSnapshotWithTruncates
          5. testTruncateRecovery
          6. testSnapshotTruncateThenDeleteSnapshot

          The basic proposal for the fix is to use the same FsVolume that the 'replicaInfo under recovery' is on, rather than trying to find a new volume of the same disk type.
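A minimal model of the two volume-selection strategies is sketched below. It is not HDFS code: the Volume class, both pick methods, the round-robin hint, and the directory paths are assumptions made up for the illustration. It only shows why "any volume of the same type" can differ from the volume of the replica under recovery, while "same volume" cannot.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical model of the volume choice; names are illustrative, not HDFS classes. */
final class CopyOnTruncateVolumeSketch {

  /** A data directory on some disk. */
  static final class Volume {
    final String basePath;
    final String storageType;

    Volume(String basePath, String storageType) {
      this.basePath = basePath;
      this.storageType = storageType;
    }
  }

  /**
   * Behaviour described in the issue: any volume with a matching storage
   * type may be picked. The scan starts at an assumed round-robin hint.
   */
  static Volume pickAnyOfSameType(List<Volume> volumes, Volume replicaVolume, int hint) {
    int n = volumes.size();
    for (int i = 0; i < n; i++) {
      Volume v = volumes.get(Math.floorMod(hint + i, n));
      if (v.storageType.equals(replicaVolume.storageType)) {
        return v;
      }
    }
    return replicaVolume;
  }

  /** Proposed behaviour: keep the copy on the volume of the replica under recovery. */
  static Volume pickReplicaVolume(Volume replicaVolume) {
    return replicaVolume;
  }

  public static void main(String[] args) {
    Volume disk1 = new Volume("/data/1/dfs/dn", "DISK");
    Volume disk2 = new Volume("/data/2/dfs/dn", "DISK");
    List<Volume> volumes = Arrays.asList(disk1, disk2);

    Volume rurVolume = disk1; // where the replica under recovery lives
    Volume before = pickAnyOfSameType(volumes, rurVolume, 1);
    Volume after = pickReplicaVolume(rurVolume);

    System.out.println("any-of-same-type stays on rur volume: " + (before == rurVolume));
    System.out.println("same-volume choice stays on rur volume: " + (after == rurVolume));
  }
}
```

With a single data directory the two strategies always agree, which is consistent with the failure only showing up when data dirs are spread over multiple disks.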

          shv Konstantin Shvachko added a comment -

          Btw which branch are you running it on? Couldn't match exactly the line numbers with any of the 2s.

          shv Konstantin Shvachko added a comment -

          Indeed, looks like an attempt to rename across volumes. Good catch, Bogdan. And analysis too.
          The problem is that copyReplicaWithNewBlockIdAndGS() does not take into account which volume the rur replica is on, and can choose a different one.
          I don't think this affects anything but truncate in the copy-on-truncate case, which involves upgrades and snapshots.

          I was wondering if you traced this condition further in time. This recovery should fail and another should start some time later; eventually the same volume should be chosen and that last recovery should succeed.


            People

            • Assignee:
              zero45 Plamen Jeliazkov
              Reporter:
              bograd Bogdan Raducanu
            • Votes:
              0
              Watchers:
              12
