Hadoop HDFS / HDFS-9516

truncate file fails with data dirs on multiple disks


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.7.1
    • Fix Version/s: 2.8.0, 2.7.3, 3.0.0-alpha1
    • Component/s: datanode
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      FileSystem.truncate returns false without throwing an exception, but the file is never closed afterwards and cannot be written to.
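
      For context, the FileSystem.truncate Javadoc describes a false return value as meaning that a background adjustment of the last block has started and that clients should wait for it to finish before updating the file again. A caller therefore usually polls until the file is closed. The helper below is a minimal, stdlib-only sketch of such a bounded wait; the class and method names are hypothetical, and in HDFS the condition could be something like DistributedFileSystem.isFileClosed(path). With this bug the condition never becomes true, so the timeout matters.

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper, not part of Hadoop: bounded polling after truncate() returned false.
final class TruncateWait {

    /**
     * Polls the condition until it holds or the timeout expires.
     * Returns true if the condition held in time, false on timeout or interrupt.
     */
    static boolean waitUntil(BooleanSupplier condition, long timeoutMs, long pollMs) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;                          // e.g. the file is closed again
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;                         // recovery never completed in time
            }
            try {
                Thread.sleep(pollMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();   // preserve the interrupt flag
                return false;
            }
        }
    }
}
```

      A caller would invoke waitUntil only when truncate returned false, and treat a false result from waitUntil as a failed recovery rather than retrying forever.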

      This seems to be caused by copy-on-truncate, which is used because the system is in the upgrade state. In this case a rename between devices is attempted.
      See the attached log and repro code.
      This probably also affects truncating a snapshotted file, since copy-on-truncate is used there as well.
      It may affect not only truncate but any block recovery.
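
      To illustrate why a rename between devices fails: a plain rename cannot cross file systems, so java.io.File.renameTo returns false and Files.move with ATOMIC_MOVE throws AtomicMoveNotSupportedException. The sketch below is not DataNode code; it is a hypothetical stdlib-only helper that checks whether source and target share a file store before attempting an atomic move.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not part of Hadoop: guard against cross-device renames.
final class SameVolumeMove {

    /** True if both existing paths live on the same file store (device or partition). */
    static boolean onSameFileStore(Path a, Path b) throws IOException {
        return Files.getFileStore(a).equals(Files.getFileStore(b));
    }

    /** Atomically moves src into dstDir, refusing to rename across file stores. */
    static Path moveIntoDir(Path src, Path dstDir) throws IOException {
        if (!onSameFileStore(src, dstDir)) {
            throw new IOException("refusing cross-device rename: " + src + " -> " + dstDir);
        }
        return Files.move(src, dstDir.resolve(src.getFileName()),
                StandardCopyOption.ATOMIC_MOVE);
    }
}
```

      With data dirs on multiple disks, a check like this would surface the mismatch as an explicit error instead of a rename that quietly fails.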

      I think the problem is in updateReplicaUnderRecovery:

      ReplicaBeingWritten newReplicaInfo = new ReplicaBeingWritten(
          newBlockId, recoveryId, rur.getVolume(), blockFile.getParentFile(),
          newlength);

      blockFile is created by copyReplicaWithNewBlockIdAndGS, which is allowed to choose any volume, so rur.getVolume() is not necessarily the volume where the new block file is located.
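
      One way to picture the mismatch is to derive the volume from where the copied block file actually landed instead of reusing the old replica's volume. The sketch below is only an illustration under that assumption, not the committed fix (see the attached patches for that). It models volumes as root directories; VolumeLookup and volumeContaining are hypothetical names, and nested volume roots are not handled.

```java
import java.nio.file.Path;
import java.util.List;
import java.util.Optional;

// Hypothetical model, not Hadoop code: find which data dir holds a block file.
final class VolumeLookup {

    /** Returns the first volume root that is an ancestor of blockFile, if any. */
    static Optional<Path> volumeContaining(List<Path> volumeRoots, Path blockFile) {
        Path file = blockFile.toAbsolutePath().normalize();
        return volumeRoots.stream()
                .map(root -> root.toAbsolutePath().normalize())
                .filter(file::startsWith)   // compares whole path elements, not string prefixes
                .findFirst();
    }
}
```

      In the scenario described here, the old replica might live under one data dir while the copy lands under another, so a lookup on the new block file would return a different volume than the one recorded for the old replica.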

      Attachments

        1. truncate.dn.log
          3 kB
          Bogdan Raducanu
        2. Main.java
          2 kB
          Bogdan Raducanu
        3. HDFS-9516_testFailures.patch
          1 kB
          Plamen Jeliazkov
        4. HDFS-9516_3.patch
          3 kB
          Plamen Jeliazkov
        5. HDFS-9516_2.patch
          2 kB
          Plamen Jeliazkov
        6. HDFS-9516_1.patch
          2 kB
          Plamen Jeliazkov

            People

              Assignee: Plamen Jeliazkov (zero45)
              Reporter: Bogdan Raducanu (bograd)
              Votes: 0
              Watchers: 11