Hadoop HDFS / HDFS-11056

Concurrent append and read operations lead to checksum error

    Details

    • Hadoop Flags:
      Reviewed
    • Release Note:
      Load the last partial chunk checksum properly into memory when converting a finalized/temporary replica to an rbw replica. This ensures a concurrent reader reads the correct checksum that matches the data before the update.

      Description

      If there are two clients, one of which opens, appends to, and closes a file continuously while the other opens, reads, and closes the same file continuously, the reader eventually gets a checksum error in the data read.

      On my local Mac, it takes a few minutes to produce the error. This happens to httpfs clients, but there's no reason not to believe it happens to any append clients.

      I have a unit test that demonstrates the checksum error. Will attach later.

      Relevant log:

      2016-10-25 15:34:45,153 INFO audit - allowed=true ugi=weichiu (auth:SIMPLE) ip=/127.0.0.1 cmd=open src=/tmp/bar.txt dst=null perm=null proto=rpc
      2016-10-25 15:34:45,155 INFO DataNode - Receiving BP-837130339-172.16.1.88-1477434851452:blk_1073741825_1182 src: /127.0.0.1:51130 dest: /127.0.0.1:50131
      2016-10-25 15:34:45,155 INFO FsDatasetImpl - Appending to FinalizedReplica, blk_1073741825_1182, FINALIZED
      getNumBytes() = 182
      getBytesOnDisk() = 182
      getVisibleLength()= 182
      getVolume() = /Users/weichiu/sandbox/hadoop/hadoop-hdfs-project/hadoop-hdfs-httpfs/target/test-dir/dfs/data/data1
      getBlockURI() = file:/Users/weichiu/sandbox/hadoop/hadoop-hdfs-project/hadoop-hdfs-httpfs/target/test-dir/dfs/data/data1/current/BP-837130339-172.16.1.88-1477434851452/current/finalized/subdir0/subdir0/blk_1073741825
      2016-10-25 15:34:45,167 INFO DataNode - opReadBlock BP-837130339-172.16.1.88-1477434851452:blk_1073741825_1182 received exception java.io.IOException: No data exists for block BP-837130339-172.16.1.88-1477434851452:blk_1073741825_1182
      2016-10-25 15:34:45,167 WARN DataNode - DatanodeRegistration(127.0.0.1:50131, datanodeUuid=41c96335-5e4b-4950-ac22-3d21b353abb8, infoPort=50133, infoSecurePort=0, ipcPort=50134, storageInfo=lv=-57;cid=testClusterID;nsid=1472068852;c=1477434851452):Got exception while serving BP-837130339-172.16.1.88-1477434851452:blk_1073741825_1182 to /127.0.0.1:51121
      java.io.IOException: No data exists for block BP-837130339-172.16.1.88-1477434851452:blk_1073741825_1182
      at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getBlockInputStream(FsDatasetImpl.java:773)
      at org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:400)
      at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:581)
      at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:150)
      at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:102)
      at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:289)
      at java.lang.Thread.run(Thread.java:745)
      2016-10-25 15:34:45,168 INFO FSNamesystem - updatePipeline(blk_1073741825_1182, newGS=1183, newLength=182, newNodes=[127.0.0.1:50131], client=DFSClient_NONMAPREDUCE_-1743096965_197)
      2016-10-25 15:34:45,168 ERROR DataNode - 127.0.0.1:50131:DataXceiver error processing READ_BLOCK operation src: /127.0.0.1:51121 dst: /127.0.0.1:50131
      java.io.IOException: No data exists for block BP-837130339-172.16.1.88-1477434851452:blk_1073741825_1182
      at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getBlockInputStream(FsDatasetImpl.java:773)
      at org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:400)
      at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:581)
      at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:150)
      at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:102)
      at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:289)
      at java.lang.Thread.run(Thread.java:745)
      2016-10-25 15:34:45,168 INFO FSNamesystem - updatePipeline(blk_1073741825_1182 => blk_1073741825_1183) success
      2016-10-25 15:34:45,170 WARN DFSClient - Found Checksum error for BP-837130339-172.16.1.88-1477434851452:blk_1073741825_1182 from DatanodeInfoWithStorage[127.0.0.1:50131,DS-a1878418-4f7f-4fc9-b3f7-d7ed780b5373,DISK] at 0
      2016-10-25 15:34:45,170 WARN DFSClient - No live nodes contain block BP-837130339-172.16.1.88-1477434851452:blk_1073741825_1182 after checking nodes = [DatanodeInfoWithStorage[127.0.0.1:50131,DS-a1878418-4f7f-4fc9-b3f7-d7ed780b5373,DISK]], ignoredNodes = null
      2016-10-25 15:34:45,170 INFO DFSClient - Could not obtain BP-837130339-172.16.1.88-1477434851452:blk_1073741825_1182 from any node: No live nodes contain current block Block locations: DatanodeInfoWithStorage[127.0.0.1:50131,DS-a1878418-4f7f-4fc9-b3f7-d7ed780b5373,DISK] Dead nodes: DatanodeInfoWithStorage[127.0.0.1:50131,DS-a1878418-4f7f-4fc9-b3f7-d7ed780b5373,DISK]. Will get new block locations from namenode and retry...
      2016-10-25 15:34:45,170 WARN DFSClient - DFS chooseDataNode: got # 1 IOException, will wait for 981.8085941094539 msec.
      2016-10-25 15:34:45,171 INFO clienttrace - src: /127.0.0.1:51130, dest: /127.0.0.1:50131, bytes: 183, op: HDFS_WRITE, cliID: DFSClient_NONMAPREDUCE_-1743096965_197, offset: 0, srvID: 41c96335-5e4b-4950-ac22-3d21b353abb8, blockid: BP-837130339-172.16.1.88-1477434851452:blk_1073741825_1183, duration: 2175363
      2016-10-25 15:34:45,171 INFO DataNode - PacketResponder: BP-837130339-172.16.1.88-1477434851452:blk_1073741825_1183, type=LAST_IN_PIPELINE terminating
      2016-10-25 15:34:45,172 INFO FSNamesystem - BLOCK* blk_1073741825_1183 is COMMITTED but not COMPLETE(numNodes= 0 < minimum = 1) in file /tmp/bar.txt
      2016-10-25 15:34:45,576 INFO StateChange - DIR* completeFile: /tmp/bar.txt is closed by DFSClient_NONMAPREDUCE_-1743096965_197
      2016-10-25 15:34:45,577 INFO httpfsaudit - [/tmp/bar.txt]
      2016-10-25 15:34:45,579 INFO AppendTestUtil - seed=-3144873070946578911, size=1
      2016-10-25 15:34:45,590 INFO audit - allowed=true ugi=weichiu (auth:PROXY) via weichiu (auth:SIMPLE) ip=/127.0.0.1 cmd=append src=/tmp/bar.txt dst=null perm=null proto=rpc
      2016-10-25 15:34:45,593 INFO DataNode - Receiving BP-837130339-172.16.1.88-1477434851452:blk_1073741825_1183 src: /127.0.0.1:51132 dest: /127.0.0.1:50131
      2016-10-25 15:34:45,593 INFO FsDatasetImpl - Appending to FinalizedReplica, blk_1073741825_1183, FINALIZED
      getNumBytes() = 183
      getBytesOnDisk() = 183
      getVisibleLength()= 183
      getVolume() = /Users/weichiu/sandbox/hadoop/hadoop-hdfs-project/hadoop-hdfs-httpfs/target/test-dir/dfs/data/data1
      getBlockURI() = file:/Users/weichiu/sandbox/hadoop/hadoop-hdfs-project/hadoop-hdfs-httpfs/target/test-dir/dfs/data/data1/current/BP-837130339-172.16.1.88-1477434851452/current/finalized/subdir0/subdir0/blk_1073741825
      2016-10-25 15:34:45,603 INFO FSNamesystem - updatePipeline(blk_1073741825_1183, newGS=1184, newLength=183, newNodes=[127.0.0.1:50131], client=DFSClient_NONMAPREDUCE_-1743096965_197)
      2016-10-25 15:34:45,603 INFO FSNamesystem - updatePipeline(blk_1073741825_1183 => blk_1073741825_1184) success
      2016-10-25 15:34:45,605 INFO clienttrace - src: /127.0.0.1:51132, dest: /127.0.0.1:50131, bytes: 184, op: HDFS_WRITE, cliID: DFSClient_NONMAPREDUCE_-1743096965_197, offset: 0, srvID: 41c96335-5e4b-4950-ac22-3d21b353abb8, blockid: BP-837130339-172.16.1.88-1477434851452:blk_1073741825_1184, duration: 1377229
      2016-10-25 15:34:45,605 INFO DataNode - PacketResponder: BP-837130339-172.16.1.88-1477434851452:blk_1073741825_1184, type=LAST_IN_PIPELINE terminating
      2016-10-25 15:34:45,606 INFO FSNamesystem - BLOCK* blk_1073741825_1184 is COMMITTED but not COMPLETE(numNodes= 0 < minimum = 1) in file /tmp/bar.txt
      2016-10-25 15:34:46,009 INFO StateChange - DIR* completeFile: /tmp/bar.txt is closed by DFSClient_NONMAPREDUCE_-1743096965_197
      2016-10-25 15:34:46,010 INFO httpfsaudit - [/tmp/bar.txt]
      2016-10-25 15:34:46,012 INFO AppendTestUtil - seed=-263001291976323720, size=1
      2016-10-25 15:34:46,022 INFO audit - allowed=true ugi=weichiu (auth:PROXY) via weichiu (auth:SIMPLE) ip=/127.0.0.1 cmd=append src=/tmp/bar.txt dst=null perm=null proto=rpc
      2016-10-25 15:34:46,024 INFO DataNode - Receiving BP-837130339-172.16.1.88-1477434851452:blk_1073741825_1184 src: /127.0.0.1:51133 dest: /127.0.0.1:50131
      2016-10-25 15:34:46,024 INFO FsDatasetImpl - Appending to FinalizedReplica, blk_1073741825_1184, FINALIZED
      getNumBytes() = 184
      getBytesOnDisk() = 184
      getVisibleLength()= 184
      getVolume() = /Users/weichiu/sandbox/hadoop/hadoop-hdfs-project/hadoop-hdfs-httpfs/target/test-dir/dfs/data/data1
      getBlockURI() = file:/Users/weichiu/sandbox/hadoop/hadoop-hdfs-project/hadoop-hdfs-httpfs/target/test-dir/dfs/data/data1/current/BP-837130339-172.16.1.88-1477434851452/current/finalized/subdir0/subdir0/blk_1073741825
      2016-10-25 15:34:46,032 INFO FSNamesystem - updatePipeline(blk_1073741825_1184, newGS=1185, newLength=184, newNodes=[127.0.0.1:50131], client=DFSClient_NONMAPREDUCE_-1743096965_197)
      2016-10-25 15:34:46,032 INFO FSNamesystem - updatePipeline(blk_1073741825_1184 => blk_1073741825_1185) success
      2016-10-25 15:34:46,033 INFO clienttrace - src: /127.0.0.1:51133, dest: /127.0.0.1:50131, bytes: 185, op: HDFS_WRITE, cliID: DFSClient_NONMAPREDUCE_-1743096965_197, offset: 0, srvID: 41c96335-5e4b-4950-ac22-3d21b353abb8, blockid: BP-837130339-172.16.1.88-1477434851452:blk_1073741825_1185, duration: 1112564
      2016-10-25 15:34:46,033 INFO DataNode - PacketResponder: BP-837130339-172.16.1.88-1477434851452:blk_1073741825_1185, type=LAST_IN_PIPELINE terminating
      2016-10-25 15:34:46,033 INFO FSNamesystem - BLOCK* blk_1073741825_1185 is COMMITTED but not COMPLETE(numNodes= 0 < minimum = 1) in file /tmp/bar.txt
      2016-10-25 15:34:46,156 INFO audit - allowed=true ugi=weichiu (auth:SIMPLE) ip=/127.0.0.1 cmd=open src=/tmp/bar.txt dst=null perm=null proto=rpc
      2016-10-25 15:34:46,158 INFO StateChange - DIR reportBadBlocks for block: BP-837130339-172.16.1.88-1477434851452:blk_1073741825_1182 on datanode: 127.0.0.1:50131
      Exception in thread "Thread-144" java.lang.RuntimeException: org.apache.hadoop.fs.ChecksumException: Checksum CRC32C not matched for file /tmp/bar.txt at position 0: expected=C893FEDE but computed=69322F90, algorithm=PureJavaCrc32C
      at org.apache.hadoop.fs.http.client.BaseTestHttpFSWith$1ReaderRunnable.run(BaseTestHttpFSWith.java:309)
      at java.lang.Thread.run(Thread.java:745)
      Caused by: org.apache.hadoop.fs.ChecksumException: Checksum CRC32C not matched for file /tmp/bar.txt at position 0: expected=C893FEDE but computed=69322F90, algorithm=PureJavaCrc32C
      at org.apache.hadoop.util.DataChecksum.throwChecksumException(DataChecksum.java:407)
      at org.apache.hadoop.util.DataChecksum.verifyChunked(DataChecksum.java:351)
      at org.apache.hadoop.util.DataChecksum.verifyChunkedSums(DataChecksum.java:311)
      at org.apache.hadoop.hdfs.client.impl.BlockReaderRemote.readNextPacket(BlockReaderRemote.java:216)
      at org.apache.hadoop.hdfs.client.impl.BlockReaderRemote.read(BlockReaderRemote.java:144)
      at org.apache.hadoop.hdfs.ByteArrayStrategy.readFromBlock(ReaderStrategy.java:119)
      at org.apache.hadoop.hdfs.DFSInputStream.readBuffer(DFSInputStream.java:704)
      at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:765)
      at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:814)
      at java.io.DataInputStream.read(DataInputStream.java:149)
      at org.apache.hadoop.fs.http.client.BaseTestHttpFSWith$1ReaderRunnable.run(BaseTestHttpFSWith.java:302)
      ... 1 more
      2016-10-25 15:34:46,437 INFO StateChange - DIR* completeFile: /tmp/bar.txt is closed by DFSClient_NONMAPREDUCE_-1743096965_197
      2016-10-25 15:34:46,437 INFO httpfsaudit - [/tmp/bar.txt]
      2016-10-25 15:34:46,440 INFO AppendTestUtil - seed=8756761565208093670, size=1
      2016-10-25 15:34:46,450 WARN StateChange - DIR* NameSystem.append: append: lastBlock=blk_1073741825_1185 of src=/tmp/bar.txt is not sufficiently replicated yet.
      2016-10-25 15:34:46,450 INFO Server - IPC Server handler 7 on 50130, call Call#25082 Retry#0 org.apache.hadoop.hdfs.protocol.ClientProtocol.append from 127.0.0.1:50147
      java.io.IOException: append: lastBlock=blk_1073741825_1185 of src=/tmp/bar.txt is not sufficiently replicated yet.
      at org.apache.hadoop.hdfs.server.namenode.FSDirAppendOp.appendFile(FSDirAppendOp.java:136)
      at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFile(FSNamesystem.java:2423)
      at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.append(NameNodeRpcServer.java:773)
      at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.append(ClientNamenodeProtocolServerSideTranslatorPB.java:444)
      at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
      at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:467)
      at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:990)
      at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:845)
      at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:788)
      at java.security.AccessController.doPrivileged(Native Method)
      at javax.security.auth.Subject.doAs(Subject.java:422)
      at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1795)
      at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2535)

      Exception in thread "Thread-143" java.lang.RuntimeException: java.io.IOException: HTTP status [500], exception [org.apache.hadoop.ipc.RemoteException], message [append: lastBlock=blk_1073741825_1185 of src=/tmp/bar.txt is not sufficiently replicated yet.]
      at org.apache.hadoop.fs.http.client.BaseTestHttpFSWith$1.run(BaseTestHttpFSWith.java:283)
      at java.lang.Thread.run(Thread.java:745)
      Caused by: java.io.IOException: HTTP status [500], exception [org.apache.hadoop.ipc.RemoteException], message [append: lastBlock=blk_1073741825_1185 of src=/tmp/bar.txt is not sufficiently replicated yet.]
      at org.apache.hadoop.util.HttpExceptionUtils.validateResponse(HttpExceptionUtils.java:159)
      at org.apache.hadoop.fs.http.client.HttpFSFileSystem$HttpFSDataOutputStream.close(HttpFSFileSystem.java:470)
      at org.apache.hadoop.fs.http.client.BaseTestHttpFSWith$1.run(BaseTestHttpFSWith.java:279)
      ... 1 more

      org.apache.hadoop.fs.ChecksumException: Checksum CRC32C not matched for file /tmp/bar.txt at position 0: expected=C893FEDE but computed=69322F90, algorithm=PureJavaCrc32C

      at org.apache.hadoop.util.DataChecksum.throwChecksumException(DataChecksum.java:407)
      at org.apache.hadoop.util.DataChecksum.verifyChunked(DataChecksum.java:351)
      at org.apache.hadoop.util.DataChecksum.verifyChunkedSums(DataChecksum.java:311)
      at org.apache.hadoop.hdfs.client.impl.BlockReaderRemote.readNextPacket(BlockReaderRemote.java:216)
      at org.apache.hadoop.hdfs.client.impl.BlockReaderRemote.read(BlockReaderRemote.java:144)
      at org.apache.hadoop.hdfs.ByteArrayStrategy.readFromBlock(ReaderStrategy.java:119)
      at org.apache.hadoop.hdfs.DFSInputStream.readBuffer(DFSInputStream.java:704)
      at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:765)
      at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:814)
      at java.io.DataInputStream.read(DataInputStream.java:149)
      at org.apache.hadoop.fs.http.client.BaseTestHttpFSWith$1ReaderRunnable.run(BaseTestHttpFSWith.java:302)
      at java.lang.Thread.run(Thread.java:745)

      1. HDFS-11056.branch-2.7.patch
        13 kB
        Wei-Chiu Chuang
      2. HDFS-11056.branch-2.patch
        10 kB
        Wei-Chiu Chuang
      3. HDFS-11056.002.patch
        11 kB
        Wei-Chiu Chuang
      4. HDFS-11056.001.patch
        4 kB
        Wei-Chiu Chuang
      5. HDFS-11056.reproduce.patch
        3 kB
        Wei-Chiu Chuang

        Issue Links

          Activity

          Wei-Chiu Chuang added a comment -

          Attach a unit test to reproduce the error.

          Wei-Chiu Chuang added a comment -

          This bug seems to be the root cause of HDFS-11022 in the first place.

          Wei-Chiu Chuang added a comment -

          The checksum error seems to occur when a client reads an RBW replica while it is being appended to but not yet finalized.

          Maybe the replica was read while the checksum meta file had not yet been updated? That would be my guess.

          Wei-Chiu Chuang added a comment -

          Still working on this. It looks like when data is being appended, the checksum is written to the on-disk meta file.
          When another client reads the data, it reads the checksum from the on-disk meta file (which is the most up to date and corresponds to the data already written to disk, but not yet visible to the client) instead of the in-memory checksum (which is the snapshot at the visible length). It should have read the in-memory checksum.

          So the inconsistency between the data and the checksum causes an incorrect checksum to be read.

          Wei-Chiu Chuang added a comment - edited

          I believe I have found the root cause of the bug:

          When BlockSender sends an RBW block, it reads the last partial chunk and its checksum. It is supposed to read the in-memory checksum, which is (supposedly) the correct checksum corresponding to the un-appended data (the visible length).

          However, the in-memory checksum of the ReplicaInPipeline object is null, so BlockSender skips the in-memory checksum and uses the on-disk checksum instead, which results in a checksum error, because the on-disk checksum corresponds to the on-disk data (which may extend beyond the visible data).

          The checksum is null because, when a replica is converted from Finalized to RBW for append, setLastChecksumAndDataLen() is never called. (See: FsVolumeImpl#append)

          This bug is subtle, and is only exposed when reading a replica whose on-disk data length is longer than its visible length.
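          The mismatch described above can be demonstrated in miniature. The sketch below is not the actual BlockSender code: java.util.zip.CRC32C (Java 9+) stands in for Hadoop's PureJavaCrc32C, and the data bytes and lengths are made up. It shows that a checksum computed over the full on-disk bytes of the last chunk can never verify against the shorter visible prefix that a concurrent reader is entitled to see — which is why the in-memory snapshot checksum must be served instead.

```java
import java.util.zip.CRC32C;

public class PartialChunkChecksumDemo {
    // Checksum of the first len bytes of data.
    static long crc(byte[] data, int len) {
        CRC32C c = new CRC32C();
        c.update(data, 0, len);
        return c.getValue();
    }

    public static void main(String[] args) {
        // Hypothetical last chunk: 5 bytes visible to readers, 1 more byte
        // already appended on disk but not yet acknowledged.
        byte[] onDisk = "hello!".getBytes();
        int visibleLength = 5;

        // Snapshot checksum taken when the append started (covers visible bytes).
        long inMemoryChecksum = crc(onDisk, visibleLength);
        // Checksum currently in the on-disk meta file (covers the appended byte too).
        long onDiskChecksum = crc(onDisk, onDisk.length);

        // A reader of the visible 5 bytes recomputes the checksum over them.
        long readerComputed = crc(onDisk, visibleLength);
        System.out.println(readerComputed == inMemoryChecksum); // true: matches snapshot
        System.out.println(readerComputed == onDiskChecksum);   // false: ChecksumException
    }
}
```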

          Yongjun Zhang added a comment -

          Those are very nice findings, Wei-Chiu Chuang! Congrats!

          Can we create a unit test to reproduce the scenario? Thanks.

          Wei-Chiu Chuang added a comment - edited

          FsVolumeImpl#convertTemporaryToRbw does not generate the last chunk checksum either, so it seems to suffer from the same bug. This is likely what caused the checksum error in HDFS-6804.

          In addition to FsVolumeImpl#append() (which converts a Finalized replica to an RBW replica) and FsVolumeImpl#convertTemporaryToRbw() (which converts a Temporary replica to an RBW replica), FsVolumeImpl#updateRURCopyOnTruncate() converts an RUR replica to an RBW replica. I think all of them should reload the last chunk checksum when the replica is converted to the RBW state.
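          What "reload the last chunk checksum" amounts to can be sketched as follows. This is a hypothetical loadLastChunkChecksum helper, not the actual Hadoop fix: the 7-byte header and checksum bytes in the fake meta file are illustrative stand-ins, not the exact BlockMetadataHeader layout. The idea is simply that the checksum of the last (possibly partial) chunk sits at the tail of the meta file and can be read into memory at conversion time.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;

public class LastChunkChecksumDemo {
    static final int CHECKSUM_SIZE = 4; // CRC32C produces 4 bytes per chunk

    // Hypothetical helper: read the checksum of the last chunk from the tail
    // of the on-disk meta file so it can be cached in the in-memory replica.
    static byte[] loadLastChunkChecksum(Path metaFile) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(metaFile.toFile(), "r")) {
            byte[] sum = new byte[CHECKSUM_SIZE];
            raf.seek(raf.length() - CHECKSUM_SIZE); // last checksum sits at the end
            raf.readFully(sum);
            return sum;
        }
    }

    public static void main(String[] args) throws IOException {
        Path meta = Files.createTempFile("blk_demo", ".meta");
        // Fake meta file: a 7-byte header followed by two 4-byte chunk checksums.
        byte[] contents = {0, 1, 2, 0, 0, 2, 0,          // illustrative header
                           0x11, 0x22, 0x33, 0x44,       // chunk 0
                           0x55, 0x66, 0x77, (byte) 0x88}; // last (partial) chunk
        Files.write(meta, contents);

        byte[] last = loadLastChunkChecksum(meta);
        System.out.printf("%02x%02x%02x%02x%n",
            last[0] & 0xff, last[1] & 0xff, last[2] & 0xff, last[3] & 0xff); // 55667788
        Files.delete(meta);
    }
}
```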

          Wei-Chiu Chuang added a comment -

          I have a proof-of-concept fix, but getting a unit test that reliably reproduces the error seems tricky, given that there are many moving parts.

          The major hurdle is creating a replica in the RBW state whose visible length != on-disk length, and letting a client read the replica concurrently.

          Wei-Chiu Chuang added a comment -

          OK, I've got a patch to fix the bug. I let it run for a while without seeing the checksum error. Attaching this patch to test it against the other unit tests.

          This patch does not include a unit test. I will try to come up with one.

          Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 23s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          -1 test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
          +1 mvninstall 8m 6s trunk passed
          +1 compile 0m 49s trunk passed
          +1 checkstyle 0m 27s trunk passed
          +1 mvnsite 1m 0s trunk passed
          +1 mvneclipse 0m 13s trunk passed
          +1 findbugs 1m 52s trunk passed
          +1 javadoc 0m 43s trunk passed
          +1 mvninstall 0m 55s the patch passed
          +1 compile 0m 50s the patch passed
          +1 javac 0m 50s the patch passed
          +1 checkstyle 0m 27s the patch passed
          +1 mvnsite 0m 54s the patch passed
          +1 mvneclipse 0m 10s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 findbugs 1m 51s the patch passed
          +1 javadoc 0m 39s the patch passed
          -1 unit 57m 52s hadoop-hdfs in the patch failed.
          +1 asflicense 0m 18s The patch does not generate ASF License warnings.
          78m 48s



          Reason Tests
          Failed junit tests hadoop.hdfs.server.datanode.TestDataNodeUUID
            hadoop.hdfs.server.datanode.fsdataset.impl.TestWriteToReplica



          Subsystem Report/Notes
          Docker Image: yetus/hadoop:9560f25
          JIRA Issue HDFS-11056
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12836460/HDFS-11056.001.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 938dfd53d8ab 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / aacf214
          Default Java 1.8.0_101
          findbugs v3.0.0
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/17371/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/17371/testReport/
          modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/17371/console
          Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          Wei-Chiu Chuang added a comment -

          TestWriteToReplica#testAppend failed because it tries to read from the meta file of a non-existent block replica.

          Wei-Chiu Chuang added a comment -

          Upload patch v002.
          This version fixes the bug in TestWriteToReplica (the meta file was not initialized with a header) and also adds a unit test to reproduce the bug.

          Without the patch, the unit test fails with a checksum mismatch error, because BlockSender incorrectly reads the on-disk checksum of the RBW replica. With the patch, BlockSender correctly reads the in-memory checksum and the test passes.

          I'd like to ask the watchers of this jira to review patch v002. Thanks very much!

          Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 14s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 2 new or modified test files.
          +1 mvninstall 7m 22s trunk passed
          +1 compile 0m 45s trunk passed
          +1 checkstyle 0m 25s trunk passed
          +1 mvnsite 0m 52s trunk passed
          +1 mvneclipse 0m 13s trunk passed
          +1 findbugs 1m 43s trunk passed
          +1 javadoc 0m 39s trunk passed
          +1 mvninstall 0m 45s the patch passed
          +1 compile 0m 42s the patch passed
          +1 javac 0m 42s the patch passed
          +1 checkstyle 0m 23s the patch passed
          +1 mvnsite 0m 49s the patch passed
          +1 mvneclipse 0m 10s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 findbugs 1m 46s the patch passed
          +1 javadoc 0m 37s the patch passed
          -1 unit 54m 30s hadoop-hdfs in the patch failed.
          +1 asflicense 0m 18s The patch does not generate ASF License warnings.
          73m 25s



          Reason Tests
          Failed junit tests hadoop.hdfs.qjournal.client.TestQuorumJournalManager



          Subsystem Report/Notes
          Docker Image: yetus/hadoop:9560f25
          JIRA Issue HDFS-11056
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12836626/HDFS-11056.002.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 9c23a583ac15 3.13.0-95-generic #142-Ubuntu SMP Fri Aug 12 17:00:09 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 0dc2a6a
          Default Java 1.8.0_101
          findbugs v3.0.0
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/17389/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
          Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/17389/testReport/
          modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/17389/console
          Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          jojochuang Wei-Chiu Chuang added a comment -

          The test failure is unrelated.
          Lei (Eddy) Xu, Virajith Jalaparti, would you like to comment? I saw that HDFS-10636 refactored a lot of the relevant code, but I believe the same bug existed pre-HDFS-10636.

          kihwal Kihwal Lee added a comment -

          I have been looking at the patch since yesterday. The partial chunk checksum is loaded from disk and saved in memory before it is modified. That seems like the correct approach. +1
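The approach described above depends on locating the last partial chunk's checksum in the block meta file so it can be loaded into memory before an append overwrites it. The following is a minimal sketch of that offset arithmetic, not HDFS's actual code; it assumes the standard meta-file layout of a 7-byte header (2-byte version, 1-byte checksum type, 4-byte bytesPerChecksum) followed by one fixed-size checksum per data chunk:

```java
// Hypothetical sketch: compute the meta-file offset of the checksum that
// covers the last (possibly partial) chunk of a block.
public class LastChunkChecksumOffset {
    // 2-byte version + 1-byte checksum type + 4-byte bytesPerChecksum
    static final int META_HEADER_LEN = 7;

    static long offsetOfLastChunkChecksum(long blockLen, int bytesPerChecksum,
                                          int checksumSize) {
        long chunkIndex = blockLen / bytesPerChecksum;
        if (blockLen % bytesPerChecksum == 0 && chunkIndex > 0) {
            chunkIndex--; // no partial chunk: last checksum covers a full chunk
        }
        return META_HEADER_LEN + chunkIndex * (long) checksumSize;
    }
}
```

For the 182-byte block in the description (512-byte chunks, 4-byte CRCs), the lone partial chunk's checksum sits right after the header at offset 7.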

          eddyxu Lei (Eddy) Xu added a comment -

          Hi, Wei-Chiu Chuang. HDFS-10636 is not a related change.

          jojochuang Wei-Chiu Chuang added a comment -

          Hi Kihwal Lee, thanks for the review!

          This fix re-computes the last chunk checksum when converting a finalized/temporary replica to an rbw replica. Do you think it would be more efficient to store the last chunk checksum in the finalized/temporary replica object, for workloads with frequent open->append->close operations?
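The optimization proposed above could be sketched as a lazily cached field on the replica object, so that repeated open->append->close cycles do not re-read the meta file each time. This is a hypothetical illustration; the class and method names are not HDFS's actual API:

```java
import java.util.function.Supplier;

// Hypothetical sketch: cache the last partial chunk checksum on the replica
// so only the first conversion to rbw pays the cost of a meta-file read.
public class CachedReplica {
    private byte[] lastPartialChunkChecksum; // loaded lazily, then reused

    byte[] getLastPartialChunkChecksum(Supplier<byte[]> loadFromMetaFile) {
        if (lastPartialChunkChecksum == null) {
            lastPartialChunkChecksum = loadFromMetaFile.get(); // one disk read
        }
        return lastPartialChunkChecksum;
    }
}
```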

          jojochuang Wei-Chiu Chuang added a comment -

          If no one objects, I will commit the latest patch by the end of Tuesday, and I will file a follow-up jira to study whether it is necessary to optimize the checksum calculation by caching the last chunk checksum in the finalized/temporary replica class.

          jojochuang Wei-Chiu Chuang added a comment -

          Attach branch-2 patch for precommit check

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 26s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 2 new or modified test files.
          +1 mvninstall 7m 20s branch-2 passed
          +1 compile 0m 48s branch-2 passed with JDK v1.8.0_101
          +1 compile 0m 45s branch-2 passed with JDK v1.7.0_111
          +1 checkstyle 0m 28s branch-2 passed
          +1 mvnsite 0m 53s branch-2 passed
          +1 mvneclipse 0m 16s branch-2 passed
          +1 findbugs 1m 57s branch-2 passed
          +1 javadoc 1m 3s branch-2 passed with JDK v1.8.0_101
          +1 javadoc 1m 49s branch-2 passed with JDK v1.7.0_111
          +1 mvninstall 0m 46s the patch passed
          +1 compile 0m 46s the patch passed with JDK v1.8.0_101
          +1 javac 0m 46s the patch passed
          +1 compile 0m 43s the patch passed with JDK v1.7.0_111
          +1 javac 0m 43s the patch passed
          +1 checkstyle 0m 26s the patch passed
          +1 mvnsite 0m 53s the patch passed
          +1 mvneclipse 0m 13s the patch passed
          +1 whitespace 0m 0s The patch has no whitespace issues.
          +1 findbugs 2m 11s the patch passed
          +1 javadoc 0m 57s the patch passed with JDK v1.8.0_101
          +1 javadoc 1m 33s the patch passed with JDK v1.7.0_111
          -1 unit 71m 3s hadoop-hdfs in the patch failed with JDK v1.7.0_111.
          +1 asflicense 0m 20s The patch does not generate ASF License warnings.
          168m 18s



          Reason Tests
          JDK v1.8.0_101 Failed junit tests hadoop.hdfs.server.datanode.TestFsDatasetCache
          JDK v1.7.0_111 Failed junit tests hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:b59b8b7
          JIRA Issue HDFS-11056
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12838052/HDFS-11056.branch-2.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 2b8b5a8d049c 3.13.0-96-generic #143-Ubuntu SMP Mon Aug 29 20:15:20 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision branch-2 / b77239b
          Default Java 1.7.0_111
          Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_101 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_111
          findbugs v3.0.0
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/17475/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_111.txt
          JDK v1.7.0_111 Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/17475/testReport/
          modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/17475/console
          Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          jojochuang Wei-Chiu Chuang added a comment -

          The branch-2 failed tests are not related.

          hudson Hudson added a comment -

          SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #10802 (See https://builds.apache.org/job/Hadoop-trunk-Commit/10802/)
          HDFS-11056. Concurrent append and read operations lead to checksum (weichiu: rev c619e9b43fd00ba0e59a98ae09685ff719bb722b)

          • (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsVolumeImpl.java
          • (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImplTestUtils.java
          • (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestFileAppend.java
          jojochuang Wei-Chiu Chuang added a comment -

          Attach branch-2.7 patch for precommit check.

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 11m 27s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
          +1 mvninstall 8m 3s branch-2.7 passed
          +1 compile 0m 57s branch-2.7 passed with JDK v1.8.0_111
          +1 compile 0m 58s branch-2.7 passed with JDK v1.7.0_111
          +1 checkstyle 0m 25s branch-2.7 passed
          +1 mvnsite 0m 59s branch-2.7 passed
          +1 mvneclipse 0m 16s branch-2.7 passed
          +1 findbugs 2m 55s branch-2.7 passed
          +1 javadoc 0m 58s branch-2.7 passed with JDK v1.8.0_111
          +1 javadoc 1m 40s branch-2.7 passed with JDK v1.7.0_111
          +1 mvninstall 0m 50s the patch passed
          +1 compile 0m 55s the patch passed with JDK v1.8.0_111
          +1 javac 0m 55s the patch passed
          +1 compile 0m 58s the patch passed with JDK v1.7.0_111
          +1 javac 0m 58s the patch passed
          -0 checkstyle 0m 21s hadoop-hdfs-project/hadoop-hdfs: The patch generated 2 new + 130 unchanged - 2 fixed = 132 total (was 132)
          +1 mvnsite 0m 55s the patch passed
          +1 mvneclipse 0m 12s the patch passed
          -1 whitespace 0m 0s The patch has 2270 line(s) that end in whitespace. Use git apply --whitespace=fix <<patch_file>>. Refer https://git-scm.com/docs/git-apply
          -1 whitespace 0m 58s The patch 139 line(s) with tabs.
          +1 findbugs 3m 3s the patch passed
          +1 javadoc 0m 55s the patch passed with JDK v1.8.0_111
          +1 javadoc 1m 41s the patch passed with JDK v1.7.0_111
          -1 unit 43m 34s hadoop-hdfs in the patch failed with JDK v1.7.0_111.
          -1 asflicense 0m 20s The patch generated 3 ASF License warnings.
          133m 30s



          Reason Tests
          JDK v1.8.0_111 Failed junit tests hadoop.hdfs.server.datanode.TestBPOfferService
            hadoop.hdfs.server.datanode.fsdataset.impl.TestWriteToReplica
            hadoop.hdfs.server.blockmanagement.TestBlockManager
            hadoop.hdfs.web.TestHttpsFileSystem
            hadoop.hdfs.server.namenode.snapshot.TestRenameWithSnapshots
            hadoop.hdfs.server.datanode.TestBlockReplacement
          JDK v1.7.0_111 Failed junit tests hadoop.hdfs.server.datanode.fsdataset.impl.TestWriteToReplica
            hadoop.hdfs.TestDFSShell
            hadoop.hdfs.web.TestHttpsFileSystem
            hadoop.hdfs.server.namenode.snapshot.TestRenameWithSnapshots



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:c420dfe
          JIRA Issue HDFS-11056
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12838249/HDFS-11056.branch-2.7.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux ad33b40e2a97 3.13.0-93-generic #140-Ubuntu SMP Mon Jul 18 21:21:05 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision branch-2.7 / d762730
          Default Java 1.7.0_111
          Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_111 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_111
          findbugs v3.0.0
          checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/17490/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt
          whitespace https://builds.apache.org/job/PreCommit-HDFS-Build/17490/artifact/patchprocess/whitespace-eol.txt
          whitespace https://builds.apache.org/job/PreCommit-HDFS-Build/17490/artifact/patchprocess/whitespace-tabs.txt
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/17490/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_111.txt
          JDK v1.7.0_111 Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/17490/testReport/
          asflicense https://builds.apache.org/job/PreCommit-HDFS-Build/17490/artifact/patchprocess/patch-asflicense-problems.txt
          modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/17490/console
          Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          jojochuang Wei-Chiu Chuang added a comment -

          I committed the patch to trunk, branch-2 and branch-2.8, and I am still working on a branch-2.7 patch.

          jojochuang Wei-Chiu Chuang added a comment -

          Attach an updated branch-2.7 patch.

          hadoopqa Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 13s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 2 new or modified test files.
          +1 mvninstall 5m 44s branch-2.7 passed
          +1 compile 0m 56s branch-2.7 passed with JDK v1.8.0_111
          +1 compile 1m 1s branch-2.7 passed with JDK v1.7.0_111
          +1 checkstyle 0m 24s branch-2.7 passed
          +1 mvnsite 0m 56s branch-2.7 passed
          +1 mvneclipse 0m 15s branch-2.7 passed
          +1 findbugs 2m 50s branch-2.7 passed
          +1 javadoc 1m 0s branch-2.7 passed with JDK v1.8.0_111
          +1 javadoc 1m 50s branch-2.7 passed with JDK v1.7.0_111
          +1 mvninstall 0m 52s the patch passed
          +1 compile 0m 56s the patch passed with JDK v1.8.0_111
          +1 javac 0m 56s the patch passed
          +1 compile 1m 1s the patch passed with JDK v1.7.0_111
          +1 javac 1m 1s the patch passed
          -0 checkstyle 0m 25s hadoop-hdfs-project/hadoop-hdfs: The patch generated 1 new + 130 unchanged - 2 fixed = 131 total (was 132)
          +1 mvnsite 1m 0s the patch passed
          +1 mvneclipse 0m 12s the patch passed
          -1 whitespace 0m 0s The patch has 2630 line(s) that end in whitespace. Use git apply --whitespace=fix <<patch_file>>. Refer https://git-scm.com/docs/git-apply
          -1 whitespace 0m 53s The patch 139 line(s) with tabs.
          +1 findbugs 3m 12s the patch passed
          +1 javadoc 0m 58s the patch passed with JDK v1.8.0_111
          +1 javadoc 1m 44s the patch passed with JDK v1.7.0_111
          -1 unit 45m 35s hadoop-hdfs in the patch failed with JDK v1.7.0_111.
          -1 asflicense 0m 20s The patch generated 3 ASF License warnings.
          122m 49s



          Reason Tests
          JDK v1.8.0_111 Failed junit tests hadoop.hdfs.server.namenode.ha.TestDNFencing
            hadoop.hdfs.server.balancer.TestBalancer
            hadoop.hdfs.web.TestHttpsFileSystem
            hadoop.hdfs.server.namenode.snapshot.TestRenameWithSnapshots
          JDK v1.7.0_111 Failed junit tests hadoop.hdfs.TestDatanodeRegistration
            hadoop.hdfs.TestLeaseRecovery2
            hadoop.hdfs.web.TestHttpsFileSystem
            hadoop.hdfs.server.namenode.TestDecommissioningStatus
            hadoop.hdfs.server.datanode.TestBlockScanner
            hadoop.hdfs.server.namenode.snapshot.TestRenameWithSnapshots



          Subsystem Report/Notes
          Docker Image:yetus/hadoop:c420dfe
          JIRA Issue HDFS-11056
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12838400/HDFS-11056.branch-2.7.patch
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux e818a333bd08 3.13.0-93-generic #140-Ubuntu SMP Mon Jul 18 21:21:05 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision branch-2.7 / d762730
          Default Java 1.7.0_111
          Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_111 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_111
          findbugs v3.0.0
          checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/17508/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt
          whitespace https://builds.apache.org/job/PreCommit-HDFS-Build/17508/artifact/patchprocess/whitespace-eol.txt
          whitespace https://builds.apache.org/job/PreCommit-HDFS-Build/17508/artifact/patchprocess/whitespace-tabs.txt
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/17508/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_111.txt
          JDK v1.7.0_111 Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/17508/testReport/
          asflicense https://builds.apache.org/job/PreCommit-HDFS-Build/17508/artifact/patchprocess/patch-asflicense-problems.txt
          modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/17508/console
          Powered by Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org

          This message was automatically generated.

          jojochuang Wei-Chiu Chuang added a comment -

          The warnings and test errors look unrelated.

          kihwal Kihwal Lee added a comment -

          +1 for the 2.7 patch. It looks to be a correct port. Thanks Wei-Chiu Chuang.

          jojochuang Wei-Chiu Chuang added a comment -

          I ran the branch-2.7 patch against my local Yetus and did not see the same warnings as reported here. It seems to be an issue with the branch-2.7 cherry-pick; I recall a similar issue the last time I made a branch-2.7 patch.
          I committed the patch to branch-2.7, branch-2.8, branch-2 and trunk. Thanks Kihwal Lee for the review and +1, and Lei (Eddy) Xu for the comment!

          hudson Hudson added a comment -

          SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #10979 (See https://builds.apache.org/job/Hadoop-trunk-Commit/10979/)
          HDFS-11229. HDFS-11056 failed to close meta file. Contributed by (weichiu: rev 2a28e8cf0469a373a99011f0fa540474e60528c8)

          • (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsVolumeImpl.java

            People

            • Assignee: jojochuang Wei-Chiu Chuang
            • Reporter: jojochuang Wei-Chiu Chuang
            • Votes: 0
            • Watchers: 13