Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- None
- None
- CDH5.7.4
- Reviewed
Fixed a race condition that caused VolumeScanner to flag a good replica as corrupt if the replica was also being written concurrently.
Description
Due to a race condition initially reported in HDFS-6804, VolumeScanner may erroneously flag good replicas as corrupt. This is serious because it can result in data loss if all replicas of a block are declared corrupt. The bug is especially prominent when there are many append requests via HttpFs/WebHDFS.
We are investigating an incident that caused a very high block corruption rate in a relatively small cluster. Initially, we thought HDFS-11056 was to blame. However, after applying HDFS-11056, we still see VolumeScanner reporting corrupt replicas.
It turns out that if a replica is being appended while VolumeScanner is scanning it, VolumeScanner may compare the old data against the new checksum, causing a checksum mismatch.
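A minimal sketch of the race, under simplifying assumptions: the replica is an in-memory byte array, chunks are 512 bytes (the HDFS default for dfs.bytes-per-checksum), and `java.util.zip.CRC32` stands in for the CRC32C checksum HDFS typically uses. `PartialChunkRace` and its methods are hypothetical names; this is not HDFS code. It shows that once an append rewrites the checksum of a still-partial last chunk, verifying the old, shorter tail against that checksum fails even though no byte was corrupted.

```java
import java.util.zip.CRC32;

// Illustrative model only (not HDFS code): the replica's bytes live in `disk`,
// and the checksum of its last (partial) chunk is recomputed on every append.
class PartialChunkRace {
  // Assumed chunk size; 512 bytes is the HDFS default dfs.bytes-per-checksum.
  static final int BYTES_PER_CHECKSUM = 512;

  // CRC32 over the last chunk of a replica that is `length` bytes long (length > 0).
  static long lastChunkChecksum(byte[] disk, int length) {
    int start = ((length - 1) / BYTES_PER_CHECKSUM) * BYTES_PER_CHECKSUM;
    CRC32 crc = new CRC32();
    crc.update(disk, start, length - start);
    return crc.getValue();
  }

  static boolean scanSeesMismatch() {
    byte[] disk = new byte[1100];
    for (int i = 0; i < disk.length; i++) {
      disk[i] = (byte) (i * 31 + 7);
    }
    // 1. The scanner decides to verify the replica at its current length.
    int scannedLength = 1000; // last chunk = bytes 512..999 (partial)

    // 2. Meanwhile an append grows the replica to 1100 bytes and rewrites the
    //    on-disk checksum of the last chunk, which now covers bytes 512..1099.
    long onDiskChecksum = lastChunkChecksum(disk, 1100);

    // 3. The scanner checks the old 488-byte tail against the new checksum.
    long computedFromOldData = lastChunkChecksum(disk, scannedLength);
    return computedFromOldData != onDiskChecksum; // differs: reported as corruption
  }

  public static void main(String[] args) {
    System.out.println("spurious mismatch: " + scanSeesMismatch());
  }
}
```

No data byte changes between steps 1 and 3; only the checksum the scanner pairs with the data does.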
I have a unit test that reproduces the error and will attach it later. A quick and simple fix is to hold the FsDatasetImpl lock while reading the checksum from disk.
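The locking idea can be modeled roughly as follows. This is a hypothetical sketch, not the patch: `datasetLock` plays the role of the FsDatasetImpl lock, and `snapshot()` plays the role of BlockSender's constructor. Because the visible length and its checksum are captured under the same lock that writers hold while appending, the reader's pair stays consistent even if more data is appended afterwards.

```java
import java.util.Arrays;
import java.util.concurrent.locks.ReentrantLock;
import java.util.zip.CRC32;

// Hypothetical model: `datasetLock` stands in for the FsDatasetImpl lock.
// Writers update data and the last-chunk checksum under the lock; readers
// snapshot (visible length, checksum) under the same lock.
class LockedReplica {
  static final int BYTES_PER_CHECKSUM = 512;

  static final class Snapshot {
    final int visibleLength;
    final long lastChunkChecksum;

    Snapshot(int visibleLength, long lastChunkChecksum) {
      this.visibleLength = visibleLength;
      this.lastChunkChecksum = lastChunkChecksum;
    }
  }

  private final ReentrantLock datasetLock = new ReentrantLock();
  private byte[] data = new byte[0];
  private long lastChunkChecksum;

  // CRC32 of the last chunk of the first `length` bytes (length > 0).
  static long lastChunkCrc(byte[] d, int length) {
    int start = ((length - 1) / BYTES_PER_CHECKSUM) * BYTES_PER_CHECKSUM;
    CRC32 crc = new CRC32();
    crc.update(d, start, length - start);
    return crc.getValue();
  }

  // Writer path: append and refresh the checksum as one atomic step.
  void append(byte[] more) {
    datasetLock.lock();
    try {
      byte[] grown = Arrays.copyOf(data, data.length + more.length);
      System.arraycopy(more, 0, grown, data.length, more.length);
      data = grown;
      lastChunkChecksum = lastChunkCrc(data, data.length);
    } finally {
      datasetLock.unlock();
    }
  }

  // Reader setup: capture the visible length and its matching checksum together.
  Snapshot snapshot() {
    datasetLock.lock();
    try {
      return new Snapshot(data.length, lastChunkChecksum);
    } finally {
      datasetLock.unlock();
    }
  }

  // Later, possibly after more appends: verify only the snapshotted prefix.
  boolean verify(Snapshot s) {
    byte[] current;
    datasetLock.lock();
    try {
      current = data;
    } finally {
      datasetLock.unlock();
    }
    return lastChunkCrc(current, s.visibleLength) == s.lastChunkChecksum;
  }
}
```

The trade-off is extra lock hold time on every reader setup.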
Attachments
- HDFS-11160.reproduce.patch (13 kB, Wei-Chiu Chuang)
- HDFS-11160.branch-2.patch (21 kB, Wei-Chiu Chuang)
- HDFS-11160.008.patch (17 kB, Wei-Chiu Chuang)
- HDFS-11160.007.patch (17 kB, Wei-Chiu Chuang)
- HDFS-11160.006.patch (18 kB, Wei-Chiu Chuang)
- HDFS-11160.005.patch (18 kB, Wei-Chiu Chuang)
- HDFS-11160.004.patch (18 kB, Wei-Chiu Chuang)
- HDFS-11160.003.patch (17 kB, Yongjun Zhang)
- HDFS-11160.002.patch (11 kB, Wei-Chiu Chuang)
- HDFS-11160.001.patch (10 kB, Wei-Chiu Chuang)
Issue Links
- breaks: HDFS-12136 BlockSender performance regression due to volume scanner edge case (Resolved)
- depends upon: HDFS-11229 HDFS-11056 failed to close meta file (Resolved)
- is depended upon by: HDFS-11187 Optimize disk access for last partial chunk checksum of Finalized replica (Resolved)
- is duplicated by: HDFS-6804 Add test for race condition between transferring block and appending block causes "Unexpected checksum mismatch exception" (Resolved)
- is related to: HDFS-6804 Add test for race condition between transferring block and appending block causes "Unexpected checksum mismatch exception" (Resolved)
- is related to: HDFS-11354 TestBlockScanner#testAppendWhileScanning should shutdown the MiniDFSCluster (Patch Available)
- relates to: HDFS-11022 DataNode unable to remove corrupt block replica due to race condition (Open)
- relates to: HDFS-11229 HDFS-11056 failed to close meta file (Resolved)
Activity
HDFS-11022 is the aftermath of this bug: VolumeScanner incorrectly detects corruption and then reports the replica to the NameNode with its older generation stamp (the replica is updated while VolumeScanner is scanning it).
Attaching my simple fix as the v001 patch.
In the v001 fix, the BlockSender constructor pre-loads the last partial chunk checksum from the on-disk replica if it is a finalized replica. This is simpler than adding a new field to the FinalizedReplica class and maintaining its value throughout the lifetime of the replica. The cost is potentially more disk access: each BlockSender instantiation has to reload the checksum, regardless of whether the replica was updated, and the checksum is read while holding the FsDatasetImpl lock. I verified that the unit test passes with this simple fix and fails without it.
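Reading the last partial chunk checksum from disk means locating it inside the meta file. The sketch below assumes the usual meta-file layout: a 7-byte header (2-byte version, 1-byte checksum type, 4-byte bytesPerChecksum) followed by one 4-byte checksum per chunk. `MetaFileOffsets` is a hypothetical name, and the layout constants should be checked against BlockMetadataHeader before relying on them.

```java
// Sketch, assuming the usual HDFS meta-file layout: a 7-byte header
// (2-byte version, 1-byte checksum type, 4-byte bytesPerChecksum) followed by
// one 4-byte checksum (CRC32/CRC32C) per chunk. Names are hypothetical.
class MetaFileOffsets {
  static final int HEADER_SIZE = 7;
  static final int CHECKSUM_SIZE = 4;

  // Offset in the meta file of the checksum covering the replica's last chunk.
  static long lastChunkChecksumOffset(long visibleLength, int bytesPerChecksum) {
    if (visibleLength <= 0) {
      throw new IllegalArgumentException("replica has no data: " + visibleLength);
    }
    long lastChunkIndex = (visibleLength - 1) / bytesPerChecksum;
    return HEADER_SIZE + lastChunkIndex * CHECKSUM_SIZE;
  }

  public static void main(String[] args) {
    // 1000 bytes with 512-byte chunks: chunk 1 (bytes 512..999) is partial,
    // so its checksum starts at 7 + 1 * 4 = 11.
    System.out.println(lastChunkChecksumOffset(1000, 512)); // 11
  }
}
```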
Appreciate any comments!
-1 overall |
Vote | Subsystem | Runtime | Comment |
---|---|---|---|
0 | reexec | 0m 11s | Docker mode activated. |
+1 | @author | 0m 0s | The patch does not contain any @author tags. |
+1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
+1 | mvninstall | 8m 32s | trunk passed |
+1 | compile | 0m 55s | trunk passed |
+1 | checkstyle | 0m 35s | trunk passed |
+1 | mvnsite | 1m 12s | trunk passed |
+1 | mvneclipse | 0m 17s | trunk passed |
+1 | findbugs | 2m 0s | trunk passed |
+1 | javadoc | 0m 44s | trunk passed |
+1 | mvninstall | 0m 54s | the patch passed |
+1 | compile | 0m 53s | the patch passed |
+1 | javac | 0m 53s | the patch passed |
-0 | checkstyle | 0m 26s | hadoop-hdfs-project/hadoop-hdfs: The patch generated 11 new + 105 unchanged - 1 fixed = 116 total (was 106) |
+1 | mvnsite | 0m 57s | the patch passed |
+1 | mvneclipse | 0m 12s | the patch passed |
+1 | whitespace | 0m 0s | The patch has no whitespace issues. |
+1 | findbugs | 2m 4s | the patch passed |
+1 | javadoc | 0m 43s | the patch passed |
-1 | unit | 80m 15s | hadoop-hdfs in the patch failed. |
-1 | asflicense | 0m 20s | The patch generated 2 ASF License warnings. |
102m 32s |
Reason | Tests |
---|---|
Failed junit tests | hadoop.hdfs.TestPread |
| | hadoop.hdfs.TestSetrepIncreasing |
| | hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup |
| | hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes |
| | hadoop.hdfs.server.datanode.TestReadOnlySharedStorage |
| | hadoop.hdfs.server.balancer.TestBalancerWithSaslDataTransfer |
| | hadoop.hdfs.server.balancer.TestBalancer |
| | hadoop.hdfs.server.balancer.TestBalancerWithEncryptedTransfer |
| | hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFS |
| | hadoop.hdfs.server.balancer.TestBalancerWithHANameNodes |
| | hadoop.hdfs.TestHDFSFileSystemContract |
| | hadoop.hdfs.TestSmallBlock |
| | hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFSStriped |
| | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure |
| | hadoop.hdfs.TestDFSStripedInputStream |
Timed out junit tests | org.apache.hadoop.hdfs.TestReplication |
| | org.apache.hadoop.hdfs.TestInjectionForSimulatedStorage |
Subsystem | Report/Notes |
---|---|
Docker | Image:yetus/hadoop:a9ad5d6 |
JIRA Issue | |
JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12839943/HDFS-11160.001.patch |
Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle |
uname | Linux 87d2b0966d24 3.13.0-93-generic #140-Ubuntu SMP Mon Jul 18 21:21:05 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
Build tool | maven |
Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
git revision | trunk / 6f80742 |
Default Java | 1.8.0_111 |
findbugs | v3.0.0 |
checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/17627/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt |
unit | https://builds.apache.org/job/PreCommit-HDFS-Build/17627/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt |
Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/17627/testReport/ |
asflicense | https://builds.apache.org/job/PreCommit-HDFS-Build/17627/artifact/patchprocess/patch-asflicense-problems.txt |
modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/17627/console |
Powered by | Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org |
This message was automatically generated.
The test failures are related to the change in the v001 patch. I'm working on fixing them.
-1 overall |
Vote | Subsystem | Runtime | Comment |
---|---|---|---|
0 | reexec | 0m 19s | Docker mode activated. |
+1 | @author | 0m 0s | The patch does not contain any @author tags. |
+1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
+1 | mvninstall | 9m 13s | trunk passed |
+1 | compile | 0m 54s | trunk passed |
+1 | checkstyle | 0m 29s | trunk passed |
+1 | mvnsite | 1m 0s | trunk passed |
+1 | mvneclipse | 0m 16s | trunk passed |
+1 | findbugs | 2m 2s | trunk passed |
+1 | javadoc | 0m 40s | trunk passed |
+1 | mvninstall | 0m 57s | the patch passed |
+1 | compile | 0m 46s | the patch passed |
+1 | javac | 0m 46s | the patch passed |
-0 | checkstyle | 0m 27s | hadoop-hdfs-project/hadoop-hdfs: The patch generated 1 new + 105 unchanged - 1 fixed = 106 total (was 106) |
+1 | mvnsite | 0m 57s | the patch passed |
+1 | mvneclipse | 0m 11s | the patch passed |
+1 | whitespace | 0m 0s | The patch has no whitespace issues. |
+1 | findbugs | 2m 3s | the patch passed |
+1 | javadoc | 0m 41s | the patch passed |
-1 | unit | 130m 50s | hadoop-hdfs in the patch failed. |
+1 | asflicense | 0m 39s | The patch does not generate ASF License warnings. |
153m 50s |
Reason | Tests |
---|---|
Failed junit tests | hadoop.hdfs.TestReadStripedFileWithMissingBlocks |
| | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure |
| | hadoop.fs.viewfs.TestViewFsHdfs |
| | hadoop.fs.TestSymlinkHdfsFileContext |
| | hadoop.hdfs.server.datanode.TestDirectoryScanner |
Timed out junit tests | org.apache.hadoop.hdfs.TestLeaseRecovery2 |
| | org.apache.hadoop.hdfs.TestReplication |
Subsystem | Report/Notes |
---|---|
Docker | Image:yetus/hadoop:a9ad5d6 |
JIRA Issue | |
JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12840065/HDFS-11160.002.patch |
Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle |
uname | Linux c73454c14209 3.13.0-96-generic #143-Ubuntu SMP Mon Aug 29 20:15:20 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
Build tool | maven |
Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
git revision | trunk / afcf8d3 |
Default Java | 1.8.0_111 |
findbugs | v3.0.0 |
checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/17634/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt |
unit | https://builds.apache.org/job/PreCommit-HDFS-Build/17634/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt |
Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/17634/testReport/ |
modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/17634/console |
Powered by | Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org |
This message was automatically generated.
-1 overall |
Vote | Subsystem | Runtime | Comment |
---|---|---|---|
0 | reexec | 0m 12s | Docker mode activated. |
+1 | @author | 0m 0s | The patch does not contain any @author tags. |
+1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
+1 | mvninstall | 7m 3s | trunk passed |
+1 | compile | 0m 47s | trunk passed |
+1 | checkstyle | 0m 27s | trunk passed |
+1 | mvnsite | 0m 52s | trunk passed |
+1 | mvneclipse | 0m 13s | trunk passed |
+1 | findbugs | 1m 42s | trunk passed |
+1 | javadoc | 0m 41s | trunk passed |
+1 | mvninstall | 0m 48s | the patch passed |
+1 | compile | 0m 45s | the patch passed |
+1 | javac | 0m 45s | the patch passed |
-0 | checkstyle | 0m 27s | hadoop-hdfs-project/hadoop-hdfs: The patch generated 1 new + 105 unchanged - 1 fixed = 106 total (was 106) |
+1 | mvnsite | 0m 57s | the patch passed |
+1 | mvneclipse | 0m 10s | the patch passed |
+1 | whitespace | 0m 0s | The patch has no whitespace issues. |
+1 | findbugs | 1m 51s | the patch passed |
+1 | javadoc | 0m 39s | the patch passed |
-1 | unit | 69m 6s | hadoop-hdfs in the patch failed. |
+1 | asflicense | 0m 19s | The patch does not generate ASF License warnings. |
88m 15s |
Reason | Tests |
---|---|
Failed junit tests | hadoop.hdfs.server.namenode.ha.TestEditLogTailer |
Timed out junit tests | org.apache.hadoop.hdfs.TestReplication |
Subsystem | Report/Notes |
---|---|
Docker | Image:yetus/hadoop:a9ad5d6 |
JIRA Issue | |
JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12840065/HDFS-11160.002.patch |
Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle |
uname | Linux 8c6814114519 3.13.0-93-generic #140-Ubuntu SMP Mon Jul 18 21:21:05 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
Build tool | maven |
Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
git revision | trunk / 83cc726 |
Default Java | 1.8.0_111 |
findbugs | v3.0.0 |
checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/17637/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt |
unit | https://builds.apache.org/job/PreCommit-HDFS-Build/17637/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt |
Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/17637/testReport/ |
modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/17637/console |
Powered by | Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org |
This message was automatically generated.
I have also been trying to optimize the checksum calculation. The basic idea is to copy the in-memory last partial chunk checksum of the RBW replica when converting it to a Finalized replica, so that there's no need to recalculate the checksum every time BlockSender reads a Finalized block. However, maintaining an up-to-date last partial chunk checksum in memory is pretty complex, due to operations such as truncate, as well as use cases such as HSM.
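A hypothetical sketch of this optimization idea: when an RBW replica is finalized, its in-memory last partial chunk checksum is handed to the finalized replica so BlockSender does not have to reload it. `SimpleRbwReplica` and `SimpleFinalizedReplica` are simplified stand-ins, not the real HDFS replica classes. The invalidation hook marks where the complexity lives: every operation that rewrites the tail of a finalized replica (truncate, for example) would have to clear or recompute the cached value.

```java
// Simplified stand-ins for an RBW replica and a finalized replica; not the
// real HDFS classes. The checksum is kept as raw bytes, as in a meta file.
class SimpleRbwReplica {
  final long numBytes;
  final byte[] lastPartialChunkChecksum; // maintained by the writer while RBW

  SimpleRbwReplica(long numBytes, byte[] lastPartialChunkChecksum) {
    this.numBytes = numBytes;
    this.lastPartialChunkChecksum = lastPartialChunkChecksum;
  }

  // Finalize: carry the in-memory checksum over instead of dropping it.
  SimpleFinalizedReplica finalizeReplica() {
    byte[] copy = (lastPartialChunkChecksum == null)
        ? null : lastPartialChunkChecksum.clone();
    return new SimpleFinalizedReplica(numBytes, copy);
  }
}

class SimpleFinalizedReplica {
  final long numBytes;
  private byte[] lastPartialChunkChecksum; // null means "not cached; read from disk"

  SimpleFinalizedReplica(long numBytes, byte[] lastPartialChunkChecksum) {
    this.numBytes = numBytes;
    this.lastPartialChunkChecksum = lastPartialChunkChecksum;
  }

  byte[] getLastPartialChunkChecksum() {
    return lastPartialChunkChecksum;
  }

  // Anything that rewrites the tail of the replica (e.g. truncate) must call
  // this, or readers would pair new data with a stale checksum.
  void invalidateLastPartialChunkChecksum() {
    lastPartialChunkChecksum = null;
  }
}
```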
May I suggest that we get the current patch reviewed and committed, because it's quite critical, and then work on optimization later? Thanks.
Hi weichiu,
Thanks for your work here. I did a review of your patch here.
While the optimization discussion is still ongoing, I focused on the implementation. I don't think BlockSender should be aware of FsVolumeImpl; that looks like an abstraction violation.
I changed the implementation to address this and uploaded patch rev 003. Basically I think we can have a similar API in FinalizedReplica as in RBW replica to get the last partial checksum.
A possible optimization is to skip this when visibleLength is at a chunk boundary (I have not added this change).
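The boundary check amounts to a single modulo test. A minimal sketch with a hypothetical class and method name:

```java
// Hypothetical helper: loading the last partial chunk checksum is only needed
// when the visible length ends inside a chunk.
class ChunkBoundary {
  static boolean needsLastPartialChunkChecksum(long visibleLength, int bytesPerChecksum) {
    return visibleLength % bytesPerChecksum != 0;
  }

  public static void main(String[] args) {
    System.out.println(needsLastPartialChunkChecksum(1024, 512)); // false: ends on a boundary
    System.out.println(needsLastPartialChunkChecksum(1000, 512)); // true: 488-byte partial chunk
  }
}
```

How much this saves depends on how often replicas actually end on a chunk boundary.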
I did not go through the test code yet.
Please take a look at what I changed, hope it makes sense to you.
Thanks.
About your optimization:
The basic idea is to copy the in-memory last partial chunk checksum in RBW replica when converting an RBW to Finalized,
I think for the bug reported here, what happened is
FinalizedReplica (S0) --> RBW (Append, S1) --> FinalizedReplica (S2)
The BlockSender constructor runs at S0; then an append happens and goes through S1 and S2, and at S2 the partial chunk checksum on disk is updated. BlockSender then starts reading and transferring the data, and gets a mismatching checksum.
So I think your above optimization doesn't help this jira.
What do you think?
Thanks.
Thanks Yongjun, I much appreciate your help here. I took a quick look at the v003 patch and it looks reasonable to me. I agree with you: there's no need to load the last chunk checksum if the last chunk is full.
Hi Yongjun. Thanks for pointing that out; I should have made myself clearer. In your example, say FinalizedReplica S0 is converted from an RBW S(-1). I meant to let FinalizedReplica (S0) preserve the in-memory last partial chunk checksum (LPCC) from the RBW at S(-1).
-1 overall |
Vote | Subsystem | Runtime | Comment |
---|---|---|---|
0 | reexec | 0m 13s | Docker mode activated. |
+1 | @author | 0m 0s | The patch does not contain any @author tags. |
+1 | test4tests | 0m 0s | The patch appears to include 4 new or modified test files. |
+1 | mvninstall | 9m 50s | trunk passed |
+1 | compile | 0m 59s | trunk passed |
+1 | checkstyle | 0m 36s | trunk passed |
+1 | mvnsite | 1m 5s | trunk passed |
+1 | mvneclipse | 0m 15s | trunk passed |
-1 | findbugs | 1m 55s | hadoop-hdfs-project/hadoop-hdfs in trunk has 1 extant Findbugs warnings. |
+1 | javadoc | 0m 48s | trunk passed |
+1 | mvninstall | 0m 57s | the patch passed |
+1 | compile | 1m 0s | the patch passed |
+1 | javac | 1m 0s | the patch passed |
-0 | checkstyle | 0m 29s | hadoop-hdfs-project/hadoop-hdfs: The patch generated 3 new + 186 unchanged - 1 fixed = 189 total (was 187) |
+1 | mvnsite | 0m 59s | the patch passed |
+1 | mvneclipse | 0m 12s | the patch passed |
-1 | whitespace | 0m 0s | The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix <<patch_file>>. Refer https://git-scm.com/docs/git-apply |
+1 | findbugs | 1m 54s | the patch passed |
+1 | javadoc | 0m 44s | the patch passed |
-1 | unit | 66m 40s | hadoop-hdfs in the patch failed. |
+1 | asflicense | 0m 21s | The patch does not generate ASF License warnings. |
90m 11s |
Reason | Tests |
---|---|
Failed junit tests | hadoop.hdfs.server.namenode.TestNameNodeMetadataConsistency |
Timed out junit tests | org.apache.hadoop.hdfs.TestReplication |
This message was automatically generated.
The checkstyle warning is unrelated.
The findbugs warning is likely a false positive.
yzhangal thanks a lot for your review and the new patch. Is there anything I can do to push this further?
kihwal we are hitting this issue repeatedly in a specific scenario and would really love to see this bug fixed.
Thanks a lot!
+1 The patch looks good. About
// TODO: we only need to do this if the visibleLength is not
// at chunk boundary
Frequent appending will likely leave the end of the block unaligned. So I think this optimization is not worth adding.
Thanks kihwal for the review! I am posting branch-2 patch for precommit check.
-1 overall |
Vote | Subsystem | Runtime | Comment |
---|---|---|---|
0 | reexec | 0m 20s | Docker mode activated. |
+1 | @author | 0m 0s | The patch does not contain any @author tags. |
+1 | test4tests | 0m 0s | The patch appears to include 4 new or modified test files. |
+1 | mvninstall | 7m 18s | branch-2 passed |
+1 | compile | 0m 48s | branch-2 passed with JDK v1.8.0_111 |
+1 | compile | 0m 46s | branch-2 passed with JDK v1.7.0_121 |
+1 | checkstyle | 0m 31s | branch-2 passed |
+1 | mvnsite | 0m 53s | branch-2 passed |
+1 | mvneclipse | 0m 18s | branch-2 passed |
+1 | findbugs | 2m 0s | branch-2 passed |
+1 | javadoc | 1m 2s | branch-2 passed with JDK v1.8.0_111 |
+1 | javadoc | 1m 39s | branch-2 passed with JDK v1.7.0_121 |
+1 | mvninstall | 0m 46s | the patch passed |
+1 | compile | 0m 47s | the patch passed with JDK v1.8.0_111 |
+1 | javac | 0m 47s | the patch passed |
+1 | compile | 0m 44s | the patch passed with JDK v1.7.0_121 |
+1 | javac | 0m 44s | the patch passed |
-0 | checkstyle | 0m 26s | hadoop-hdfs-project/hadoop-hdfs: The patch generated 2 new + 191 unchanged - 1 fixed = 193 total (was 192) |
+1 | mvnsite | 0m 53s | the patch passed |
+1 | mvneclipse | 0m 13s | the patch passed |
+1 | whitespace | 0m 0s | The patch has no whitespace issues. |
+1 | findbugs | 2m 17s | the patch passed |
+1 | javadoc | 0m 58s | the patch passed with JDK v1.8.0_111 |
+1 | javadoc | 1m 34s | the patch passed with JDK v1.7.0_121 |
-1 | unit | 74m 21s | hadoop-hdfs in the patch failed with JDK v1.7.0_121. |
+1 | asflicense | 0m 26s | The patch does not generate ASF License warnings. |
174m 23s |
Reason | Tests |
---|---|
JDK v1.8.0_111 Failed junit tests | hadoop.hdfs.server.namenode.TestDecommissioningStatus |
JDK v1.8.0_111 Timed out junit tests | org.apache.hadoop.hdfs.TestReplication |
JDK v1.7.0_121 Failed junit tests | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure |
JDK v1.7.0_121 Timed out junit tests | org.apache.hadoop.hdfs.TestReplication |
Subsystem | Report/Notes |
---|---|
Docker | Image:yetus/hadoop:b59b8b7 |
JIRA Issue | |
JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12842372/HDFS-11160.branch-2.patch |
Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle |
uname | Linux 48b8013e50fa 3.13.0-96-generic #143-Ubuntu SMP Mon Aug 29 20:15:20 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
Build tool | maven |
Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
git revision | branch-2 / c73d839 |
Default Java | 1.7.0_121 |
Multi-JDK versions | /usr/lib/jvm/java-8-oracle:1.8.0_111 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_121 |
findbugs | v3.0.0 |
checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/17800/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt |
unit | https://builds.apache.org/job/PreCommit-HDFS-Build/17800/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_121.txt |
JDK v1.7.0_121 Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/17800/testReport/ |
modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/17800/console |
Powered by | Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org |
This message was automatically generated.
-1 overall |
Vote | Subsystem | Runtime | Comment |
---|---|---|---|
0 | reexec | 14m 42s | Docker mode activated. |
+1 | @author | 0m 0s | The patch does not contain any @author tags. |
+1 | test4tests | 0m 0s | The patch appears to include 4 new or modified test files. |
+1 | mvninstall | 7m 10s | branch-2 passed |
+1 | compile | 0m 43s | branch-2 passed with JDK v1.8.0_111 |
+1 | compile | 0m 46s | branch-2 passed with JDK v1.7.0_121 |
+1 | checkstyle | 0m 32s | branch-2 passed |
+1 | mvnsite | 0m 54s | branch-2 passed |
+1 | mvneclipse | 0m 17s | branch-2 passed |
+1 | findbugs | 2m 5s | branch-2 passed |
+1 | javadoc | 0m 59s | branch-2 passed with JDK v1.8.0_111 |
+1 | javadoc | 1m 41s | branch-2 passed with JDK v1.7.0_121 |
+1 | mvninstall | 0m 47s | the patch passed |
+1 | compile | 0m 39s | the patch passed with JDK v1.8.0_111 |
+1 | javac | 0m 39s | the patch passed |
+1 | compile | 0m 43s | the patch passed with JDK v1.7.0_121 |
+1 | javac | 0m 43s | the patch passed |
-0 | checkstyle | 0m 35s | hadoop-hdfs-project/hadoop-hdfs: The patch generated 2 new + 191 unchanged - 1 fixed = 193 total (was 192) |
+1 | mvnsite | 0m 54s | the patch passed |
+1 | mvneclipse | 0m 13s | the patch passed |
+1 | whitespace | 0m 0s | The patch has no whitespace issues. |
+1 | findbugs | 2m 18s | the patch passed |
+1 | javadoc | 0m 56s | the patch passed with JDK v1.8.0_111 |
+1 | javadoc | 1m 36s | the patch passed with JDK v1.7.0_121 |
-1 | unit | 60m 41s | hadoop-hdfs in the patch failed with JDK v1.7.0_121. |
+1 | asflicense | 0m 21s | The patch does not generate ASF License warnings. |
164m 16s |
Reason | Tests |
---|---|
JDK v1.8.0_111 Timed out junit tests | org.apache.hadoop.hdfs.TestReplication |
JDK v1.7.0_121 Failed junit tests | hadoop.hdfs.server.namenode.ha.TestDFSUpgradeWithHA |
| | hadoop.hdfs.TestDFSClientRetries |
JDK v1.7.0_121 Timed out junit tests | org.apache.hadoop.hdfs.TestReplication |
Subsystem | Report/Notes |
---|---|
Docker | Image:yetus/hadoop:b59b8b7 |
JIRA Issue | |
JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12842372/HDFS-11160.branch-2.patch |
Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle |
uname | Linux 529b1373e565 3.13.0-92-generic #139-Ubuntu SMP Tue Jun 28 20:42:26 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
Build tool | maven |
Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
git revision | branch-2 / e51f32f |
Default Java | 1.7.0_121 |
Multi-JDK versions | /usr/lib/jvm/java-8-oracle:1.8.0_111 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_121 |
findbugs | v3.0.0 |
checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/17803/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt |
unit | https://builds.apache.org/job/PreCommit-HDFS-Build/17803/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_121.txt |
JDK v1.7.0_121 Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/17803/testReport/ |
modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/17803/console |
Powered by | Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org |
This message was automatically generated.
The timeout in TestReplication seems related to this branch-2 patch. I am taking a look...
The timeout in TestReplication was related to this patch (both trunk and branch-2). The test intentionally truncates and extends the raw block file. I updated the patch to make the DataNode handle this error better.
I also caught a potential bug (actually committed by myself in HDFS-11056) where the DataNode reads the meta file without closing it.
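The usual Java way to guarantee a meta file gets closed is try-with-resources, which closes the stream on both normal and exceptional exit. A minimal sketch, assuming only that the meta file starts with a 2-byte version field; `MetaHeaderReader` and `readVersion` are hypothetical names, not the HDFS code.

```java
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;

// Hypothetical reader for the first field of a meta file (assumed to be a
// 2-byte version). The stream is closed even if readShort() throws, e.g.
// EOFException on a truncated file, so no file descriptor leaks.
class MetaHeaderReader {
  static short readVersion(String metaPath) throws IOException {
    try (DataInputStream in = new DataInputStream(new FileInputStream(metaPath))) {
      return in.readShort();
    }
  }
}
```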
FYI, the unclosed meta file bug is pretty serious, so I filed a new JIRA to fix it: HDFS-11229.
-1 overall |
Vote | Subsystem | Runtime | Comment |
---|---|---|---|
0 | reexec | 0m 25s | Docker mode activated. |
+1 | @author | 0m 0s | The patch does not contain any @author tags. |
+1 | test4tests | 0m 0s | The patch appears to include 4 new or modified test files. |
+1 | mvninstall | 8m 25s | trunk passed |
+1 | compile | 0m 55s | trunk passed |
+1 | checkstyle | 0m 30s | trunk passed |
+1 | mvnsite | 1m 2s | trunk passed |
+1 | mvneclipse | 0m 14s | trunk passed |
+1 | findbugs | 1m 54s | trunk passed |
+1 | javadoc | 0m 45s | trunk passed |
+1 | mvninstall | 0m 48s | the patch passed |
+1 | compile | 0m 45s | the patch passed |
+1 | javac | 0m 45s | the patch passed |
-0 | checkstyle | 0m 26s | hadoop-hdfs-project/hadoop-hdfs: The patch generated 2 new + 187 unchanged - 1 fixed = 189 total (was 188) |
+1 | mvnsite | 0m 50s | the patch passed |
+1 | mvneclipse | 0m 11s | the patch passed |
+1 | whitespace | 0m 0s | The patch has no whitespace issues. |
+1 | findbugs | 1m 53s | the patch passed |
+1 | javadoc | 0m 40s | the patch passed |
-1 | unit | 106m 28s | hadoop-hdfs in the patch failed. |
+1 | asflicense | 0m 21s | The patch does not generate ASF License warnings. |
127m 50s |
Reason | Tests |
---|---|
Timed out junit tests | org.apache.hadoop.hdfs.TestLeaseRecovery2 |
Subsystem | Report/Notes |
---|---|
Docker | Image:yetus/hadoop:a9ad5d6 |
JIRA Issue | |
JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12842603/HDFS-11160.004.patch |
Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle |
uname | Linux 317154908d23 3.13.0-96-generic #143-Ubuntu SMP Mon Aug 29 20:15:20 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
Build tool | maven |
Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
git revision | trunk / 80b8023 |
Default Java | 1.8.0_111 |
findbugs | v3.0.0 |
checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/17810/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt |
unit | https://builds.apache.org/job/PreCommit-HDFS-Build/17810/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt |
Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/17810/testReport/ |
modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/17810/console |
Powered by | Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org |
This message was automatically generated.
-1 overall |
Vote | Subsystem | Runtime | Comment |
---|---|---|---|
0 | reexec | 0m 18s | Docker mode activated. |
+1 | @author | 0m 0s | The patch does not contain any @author tags. |
+1 | test4tests | 0m 0s | The patch appears to include 4 new or modified test files. |
+1 | mvninstall | 7m 9s | trunk passed |
+1 | compile | 0m 52s | trunk passed |
+1 | checkstyle | 0m 32s | trunk passed |
+1 | mvnsite | 0m 59s | trunk passed |
+1 | mvneclipse | 0m 15s | trunk passed |
+1 | findbugs | 1m 46s | trunk passed |
+1 | javadoc | 0m 43s | trunk passed |
+1 | mvninstall | 0m 49s | the patch passed |
+1 | compile | 0m 48s | the patch passed |
+1 | javac | 0m 48s | the patch passed |
-0 | checkstyle | 0m 27s | hadoop-hdfs-project/hadoop-hdfs: The patch generated 2 new + 187 unchanged - 1 fixed = 189 total (was 188) |
+1 | mvnsite | 0m 59s | the patch passed |
+1 | mvneclipse | 0m 10s | the patch passed |
+1 | whitespace | 0m 0s | The patch has no whitespace issues. |
+1 | findbugs | 1m 54s | the patch passed |
+1 | javadoc | 0m 38s | the patch passed |
-1 | unit | 95m 16s | hadoop-hdfs in the patch failed. |
+1 | asflicense | 0m 19s | The patch does not generate ASF License warnings. |
115m 15s |
Reason | Tests |
---|---|
Failed junit tests | hadoop.hdfs.TestSecureEncryptionZoneWithKMS |
| | hadoop.hdfs.TestTrashWithSecureEncryptionZones |
| | hadoop.fs.viewfs.TestViewFsAtHdfsRoot |
Subsystem | Report/Notes |
---|---|
Docker | Image:yetus/hadoop:a9ad5d6 |
JIRA Issue | |
JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12842675/HDFS-11160.005.patch |
Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle |
uname | Linux e30e8e6bf209 3.13.0-96-generic #143-Ubuntu SMP Mon Aug 29 20:15:20 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
Build tool | maven |
Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
git revision | trunk / 4c38f11 |
Default Java | 1.8.0_111 |
findbugs | v3.0.0 |
checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/17820/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt |
unit | https://builds.apache.org/job/PreCommit-HDFS-Build/17820/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt |
Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/17820/testReport/ |
modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/17820/console |
Powered by | Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org |
This message was automatically generated.
v006 patch: throw an IOException instead of returning null if the checksum can't be read from the meta file.
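A minimal sketch of this error-handling choice, operating on an in-memory copy of the meta file for simplicity. The class name `LastChecksumLoader` and the 4-byte checksum size are assumptions. Throwing an IOException reports a truncated or inconsistent meta file where it is detected, rather than handing callers a null that fails later and further from the cause.

```java
import java.io.IOException;

// Hypothetical loader: returns the 4-byte checksum at `offset`, or throws
// IOException if the meta file is too short (e.g. truncated), never null.
class LastChecksumLoader {
  static final int CHECKSUM_SIZE = 4;

  static byte[] load(byte[] metaFile, long offset) throws IOException {
    if (offset < 0 || offset + CHECKSUM_SIZE > metaFile.length) {
      throw new IOException("Cannot read last chunk checksum at offset " + offset
          + ": meta file has only " + metaFile.length + " bytes");
    }
    byte[] checksum = new byte[CHECKSUM_SIZE];
    System.arraycopy(metaFile, (int) offset, checksum, 0, CHECKSUM_SIZE);
    return checksum;
  }
}
```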
-1 overall |
Vote | Subsystem | Runtime | Comment |
---|---|---|---|
0 | reexec | 0m 15s | Docker mode activated. |
+1 | @author | 0m 0s | The patch does not contain any @author tags. |
+1 | test4tests | 0m 0s | The patch appears to include 4 new or modified test files. |
+1 | mvninstall | 6m 51s | trunk passed |
+1 | compile | 0m 45s | trunk passed |
+1 | checkstyle | 0m 28s | trunk passed |
+1 | mvnsite | 0m 52s | trunk passed |
+1 | mvneclipse | 0m 13s | trunk passed |
+1 | findbugs | 1m 40s | trunk passed |
+1 | javadoc | 0m 39s | trunk passed |
+1 | mvninstall | 0m 44s | the patch passed |
+1 | compile | 0m 42s | the patch passed |
+1 | javac | 0m 42s | the patch passed |
-0 | checkstyle | 0m 25s | hadoop-hdfs-project/hadoop-hdfs: The patch generated 2 new + 187 unchanged - 1 fixed = 189 total (was 188) |
+1 | mvnsite | 0m 48s | the patch passed |
+1 | mvneclipse | 0m 10s | the patch passed |
+1 | whitespace | 0m 0s | The patch has no whitespace issues. |
+1 | findbugs | 1m 46s | the patch passed |
+1 | javadoc | 0m 36s | the patch passed |
-1 | unit | 68m 58s | hadoop-hdfs in the patch failed. |
+1 | asflicense | 0m 20s | The patch does not generate ASF License warnings. |
87m 25s |
Reason | Tests |
---|---|
Failed junit tests | hadoop.hdfs.server.datanode.TestDirectoryScanner |
| | hadoop.hdfs.TestSecureEncryptionZoneWithKMS |
| | hadoop.hdfs.TestTrashWithSecureEncryptionZones |
Subsystem | Report/Notes |
---|---|
Docker | Image:yetus/hadoop:a9ad5d6 |
JIRA Issue | |
JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12842819/HDFS-11160.006.patch |
Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle |
uname | Linux 3c4530319c92 3.13.0-103-generic #150-Ubuntu SMP Thu Nov 24 10:34:17 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
Build tool | maven |
Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
git revision | trunk / f66f618 |
Default Java | 1.8.0_111 |
findbugs | v3.0.0 |
checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/17840/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt |
unit | https://builds.apache.org/job/PreCommit-HDFS-Build/17840/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt |
Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/17840/testReport/ |
modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/17840/console |
Powered by | Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org |
This message was automatically generated.
I was pinged to review this as well. The change makes sense to me, and I verified that the test fails without the BlockSender change.
The test appears to have some duplicate code (waiting for the scan and verifying the info); +1 once that's cleaned up.
Nice work here, Wei-Chiu! Thanks also to Yongjun and Kihwal for the reviews.
Thanks, xiaochen. Yes, it looks like I can indeed remove the redundancy in the test code. Submitting patch v007 for a precommit check.
-1 overall |
Vote | Subsystem | Runtime | Comment |
---|---|---|---|
0 | reexec | 0m 0s | Docker mode activated. |
-1 | patch | 0m 8s | |
Subsystem | Report/Notes |
---|---|
JIRA Issue | |
JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12843304/HDFS-11160.007.patch |
Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/17859/console |
Powered by | Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org |
This message was automatically generated.
-1 overall |
Vote | Subsystem | Runtime | Comment |
---|---|---|---|
0 | reexec | 0m 23s | Docker mode activated. |
+1 | @author | 0m 0s | The patch does not contain any @author tags. |
+1 | test4tests | 0m 0s | The patch appears to include 4 new or modified test files. |
+1 | mvninstall | 8m 57s | trunk passed |
+1 | compile | 0m 47s | trunk passed |
+1 | checkstyle | 0m 29s | trunk passed |
+1 | mvnsite | 0m 56s | trunk passed |
+1 | mvneclipse | 0m 13s | trunk passed |
+1 | findbugs | 2m 6s | trunk passed |
+1 | javadoc | 0m 45s | trunk passed |
+1 | mvninstall | 1m 0s | the patch passed |
+1 | compile | 0m 55s | the patch passed |
+1 | javac | 0m 55s | the patch passed |
-0 | checkstyle | 0m 29s | hadoop-hdfs-project/hadoop-hdfs: The patch generated 1 new + 186 unchanged - 1 fixed = 187 total (was 187) |
+1 | mvnsite | 1m 3s | the patch passed |
+1 | mvneclipse | 0m 12s | the patch passed |
+1 | whitespace | 0m 0s | The patch has no whitespace issues. |
+1 | findbugs | 2m 12s | the patch passed |
+1 | javadoc | 0m 41s | the patch passed |
-1 | unit | 96m 4s | hadoop-hdfs in the patch failed. |
+1 | asflicense | 0m 26s | The patch does not generate ASF License warnings. |
119m 5s |
Reason | Tests |
---|---|
Failed junit tests | hadoop.hdfs.TestPread |
 | hadoop.hdfs.server.namenode.TestNameNodeMetadataConsistency |
 | hadoop.hdfs.server.namenode.ha.TestDFSUpgradeWithHA |
 | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure190 |
Subsystem | Report/Notes |
---|---|
Docker | Image:yetus/hadoop:a9ad5d6 |
JIRA Issue | |
JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12843310/HDFS-11160.008.patch |
Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle |
uname | Linux 7c2199ea304b 3.13.0-96-generic #143-Ubuntu SMP Mon Aug 29 20:15:20 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
Build tool | maven |
Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
git revision | trunk / 64a2d5b |
Default Java | 1.8.0_111 |
findbugs | v3.0.0 |
checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/17860/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt |
unit | https://builds.apache.org/job/PreCommit-HDFS-Build/17860/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt |
Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/17860/testReport/ |
modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/17860/console |
Powered by | Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org |
This message was automatically generated.
-1 overall |
Vote | Subsystem | Runtime | Comment |
---|---|---|---|
0 | reexec | 0m 13s | Docker mode activated. |
+1 | @author | 0m 0s | The patch does not contain any @author tags. |
+1 | test4tests | 0m 0s | The patch appears to include 4 new or modified test files. |
-1 | mvninstall | 2m 35s | root in branch-2 failed. |
+1 | compile | 0m 53s | branch-2 passed with JDK v1.8.0_111 |
+1 | compile | 0m 41s | branch-2 passed with JDK v1.7.0_121 |
+1 | checkstyle | 0m 29s | branch-2 passed |
+1 | mvnsite | 0m 51s | branch-2 passed |
+1 | mvneclipse | 0m 16s | branch-2 passed |
+1 | findbugs | 1m 57s | branch-2 passed |
+1 | javadoc | 0m 56s | branch-2 passed with JDK v1.8.0_111 |
+1 | javadoc | 1m 34s | branch-2 passed with JDK v1.7.0_121 |
-1 | mvninstall | 0m 42s | hadoop-hdfs in the patch failed. |
+1 | compile | 0m 41s | the patch passed with JDK v1.8.0_111 |
+1 | javac | 0m 41s | the patch passed |
+1 | compile | 0m 40s | the patch passed with JDK v1.7.0_121 |
+1 | javac | 0m 40s | the patch passed |
-0 | checkstyle | 0m 27s | hadoop-hdfs-project/hadoop-hdfs: The patch generated 1 new + 301 unchanged - 1 fixed = 302 total (was 302) |
+1 | mvnsite | 0m 48s | the patch passed |
+1 | mvneclipse | 0m 13s | the patch passed |
+1 | whitespace | 0m 0s | The patch has no whitespace issues. |
+1 | findbugs | 2m 13s | the patch passed |
+1 | javadoc | 0m 52s | the patch passed with JDK v1.8.0_111 |
+1 | javadoc | 1m 43s | the patch passed with JDK v1.7.0_121 |
-1 | unit | 48m 2s | hadoop-hdfs in the patch failed with JDK v1.7.0_121. |
+1 | asflicense | 0m 20s | The patch does not generate ASF License warnings. |
119m 27s |
Reason | Tests |
---|---|
JDK v1.8.0_111 Failed junit tests | hadoop.hdfs.web.TestWebHdfsFileSystemContract |
JDK v1.7.0_121 Failed junit tests | hadoop.hdfs.web.TestWebHdfsFileSystemContract |
Subsystem | Report/Notes |
---|---|
Docker | Image:yetus/hadoop:b59b8b7 |
JIRA Issue | |
JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12843357/HDFS-11160.branch-2.patch |
Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle |
uname | Linux 1d7c570b8d8f 3.13.0-93-generic #140-Ubuntu SMP Mon Jul 18 21:21:05 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
Build tool | maven |
Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
git revision | branch-2 / 236dbe3 |
Default Java | 1.7.0_121 |
Multi-JDK versions | /usr/lib/jvm/java-8-oracle:1.8.0_111 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_121 |
mvninstall | https://builds.apache.org/job/PreCommit-HDFS-Build/17862/artifact/patchprocess/branch-mvninstall-root.txt |
findbugs | v3.0.0 |
mvninstall | https://builds.apache.org/job/PreCommit-HDFS-Build/17862/artifact/patchprocess/patch-mvninstall-hadoop-hdfs-project_hadoop-hdfs.txt |
checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/17862/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt |
unit | https://builds.apache.org/job/PreCommit-HDFS-Build/17862/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_121.txt |
JDK v1.7.0_121 Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/17862/testReport/ |
modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/17862/console |
Powered by | Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org |
This message was automatically generated.
The mvninstall failure is caused by HADOOP-13709, which used a Java 8 API. The failed test does not reproduce locally.
SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #11006 (See https://builds.apache.org/job/Hadoop-trunk-Commit/11006/)
HDFS-11160. VolumeScanner reports write-in-progress replicas as corrupt (weichiu: rev aebb9127bae872835d057e1c6a6e6b3c6a8be6cd)
- (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/FinalizedReplica.java
- (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBlockScanner.java
- (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/FsVolumeSpi.java
- (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDirectoryScanner.java
- (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/VolumeScanner.java
- (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockSender.java
- (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/extdataset/ExternalVolumeImpl.java
- (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsVolumeImpl.java
- (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/SimulatedFSDataset.java
- (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockScanner.java
Committed the patch to branch-2.7, branch-2.8, branch-2, and trunk.
Many thanks to kihwal, yzhangal, and xiaochen for multiple rounds of reviews!
The commit log indicates that the commit only landed in branch-2, not branch-2.8. Changing the fix version to 2.9 instead.
My bad. I had it cherry-picked in my local tree but didn't push it. I just pushed my branch-2.8 commit. Thanks a lot for the reminder!
Our clusters have around 2000 nodes, and disk failures are quite common at this scale.
Based on the discussion in HDFS-12136, we are very concerned about running this patch in production, as it performs disk I/O while holding an exclusive lock.
Is it possible to move the I/O out of the lock?
If that is not trivial, could we defer this fix to 2.9 so that it is easier to get 2.8.2 out the door? Since the bug has been around for a while, we are okay with leaving it as-is a little longer.
What do you think?
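One general way to keep disk I/O out of an exclusive lock, roughly in the direction suggested by the title of HDFS-11187 (optimize disk access for the last partial chunk checksum of a finalized replica), is to hold the lock only long enough to copy a consistent (length, checksum) pair from memory and then read the data without the lock. The following is a minimal, hypothetical in-memory model, not the committed patch: `Replica`, `Snapshot`, `readPrefix`, and the use of a single CRC32 over the whole prefix are illustrative assumptions. It relies on appends only ever extending the data, so the first `length` bytes are stable once the pair has been captured.

```java
import java.util.Arrays;
import java.util.concurrent.locks.ReentrantLock;
import java.util.zip.CRC32;

/**
 * Hypothetical model of a replica that is appended to while a scanner verifies it.
 * None of these names are HDFS classes; only the locking pattern is illustrated.
 */
public class ReplicaSnapshotSketch {

  /** Immutable (visible length, checksum of exactly those bytes) pair. */
  public static final class Snapshot {
    public final int length;
    public final long checksum;

    Snapshot(int length, long checksum) {
      this.length = length;
      this.checksum = checksum;
    }
  }

  public static final class Replica {
    private final ReentrantLock lock = new ReentrantLock();
    // Stands in for the block file. Appends replace the array (copy-on-write),
    // and volatile makes the new array's contents visible to lock-free readers.
    private volatile byte[] data;
    // In-memory checksum, only read or written while holding the lock.
    private long cachedChecksum;

    public Replica(byte[] initial) {
      this.data = initial.clone();
      this.cachedChecksum = crc(this.data, this.data.length);
    }

    /** Writer path: data and checksum change together while holding the lock. */
    public void append(byte[] extra) {
      lock.lock();
      try {
        byte[] old = data;
        byte[] grown = Arrays.copyOf(old, old.length + extra.length);
        System.arraycopy(extra, 0, grown, old.length, extra.length);
        data = grown;
        cachedChecksum = crc(grown, grown.length);
      } finally {
        lock.unlock();
      }
    }

    /** Scanner path: only a cheap in-memory copy happens under the lock. */
    public Snapshot snapshot() {
      lock.lock();
      try {
        return new Snapshot(data.length, cachedChecksum);
      } finally {
        lock.unlock();
      }
    }

    /** Stands in for the slow disk read; called without holding the lock. */
    byte[] readPrefix(int length) {
      // Appends only extend the data, so the first `length` bytes are identical
      // in whichever version of the array this thread observes.
      return Arrays.copyOf(data, length);
    }
  }

  static long crc(byte[] buf, int len) {
    CRC32 c = new CRC32();
    c.update(buf, 0, len);
    return c.getValue();
  }

  /** Verifies a replica without holding the lock during the data read. */
  public static boolean verify(Replica replica) {
    Snapshot s = replica.snapshot();             // consistent pair, taken under the lock
    byte[] bytes = replica.readPrefix(s.length); // data read outside the lock
    return crc(bytes, bytes.length) == s.checksum;
  }
}
```

A racy scanner would instead read the length and the checksum at different moments; if an append lands in between, it compares a checksum that describes the new bytes against the old bytes, which is the kind of false positive this issue describes.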
Hi wheat9,
An alternative approach is to add a retry on the client side: if the client encounters a checksum error, it retries the read to eliminate the false positive caused by the race condition.
I don't mind reverting it from branch-2.8 if that makes operators of large Hadoop clusters less concerned about the release.
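The client-side retry suggested above can be sketched as a wrapper that reports corruption only when the checksum mismatch persists across several attempts. This is a hypothetical illustration, not the change made for this issue or an HDFS API: `ChecksumMismatchException`, `ReplicaVerifier`, `verifyWithRetry`, and the `maxAttempts`/`backoffMillis` parameters are assumed names.

```java
import java.io.IOException;

public class ChecksumRetrySketch {

  /** Hypothetical checksum-failure type; not an HDFS class. */
  public static class ChecksumMismatchException extends IOException {
    public ChecksumMismatchException(String msg) {
      super(msg);
    }
  }

  /** One verification pass over a replica (hypothetical interface). */
  @FunctionalInterface
  public interface ReplicaVerifier {
    void verifyOnce() throws IOException;
  }

  /**
   * Returns true if the replica verifies cleanly within maxAttempts tries, and
   * false if every attempt hits a checksum mismatch. Other IOExceptions propagate.
   */
  public static boolean verifyWithRetry(ReplicaVerifier verifier, int maxAttempts,
      long backoffMillis) throws IOException, InterruptedException {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be >= 1");
    }
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        verifier.verifyOnce();
        return true; // a clean pass means the earlier mismatch was a false positive
      } catch (ChecksumMismatchException e) {
        if (attempt == maxAttempts) {
          return false; // mismatch persisted on every attempt: report as corrupt
        }
        Thread.sleep(backoffMillis); // give a concurrent append time to settle
      }
    }
    return false; // not reached; required by the compiler
  }
}
```

The trade-off is that a genuinely corrupt replica costs a few extra reads before it is reported, while a transient mismatch caused by a concurrent append no longer marks a good replica as bad.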
Attaching a reproduction test case.