[HDFS-1059] completeFile loops forever if the block's only replica has become corrupt - ASF JIRA

Details

Type: Sub-task
Status: Resolved
Priority: Major
Resolution: Cannot Reproduce
Affects Version/s: 0.21.0, 0.22.0
Fix Version/s: None
Component/s: None
Labels: None

Description

If a writer is appending to a block with replication factor 1 and that block becomes corrupt, a reader will report the corruption to the NameNode (NN). When the writer then tries to complete the file, it loops forever, logging messages like:

[junit] 2010-03-21 17:40:08,093 INFO namenode.FSNamesystem (FSNamesystem.java:checkFileProgress(1613)) - BLOCK* NameSystem.checkFileProgress: block blk_-4256412191814117589_1001{blockUCState=COMMITTED, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[127.0.0.1:56782|RBW]]} has not reached minimal replication 1
[junit] 2010-03-21 17:40:08,495 INFO hdfs.DFSClient (DFSOutputStream.java:completeFile(1435)) - Could not complete file /TestReadWhileWriting/file1 retrying...

We should add tests that cover the case of a writer appending to a block that becomes corrupt while a reader is accessing it.
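
To illustrate the failure mode, here is a minimal, self-contained sketch of a completion loop with a retry cap. It is not the actual DFSClient code: the class `CompleteFileRetrySketch`, the method `completeWithBoundedRetries`, and the `maxAttempts` parameter are hypothetical, and the NameNode call is replaced by a `BooleanSupplier` stand-in. With a block that never reaches minimal replication, an uncapped version of this loop would never return.

```java
import java.util.function.BooleanSupplier;

public class CompleteFileRetrySketch {

    /**
     * Hypothetical bounded variant of a "complete the file" retry loop.
     *
     * @param completeCall stand-in for asking the NameNode to complete the file;
     *                     returns true once the last block has enough replicas
     * @param maxAttempts  illustrative cap on the number of attempts
     * @return the 1-based attempt on which completion succeeded, or -1 if the
     *         cap was reached without success
     */
    static int completeWithBoundedRetries(BooleanSupplier completeCall, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (completeCall.getAsBoolean()) {
                return attempt;
            }
            // A real client would log "Could not complete file ... retrying..."
            // and wait before the next attempt; both are omitted in this sketch.
        }
        return -1;
    }

    public static void main(String[] args) {
        // Models the reported case: the only replica is corrupt, so the block
        // never reaches minimal replication and every attempt fails.
        BooleanSupplier stuckBlock = () -> false;

        int result = completeWithBoundedRetries(stuckBlock, 5);
        if (result < 0) {
            System.out.println("gave up after 5 attempts");
        } else {
            System.out.println("completed on attempt " + result);
        }
    }
}
```

Whether a cap, a timeout, or surfacing the corruption to the writer is the right remedy is a design question this sketch does not settle; it only shows why a loop keyed solely on the NameNode's answer cannot terminate in this scenario.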

Issue Links

is related to: HDFS-148 timeout when writing dfs file causes infinite loop when closing the file (Resolved)

People

Assignee: Unassigned

Reporter: Todd Lipcon

Votes: 1

Watchers: 11

Dates

Created: 22/Mar/10 01:43

Updated: 30/Jul/14 16:57

Resolved: 30/Jul/14 16:57