Hadoop HDFS / HDFS-10178

Permanent write failures can happen if pipeline recoveries occur for the first packet

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.8.0, 2.7.3, 2.6.5, 3.0.0-alpha1
    • Component/s: None
    • Labels: None
    • Target Version/s:
    • Hadoop Flags: Reviewed

      Description

      We have observed that a write fails permanently if the first packet doesn't go through properly and pipeline recovery happens. If the write op creates a pipeline, but the actual data packet does not reach one or more datanodes in time, the pipeline recovery is done against the 0-byte partial block.

      If additional datanodes are added, the block is transferred to the new nodes. After the transfer, each node will have a meta file containing only the header and a 0-length block data file. The pipeline recovery seems to work correctly up to this point, but the write fails when the actual data packet is resent.

      Attachments

      1. HDFS-10178.patch
        8 kB
        Kihwal Lee
      2. HDFS-10178.v2.patch
        8 kB
        Kihwal Lee
      3. HDFS-10178.v3.patch
        9 kB
        Kihwal Lee
      4. HDFS-10178.v4.patch
        6 kB
        Kihwal Lee
      5. HDFS-10178.v5.patch
        5 kB
        Kihwal Lee

        Issue Links

          Activity

          Kihwal Lee added a comment - edited

          Datanodes log something like this:

          java.io.IOException: Invalid checksum length: received length is 504 but expected length is 0
                  at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:586)
                  at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:895)
                  at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:804)
                  at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
                  at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
                  at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:251)
                  at java.lang.Thread.run(Thread.java:745)
          

          This causes a permanent write failure.

          The problem is in BlockSender. When transferring a block, BlockSender gets the checksum type from the on-disk meta file.

            if (metaIn.getLength() > BlockMetadataHeader.getHeaderSize()) {
              ...
              csum = BlockMetadataHeader.readDataChecksum(checksumIn, block);
              ...
            }
          ...
                if (csum == null) {
                  csum = DataChecksum.newDataChecksum(DataChecksum.Type.NULL, 512);
                }
          

          Since the code sets the checksum type to NULL when the on-disk meta file contains only the header portion, the checksum type during a block transfer is set incorrectly. When a data packet arrives with a checksum, the datanode checks whether it has received the correct amount of checksum data.

                final int checksumLen = diskChecksum.getChecksumSize(len);
                final int checksumReceivedLen = checksumBuf.capacity();
          
                if (checksumReceivedLen > 0 && checksumReceivedLen != checksumLen) {
                  throw new IOException("Invalid checksum length: received length is "
                      + checksumReceivedLen + " but expected length is " + checksumLen);
                }
          

          The getChecksumSize() method of the NULL checksum type returns 0, so this check fails.
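
          To make the arithmetic concrete, here is a minimal standalone sketch (illustration only, not code from the patch) of that mismatch. It assumes the figures from the log above: 512-byte checksum chunks and 4-byte checksums (CRC32C is assumed; CRC32 behaves the same, both are 4 bytes per chunk), so the 504 received checksum bytes correspond to 126 chunks of data.

          import org.apache.hadoop.util.DataChecksum;

          // Illustration only: why the receiver-side check compares 504 against 0.
          public class ChecksumLenMismatch {
            public static void main(String[] args) {
              int len = 126 * 512;  // data bytes in the packet; 504 / 4 = 126 chunks

              // What the client actually sent: 4-byte checksums over 512-byte chunks.
              DataChecksum sent =
                  DataChecksum.newDataChecksum(DataChecksum.Type.CRC32C, 512);
              int checksumReceivedLen = sent.getChecksumSize(len);  // 126 * 4 = 504

              // What the receiving datanode infers from the header-only meta file:
              // BlockSender fell back to the NULL checksum type during the transfer.
              DataChecksum disk =
                  DataChecksum.newDataChecksum(DataChecksum.Type.NULL, 512);
              int checksumLen = disk.getChecksumSize(len);  // 0

              // Mirrors the BlockReceiver check quoted above.
              if (checksumReceivedLen > 0 && checksumReceivedLen != checksumLen) {
                System.out.println("Invalid checksum length: received length is "
                    + checksumReceivedLen + " but expected length is " + checksumLen);
              }
            }
          }

          With these numbers the check trips and prints the same message as the exception in the log.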

          Kihwal Lee added a comment -

          The following is from BlockSender, added by HDFS-6934.

          // The meta file will contain only the header if the NULL checksum
          // type was used, or if the replica was written to transient storage.
          // Checksum verification is not performed for replicas on transient
          // storage.  The header is important for determining the checksum
          // type later when lazy persistence copies the block to non-transient
          // storage and computes the checksum.
          if (metaIn.getLength() > BlockMetadataHeader.getHeaderSize()) {
          

          The code in BlockSender makes a wrong assumption. If I simply change > to >=, my test passes, but some of the lazy persist test cases fail. So I added another argument to the constructor.

          Chris Nauroth, can you take a look at my patch? I am not familiar with the lazy persist feature. There might be a better way.

          Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 11m 8s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
          +1 mvninstall 7m 6s trunk passed
          +1 compile 0m 51s trunk passed with JDK v1.8.0_74
          +1 compile 0m 43s trunk passed with JDK v1.7.0_95
          +1 checkstyle 0m 24s trunk passed
          +1 mvnsite 0m 53s trunk passed
          +1 mvneclipse 0m 14s trunk passed
          +1 findbugs 1m 57s trunk passed
          +1 javadoc 1m 14s trunk passed with JDK v1.8.0_74
          +1 javadoc 1m 54s trunk passed with JDK v1.7.0_95
          +1 mvninstall 0m 48s the patch passed
          +1 compile 0m 49s the patch passed with JDK v1.8.0_74
          -1 javac 6m 52s hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_74 with JDK v1.8.0_74 generated 2 new + 33 unchanged - 0 fixed = 35 total (was 33)
          +1 javac 0m 49s the patch passed
          +1 compile 0m 41s the patch passed with JDK v1.7.0_95
          -1 javac 7m 34s hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_95 with JDK v1.7.0_95 generated 2 new + 35 unchanged - 0 fixed = 37 total (was 35)
          +1 javac 0m 41s the patch passed
          -1 checkstyle 0m 21s hadoop-hdfs-project/hadoop-hdfs: patch generated 3 new + 347 unchanged - 3 fixed = 350 total (was 350)
          +1 mvnsite 0m 51s the patch passed
          +1 mvneclipse 0m 10s the patch passed
          +1 whitespace 0m 0s Patch has no whitespace issues.
          +1 findbugs 2m 9s the patch passed
          +1 javadoc 1m 12s the patch passed with JDK v1.8.0_74
          +1 javadoc 1m 49s the patch passed with JDK v1.7.0_95
          -1 unit 71m 50s hadoop-hdfs in the patch failed with JDK v1.8.0_74.
          -1 unit 76m 6s hadoop-hdfs in the patch failed with JDK v1.7.0_95.
          +1 asflicense 0m 24s Patch does not generate ASF License warnings.
          185m 46s



          Reason Tests
          JDK v1.8.0_74 Failed junit tests hadoop.hdfs.server.namenode.TestEditLog
            hadoop.hdfs.server.datanode.TestDataNodeLifeline
            hadoop.hdfs.server.balancer.TestBalancer
            hadoop.hdfs.TestClientProtocolForPipelineRecovery
          JDK v1.7.0_95 Failed junit tests hadoop.hdfs.server.namenode.TestEditLog
            hadoop.hdfs.server.namenode.TestDecommissioningStatus
            hadoop.hdfs.server.namenode.TestReconstructStripedBlocks
            hadoop.hdfs.server.blockmanagement.TestReplicationPolicy
            hadoop.hdfs.server.namenode.TestCacheDirectives
            hadoop.hdfs.TestClientProtocolForPipelineRecovery
            hadoop.metrics2.sink.TestRollingFileSystemSinkWithSecureHdfs



          Subsystem Report/Notes
          Docker Image: yetus/hadoop:0ca8df7
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12794081/HDFS-10178.patch
          JIRA Issue HDFS-10178
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 73042f929ae6 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / dc951e6
          Default Java 1.7.0_95
          Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_74 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_95
          findbugs v3.0.0
          javac hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_74: https://builds.apache.org/job/PreCommit-HDFS-Build/14854/artifact/patchprocess/diff-compile-javac-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_74.txt
          javac hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_95: https://builds.apache.org/job/PreCommit-HDFS-Build/14854/artifact/patchprocess/diff-compile-javac-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_95.txt
          checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/14854/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/14854/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_74.txt
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/14854/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_95.txt
          unit test logs https://builds.apache.org/job/PreCommit-HDFS-Build/14854/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_74.txt https://builds.apache.org/job/PreCommit-HDFS-Build/14854/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_95.txt
          JDK v1.7.0_95 Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/14854/testReport/
          modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/14854/console
          Powered by Apache Yetus 0.2.0 http://yetus.apache.org

          This message was automatically generated.

          Kihwal Lee added a comment -

          I should probably make BlockScanner/VolumeScanner follow the old way when creating a BlockSender. Otherwise all lazy-persisted blocks will be reported as bad when scanned.

          Kihwal Lee added a comment -

          Made VolumeScanner retain the old behavior. Fixed javac warnings.

          Kihwal Lee added a comment -

          For those who are wondering how critical this bug is: we hit this because of a transient network issue, which caused multiple timeouts while writing the first packet. After all original nodes were replaced, none of the nodes in the pipeline had a replica with a valid checksum type. The network issue went away, but the write could not continue because of this bug and permanently failed after 5 pipeline recovery attempts. I don't think it is common, but when it happens the client cannot recover.

          Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 10s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
          +1 mvninstall 6m 38s trunk passed
          +1 compile 0m 39s trunk passed with JDK v1.8.0_74
          +1 compile 0m 41s trunk passed with JDK v1.7.0_95
          +1 checkstyle 0m 27s trunk passed
          +1 mvnsite 0m 54s trunk passed
          +1 mvneclipse 0m 14s trunk passed
          +1 findbugs 2m 6s trunk passed
          +1 javadoc 1m 10s trunk passed with JDK v1.8.0_74
          +1 javadoc 1m 54s trunk passed with JDK v1.7.0_95
          +1 mvninstall 0m 51s the patch passed
          +1 compile 0m 46s the patch passed with JDK v1.8.0_74
          +1 javac 0m 46s the patch passed
          +1 compile 0m 41s the patch passed with JDK v1.7.0_95
          +1 javac 0m 41s the patch passed
          -1 checkstyle 0m 24s hadoop-hdfs-project/hadoop-hdfs: patch generated 3 new + 347 unchanged - 3 fixed = 350 total (was 350)
          +1 mvnsite 0m 53s the patch passed
          +1 mvneclipse 0m 11s the patch passed
          +1 whitespace 0m 0s Patch has no whitespace issues.
          +1 findbugs 2m 15s the patch passed
          +1 javadoc 1m 8s the patch passed with JDK v1.8.0_74
          +1 javadoc 1m 51s the patch passed with JDK v1.7.0_95
          -1 unit 61m 33s hadoop-hdfs in the patch failed with JDK v1.8.0_74.
          -1 unit 59m 57s hadoop-hdfs in the patch failed with JDK v1.7.0_95.
          +1 asflicense 0m 26s Patch does not generate ASF License warnings.
          148m 1s



          Reason Tests
          JDK v1.8.0_74 Failed junit tests hadoop.hdfs.TestClientProtocolForPipelineRecovery
            hadoop.hdfs.TestFileAppend
            hadoop.hdfs.server.balancer.TestBalancer
            hadoop.hdfs.server.namenode.TestEditLog
            hadoop.hdfs.TestHFlush
          JDK v1.7.0_95 Failed junit tests hadoop.hdfs.TestClientProtocolForPipelineRecovery
            hadoop.hdfs.server.blockmanagement.TestRBWBlockInvalidation
            hadoop.hdfs.TestHFlush



          Subsystem Report/Notes
          Docker Image: yetus/hadoop:0ca8df7
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12794188/HDFS-10178.v2.patch
          JIRA Issue HDFS-10178
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 60346f9d15d5 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / dc951e6
          Default Java 1.7.0_95
          Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_74 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_95
          findbugs v3.0.0
          checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/14865/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/14865/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_74.txt
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/14865/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_95.txt
          unit test logs https://builds.apache.org/job/PreCommit-HDFS-Build/14865/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_74.txt https://builds.apache.org/job/PreCommit-HDFS-Build/14865/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_95.txt
          JDK v1.7.0_95 Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/14865/testReport/
          modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/14865/console
          Powered by Apache Yetus 0.2.0 http://yetus.apache.org

          This message was automatically generated.

          Kihwal Lee added a comment -

          Oops. I reused a fault injection method and caused test failures. I am adding a new one.

          Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 8m 48s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
          +1 mvninstall 8m 35s trunk passed
          +1 compile 1m 0s trunk passed with JDK v1.8.0_74
          +1 compile 0m 51s trunk passed with JDK v1.7.0_95
          +1 checkstyle 0m 25s trunk passed
          +1 mvnsite 1m 0s trunk passed
          +1 mvneclipse 0m 14s trunk passed
          +1 findbugs 2m 28s trunk passed
          +1 javadoc 1m 36s trunk passed with JDK v1.8.0_74
          +1 javadoc 2m 16s trunk passed with JDK v1.7.0_95
          +1 mvninstall 1m 4s the patch passed
          +1 compile 1m 10s the patch passed with JDK v1.8.0_74
          +1 javac 1m 10s the patch passed
          +1 compile 0m 55s the patch passed with JDK v1.7.0_95
          +1 javac 0m 55s the patch passed
          -1 checkstyle 0m 26s hadoop-hdfs-project/hadoop-hdfs: patch generated 3 new + 348 unchanged - 3 fixed = 351 total (was 351)
          +1 mvnsite 1m 12s the patch passed
          +1 mvneclipse 0m 15s the patch passed
          +1 whitespace 0m 0s Patch has no whitespace issues.
          +1 findbugs 2m 49s the patch passed
          +1 javadoc 1m 36s the patch passed with JDK v1.8.0_74
          +1 javadoc 2m 23s the patch passed with JDK v1.7.0_95
          -1 unit 76m 32s hadoop-hdfs in the patch failed with JDK v1.8.0_74.
          -1 unit 70m 18s hadoop-hdfs in the patch failed with JDK v1.7.0_95.
          -1 asflicense 0m 24s Patch generated 1 ASF License warnings.
          189m 15s



          Reason Tests
          JDK v1.8.0_74 Failed junit tests hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes
            hadoop.hdfs.server.datanode.TestDirectoryScanner
            hadoop.hdfs.TestEncryptionZones
            hadoop.hdfs.server.namenode.TestEditLog
            hadoop.hdfs.server.namenode.TestDecommissioningStatus
            hadoop.hdfs.server.namenode.ha.TestEditLogTailer
            hadoop.hdfs.security.TestDelegationTokenForProxyUser
            hadoop.hdfs.TestLeaseRecovery
            hadoop.hdfs.TestFileAppend
            hadoop.hdfs.server.datanode.TestDataNodeHotSwapVolumes
            hadoop.hdfs.server.blockmanagement.TestReconstructStripedBlocksWithRackAwareness
          JDK v1.8.0_74 Timed out junit tests org.apache.hadoop.hdfs.TestHDFSFileSystemContract
            org.apache.hadoop.hdfs.TestInjectionForSimulatedStorage
            org.apache.hadoop.hdfs.TestDecommission
            org.apache.hadoop.hdfs.TestAppendSnapshotTruncate
          JDK v1.7.0_95 Failed junit tests hadoop.hdfs.qjournal.TestSecureNNWithQJM
            hadoop.hdfs.server.namenode.ha.TestStandbyCheckpoints



          Subsystem Report/Notes
          Docker Image: yetus/hadoop:fbe3e86
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12794234/HDFS-10178.v3.patch
          JIRA Issue HDFS-10178
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 6c1bd0b6f56c 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / fbe3e86
          Default Java 1.7.0_95
          Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_74 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_95
          findbugs v3.0.0
          checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/14868/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/14868/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_74.txt
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/14868/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_95.txt
          unit test logs https://builds.apache.org/job/PreCommit-HDFS-Build/14868/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_74.txt https://builds.apache.org/job/PreCommit-HDFS-Build/14868/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_95.txt
          JDK v1.7.0_95 Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/14868/testReport/
          asflicense https://builds.apache.org/job/PreCommit-HDFS-Build/14868/artifact/patchprocess/patch-asflicense-problems.txt
          modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/14868/console
          Powered by Apache Yetus 0.2.0 http://yetus.apache.org

          This message was automatically generated.

          Akira Ajisaka added a comment -

          Mostly looks good to me. Minor nit:

                cluster = new MiniDFSCluster.Builder(conf).numDataNodes((int)3).build();
          

          (int) is unnecessary.
          Hi Chris Nauroth and Arpit Agarwal, would you please review the patch?

          Arpit Agarwal added a comment - edited

          Hi Kihwal, I think we can check replica.isOnTransientStorage() instead of passing the new flag. Something like this should work in BlockSender.

                      if (!replica.isOnTransientStorage() &&
                          metaIn.getLength() >= BlockMetadataHeader.getHeaderSize()) {
          
          Kihwal Lee added a comment -

          That's what I was looking for! Thanks, Arpit Agarwal. I will update the patch.

          Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 12s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
          +1 mvninstall 6m 36s trunk passed
          +1 compile 0m 41s trunk passed with JDK v1.8.0_74
          +1 compile 0m 41s trunk passed with JDK v1.7.0_95
          +1 checkstyle 0m 21s trunk passed
          +1 mvnsite 0m 51s trunk passed
          +1 mvneclipse 0m 14s trunk passed
          +1 findbugs 1m 56s trunk passed
          +1 javadoc 1m 5s trunk passed with JDK v1.8.0_74
          +1 javadoc 1m 44s trunk passed with JDK v1.7.0_95
          +1 mvninstall 0m 46s the patch passed
          +1 compile 0m 38s the patch passed with JDK v1.8.0_74
          +1 javac 0m 38s the patch passed
          +1 compile 0m 38s the patch passed with JDK v1.7.0_95
          +1 javac 0m 38s the patch passed
          -1 checkstyle 0m 19s hadoop-hdfs-project/hadoop-hdfs: patch generated 2 new + 99 unchanged - 2 fixed = 101 total (was 101)
          +1 mvnsite 0m 50s the patch passed
          +1 mvneclipse 0m 11s the patch passed
          +1 whitespace 0m 1s Patch has no whitespace issues.
          +1 findbugs 2m 9s the patch passed
          +1 javadoc 1m 4s the patch passed with JDK v1.8.0_74
          +1 javadoc 1m 39s the patch passed with JDK v1.7.0_95
          -1 unit 57m 11s hadoop-hdfs in the patch failed with JDK v1.8.0_74.
          -1 unit 53m 41s hadoop-hdfs in the patch failed with JDK v1.7.0_95.
          +1 asflicense 0m 21s Patch does not generate ASF License warnings.
          135m 48s



          Reason Tests
          JDK v1.8.0_74 Failed junit tests hadoop.TestRefreshCallQueue
            hadoop.hdfs.server.datanode.TestDataNodeMetrics
          JDK v1.7.0_95 Failed junit tests hadoop.TestRefreshCallQueue
            hadoop.hdfs.shortcircuit.TestShortCircuitCache



          Subsystem Report/Notes
          Docker Image: yetus/hadoop:fbe3e86
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12796602/HDFS-10178.v4.patch
          JIRA Issue HDFS-10178
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 3c724ec8d11d 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 256c82f
          Default Java 1.7.0_95
          Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_74 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_95
          findbugs v3.0.0
          checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/15042/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/15042/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_74.txt
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/15042/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_95.txt
          unit test logs https://builds.apache.org/job/PreCommit-HDFS-Build/15042/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_74.txt https://builds.apache.org/job/PreCommit-HDFS-Build/15042/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_95.txt
          JDK v1.7.0_95 Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/15042/testReport/
          modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/15042/console
          Powered by Apache Yetus 0.2.0 http://yetus.apache.org

          This message was automatically generated.

          Masatake Iwasaki added a comment -
          // For testing. Delay sending packet downstream
          if (DataNodeFaultInjector.get().stopSendingPacketDownstream()) {
            try {
              Thread.sleep(60000);
            } catch (InterruptedException ie) {
              throw new IOException("Interrupted while sleeping. Bailing out.");
            }
          }
          

          Should the test logic be encapsulated in the DataNodeFaultInjector's method? For example:

              DataNodeFaultInjector dnFaultInjector = new DataNodeFaultInjector() {
                int tries = 1;
                @Override
                public void stopSendingPacketDownstream() throws IOException {
                  if (tries > 0) {
                    tries--;
                    try {
                      Thread.sleep(60000);
                    } catch (InterruptedException ie) {
                      throw new IOException("Interrupted while sleeping. Bailing out.");
                    }
                  }
                }
              };
          
          Arpit Agarwal added a comment -

          +1 from me. Not committing it since Masatake has an open question.

          Kihwal Lee added a comment -

          Should the test logic be encapsulated in the DataNodeFaultInjector's method?

          Good point. That's where it logically belongs and the code will look cleaner.

          Vinayakumar B added a comment -

          Hi Kihwal Lee, I am trying to understand the issue. Maybe my understanding of packet sending/receiving is wrong.

          If the packet header is sent out, but the data portion of the packet does not reach one or more datanodes in time

          How can only the header be sent out without any data/datalen? The header carries the payload length, so PacketReceiver should fail or wait to receive the entire packet (not just the header portion), right?
          Or is the payload length in the incoming packet corrupted to 0?

          PacketReceiver.java#doRead(..)
              // Each packet looks like:
              //   PLEN    HLEN      HEADER     CHECKSUMS  DATA
              //   32-bit  16-bit   <protobuf>  <variable length>
              //
              // PLEN:      Payload length
              //            = length(PLEN) + length(CHECKSUMS) + length(DATA)
              //            This length includes its own encoded length in
              //            the sum for historical reasons.
              //
              // HLEN:      Header length
              //            = length(HEADER)
              //
              // HEADER:    the actual packet header fields, encoded in protobuf
              // CHECKSUMS: the crcs for the data chunk. May be missing if
              //            checksums were not requested
              // DATA       the actual block data
          
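          As a side note, here is a small standalone sketch of how the lengths in that layout relate (not from the patch; the 512-byte chunk size and 4-byte checksum size are assumptions matching the numbers elsewhere in this issue).

          // Illustrative arithmetic for the packet layout quoted above.
          public class PacketLengths {
            public static void main(String[] args) {
              int bytesPerChecksum = 512;
              int checksumSize = 4;      // CRC32/CRC32C checksum is 4 bytes per chunk
              int dataLen = 126 * 512;   // 126 full chunks, matching the 504-byte case above

              int chunks = (dataLen + bytesPerChecksum - 1) / bytesPerChecksum;
              int checksumsLen = chunks * checksumSize;   // length(CHECKSUMS) = 504
              // PLEN includes its own 4-byte (32-bit) encoded length plus checksums and data.
              int plen = 4 + checksumsLen + dataLen;

              System.out.println("CHECKSUMS=" + checksumsLen
                  + " DATA=" + dataLen + " PLEN=" + plen);
            }
          }

          For a 126-chunk packet this gives CHECKSUMS=504, DATA=64512 and PLEN=65020.
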
          Kihwal Lee added a comment -

          Vinayakumar B, sorry, I gave a confusing description of the problem. I was mixing up the meta file header and the non-payload protobuf fields. After a connection is made and the command is parsed, a BlockReceiver is created and createRbw() is called before getting to the packet. This creates a meta file with only the header. If this replica is used for transferring the block, the checksum type is lost.

          Kihwal Lee added a comment -

          Fifth time's a charm. Moved the fault injection code. Verified it still fails without the fix and passes with the fix.

          Arpit Agarwal added a comment -

          +1 pending Jenkins.

          Masatake Iwasaki added a comment -

          Thanks for the update, Kihwal Lee. +1 too on v5.

          Hadoop QA added a comment -
          -1 overall



          Vote Subsystem Runtime Comment
          0 reexec 0m 10s Docker mode activated.
          +1 @author 0m 0s The patch does not contain any @author tags.
          +1 test4tests 0m 0s The patch appears to include 1 new or modified test files.
          +1 mvninstall 6m 49s trunk passed
          +1 compile 0m 43s trunk passed with JDK v1.8.0_77
          +1 compile 0m 41s trunk passed with JDK v1.7.0_95
          +1 checkstyle 0m 21s trunk passed
          +1 mvnsite 0m 52s trunk passed
          +1 mvneclipse 0m 13s trunk passed
          +1 findbugs 1m 55s trunk passed
          +1 javadoc 1m 7s trunk passed with JDK v1.8.0_77
          +1 javadoc 1m 42s trunk passed with JDK v1.7.0_95
          +1 mvninstall 0m 46s the patch passed
          +1 compile 0m 39s the patch passed with JDK v1.8.0_77
          +1 javac 0m 39s the patch passed
          +1 compile 0m 39s the patch passed with JDK v1.7.0_95
          +1 javac 0m 39s the patch passed
          -1 checkstyle 0m 19s hadoop-hdfs-project/hadoop-hdfs: patch generated 2 new + 99 unchanged - 2 fixed = 101 total (was 101)
          +1 mvnsite 0m 50s the patch passed
          +1 mvneclipse 0m 11s the patch passed
          +1 whitespace 0m 0s Patch has no whitespace issues.
          +1 findbugs 2m 10s the patch passed
          +1 javadoc 1m 1s the patch passed with JDK v1.8.0_77
          +1 javadoc 1m 43s the patch passed with JDK v1.7.0_95
          -1 unit 58m 42s hadoop-hdfs in the patch failed with JDK v1.8.0_77.
          -1 unit 53m 59s hadoop-hdfs in the patch failed with JDK v1.7.0_95.
          +1 asflicense 0m 27s Patch does not generate ASF License warnings.
          137m 58s



          Reason Tests
          JDK v1.8.0_77 Failed junit tests hadoop.hdfs.TestDFSClientRetries
            hadoop.hdfs.TestReplication
            hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFS
          JDK v1.7.0_95 Failed junit tests hadoop.hdfs.TestHFlush



          Subsystem Report/Notes
          Docker Image: yetus/hadoop:fbe3e86
          JIRA Patch URL https://issues.apache.org/jira/secure/attachment/12796873/HDFS-10178.v5.patch
          JIRA Issue HDFS-10178
          Optional Tests asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle
          uname Linux 47eac507384a 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
          Build tool maven
          Personality /testptch/hadoop/patchprocess/precommit/personality/provided.sh
          git revision trunk / 7280550
          Default Java 1.7.0_95
          Multi-JDK versions /usr/lib/jvm/java-8-oracle:1.8.0_77 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_95
          findbugs v3.0.0
          checkstyle https://builds.apache.org/job/PreCommit-HDFS-Build/15056/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/15056/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_77.txt
          unit https://builds.apache.org/job/PreCommit-HDFS-Build/15056/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_95.txt
          unit test logs https://builds.apache.org/job/PreCommit-HDFS-Build/15056/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_77.txt https://builds.apache.org/job/PreCommit-HDFS-Build/15056/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_95.txt
          JDK v1.7.0_95 Test Results https://builds.apache.org/job/PreCommit-HDFS-Build/15056/testReport/
          modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
          Console output https://builds.apache.org/job/PreCommit-HDFS-Build/15056/console
          Powered by Apache Yetus 0.2.0 http://yetus.apache.org

          This message was automatically generated.

          Hide
          kihwal Kihwal Lee added a comment -

          TestHFlush: HDFS-2043. Will review the patch.
          The JDK8 failures don't have logs, so they are hard to debug.
          TestDFSClientRetries: timed out. It tried to restart the namenode, but that also timed out. Without seeing the log, it is hard to know what went wrong.
          TestReplication: timed out. Datanode shutdown hung during the netty shutdown:

          java.lang.Thread.State: RUNNABLE
                  at sun.nio.ch.EPollArrayWrapper.interrupt(Native Method)
                  at sun.nio.ch.EPollArrayWrapper.interrupt(EPollArrayWrapper.java:317)
                  at sun.nio.ch.EPollSelectorImpl.wakeup(EPollSelectorImpl.java:207)
                  at io.netty.channel.nio.NioEventLoop.wakeup(NioEventLoop.java:590)
                  at io.netty.util.concurrent.SingleThreadEventExecutor.shutdownGracefully(SingleThreadEventExecutor.java:503)
                  at io.netty.util.concurrent.MultithreadEventExecutorGroup.shutdownGracefully(MultithreadEventExecutorGroup.java:160)
                  at io.netty.util.concurrent.AbstractEventExecutorGroup.shutdownGracefully(AbstractEventExecutorGroup.java:70)
                  at org.apache.hadoop.hdfs.server.datanode.web.DatanodeHttpServer.close(DatanodeHttpServer.java:249)
                  at org.apache.hadoop.hdfs.server.datanode.DataNode.shutdown(DataNode.java:1863)
          

          TestBlockTokenWithDFS: the datanode was restarted and hit a bind exception; the old port was still in use.

          The test failures are not related to this patch; they pass when run on my machine.
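
          For reference, the bind-exception failure above is the usual restart-on-the-same-port race seen in minicluster tests. Below is a minimal sketch of that pattern, assuming only a MiniDFSCluster datanode restart that keeps its old port; it is illustrative scaffolding, not the failing test itself.

          import java.io.IOException;
          import org.apache.hadoop.conf.Configuration;
          import org.apache.hadoop.hdfs.MiniDFSCluster;

          public class RestartOnSamePortSketch {
            public static void main(String[] args) throws IOException {
              Configuration conf = new Configuration();
              MiniDFSCluster cluster =
                  new MiniDFSCluster.Builder(conf).numDataNodes(1).build();
              try {
                cluster.waitActive();
                // Stop the datanode, then restart it on the same port (keepPort = true).
                // If the old listening socket has not been released yet, the restart
                // can fail with a BindException -- the flakiness described above.
                MiniDFSCluster.DataNodeProperties dnProps = cluster.stopDataNode(0);
                cluster.restartDataNode(dnProps, true);
                cluster.waitActive();
              } finally {
                cluster.shutdown();
              }
            }
          }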

          Hide
          kihwal Kihwal Lee added a comment -

          Committed to trunk through branch-2.7. The 2.7 cherry-pick was clean, but the test was modified to use DFSConfigKeys instead of HdfsClientConfigKeys. Thanks for the reviews and comments, Akira, Arpit, Masataki and Vinay.
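
          For context on the cherry-pick note above: on trunk and branch-2 the client-side configuration constants live in HdfsClientConfigKeys, while branch-2.7 still exposes them through DFSConfigKeys. A minimal sketch of the kind of one-line difference involved; the specific key constant below is purely illustrative and is not claimed to be the one the test actually sets.

          import org.apache.hadoop.conf.Configuration;
          import org.apache.hadoop.hdfs.DFSConfigKeys;
          import org.apache.hadoop.hdfs.client.HdfsClientConfigKeys;

          public class BackportConfigKeySketch {
            static Configuration trunkStyle() {
              Configuration conf = new Configuration();
              // trunk / branch-2: client keys are referenced via HdfsClientConfigKeys
              conf.setInt(HdfsClientConfigKeys.DFS_CLIENT_SOCKET_TIMEOUT_KEY, 5000);
              return conf;
            }

            static Configuration branch27Style() {
              Configuration conf = new Configuration();
              // branch-2.7 cherry-pick: the same key is still referenced via DFSConfigKeys
              conf.setInt(DFSConfigKeys.DFS_CLIENT_SOCKET_TIMEOUT_KEY, 5000);
              return conf;
            }
          }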

          Hide
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-trunk-Commit #9552 (See https://builds.apache.org/job/Hadoop-trunk-Commit/9552/)
          HDFS-10178. Permanent write failures can happen if pipeline recoveries (kihwal: rev a7d1fb0cd2fdbf830602eb4dbbd9bbe62f4d5584)

          • hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestClientProtocolForPipelineRecovery.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockSender.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockReceiver.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNodeFaultInjector.java
          Hide
          vinayrpet Vinayakumar B added a comment -

          Thanks Kihwal Lee for clarifying the issue for me. Nice find and nice fix.

          Hide
          ctrezzo Chris Trezzo added a comment -

          Adding 2.6.5 to the target versions with the intention of backporting this to branch-2.6. Please let me know if you think otherwise. Thanks!

          Hide
          vinodkv Vinod Kumar Vavilapalli added a comment -

          Closing the JIRA as part of 2.7.3 release.

          Hide
          sjlee0 Sangjin Lee added a comment -

          Cherry-picked it to 2.6.5 (trivial).

          Hide
          ctrezzo Chris Trezzo added a comment -

          Thanks!


            People

            • Assignee:
              Kihwal Lee
            • Reporter:
              Kihwal Lee
            • Votes:
              0
            • Watchers:
              18
