[HDFS-13709] Report bad block to NN when transfer block encounter EIO exception - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 3.3.0, 3.2.1, 3.1.3
Component/s: datanode
Labels: None

Description

In our online cluster, the BlockPoolSliceScanner is turned off, and a bad track on a disk can sometimes cause data loss.

For example, suppose a block has 3 replicas on machines A, B, and C. If a bad track damages A's replica and B and C later crash at the same time, the NN will try to replicate the block from A and fail. The block is now effectively corrupt, but nobody knows, because the NN thinks there is still at least 1 healthy replica and keeps trying to replicate from it.

When a replica with data on a bad track is read, the OS returns an EIO error. If the DN reports the bad block to the NN as soon as it gets an EIO, we can detect this case early and try to avoid data loss.
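
The idea above can be sketched roughly as follows. This is a minimal illustration, not the attached patch: `ReplicaReader`, `BadBlockReporter`, `looksLikeEio`, and `readForTransfer` are hypothetical names, and the sketch assumes that the JVM surfaces an EIO as an `IOException` whose message contains the OS error string "Input/output error".

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

final class EioReportSketch {
    /** Hypothetical stand-in for the DN-to-NN bad block report. */
    interface BadBlockReporter {
        void reportBadBlock(String blockId);
    }

    /** Hypothetical stand-in for reading a replica from local disk. */
    interface ReplicaReader {
        byte[] read(String blockId) throws IOException;
    }

    // Assumption: an EIO from the OS shows up as an IOException whose
    // message contains "Input/output error".
    static boolean looksLikeEio(IOException e) {
        String msg = e.getMessage();
        return msg != null && msg.contains("Input/output error");
    }

    // Read a replica for transfer. On a suspected EIO, report the block
    // before rethrowing, so the failure is not silently retried forever.
    static byte[] readForTransfer(String blockId, ReplicaReader reader,
                                  BadBlockReporter reporter) throws IOException {
        try {
            return reader.read(blockId);
        } catch (IOException e) {
            if (looksLikeEio(e)) {
                reporter.reportBadBlock(blockId);
            }
            throw e;
        }
    }

    public static void main(String[] args) {
        List<String> reported = new ArrayList<>();
        // Simulate a replica whose data sits on a bad track.
        ReplicaReader badDisk = id -> {
            throw new IOException("Input/output error");
        };
        try {
            readForTransfer("blk_1", badDisk, reported::add);
        } catch (IOException e) {
            System.out.println("transfer failed: " + e.getMessage());
        }
        System.out.println("reported blocks: " + reported);
    }
}
```

Other I/O failures (for example, a closed stream) are rethrown without a report in this sketch, so only suspected media errors mark the replica as bad.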

Attachments

- HDFS-13709.patch (29/Jun/18 10:50, 10 kB, Chen Zhang)
- HDFS-13709.005.patch (19/Aug/19 12:33, 15 kB, Chen Zhang)
- HDFS-13709.004.patch (15/Aug/19 02:55, 14 kB, Chen Zhang)
- HDFS-13709.003.patch (14/Aug/19 18:03, 12 kB, Chen Zhang)
- HDFS-13709.002.patch (12/Aug/19 07:09, 12 kB, Chen Zhang)

Issue Links

relates to: HDFS-14752 backport HDFS-13709 to branch-2 (Report bad block to NN when transfer block encounter EIO exception), status: Patch Available

People

Assignee: Chen Zhang

Reporter: Chen Zhang

Votes: 0

Watchers: 8

Dates

Created: 29/Jun/18 08:55

Updated: 02/Oct/19 17:15

Resolved: 19/Aug/19 20:15