[HDFS-1057] Concurrent readers hit ChecksumExceptions if following a writer to very end of file - ASF JIRA

Details

Type: Sub-task
Status: Closed
Priority: Blocker
Resolution: Fixed
Affects Version/s: 0.20-append, 0.21.0, 0.22.0
Fix Version/s: 0.20-append, 0.20.205.0, 0.21.0, 0.22.0
Component/s: datanode
Labels: None
Hadoop Flags: Reviewed

Description

BlockReceiver.receivePacket calls replicaInfo.setBytesOnDisk before calling flush(). A concurrent reader can therefore race with the writer: it sees the new length while those bytes are still in BlockReceiver's buffers, so the client may see checksum errors or EOFs. Additionally, the last checksum chunk of the file is made accessible to readers even though it is not yet stable.
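
The ordering problem described above can be illustrated with a small standalone sketch. This is not the Hadoop code and not the attached patch: the class VisibleLengthSketch, its methods, and the probe callback are hypothetical names chosen for illustration. A ByteArrayOutputStream stands in for the block file on disk, a BufferedOutputStream stands in for the receiver's buffer, and the probe stands in for a concurrent reader that runs between the two steps. The unstable last checksum chunk is not modeled.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical model of the race; names are illustrative, not Hadoop APIs.
class VisibleLengthSketch {
    // Stand-in for the block file on disk.
    private final ByteArrayOutputStream disk = new ByteArrayOutputStream();
    // Stand-in for the receiver's in-memory buffer in front of the file.
    private final BufferedOutputStream out = new BufferedOutputStream(disk, 4096);
    // Length advertised to readers (plays the role of "bytes on disk").
    private volatile long bytesOnDisk = 0;
    private long received = 0;

    // Order described in the report: advertise the length, then flush.
    void receivePublishFirst(byte[] packet, Runnable probe) throws IOException {
        out.write(packet);
        received += packet.length;
        bytesOnDisk = received; // readers may now trust this length
        probe.run();            // a concurrent reader looks here
        out.flush();
    }

    // Reordered variant: flush first, then advertise the length.
    void receiveFlushFirst(byte[] packet, Runnable probe) throws IOException {
        out.write(packet);
        received += packet.length;
        out.flush();
        probe.run();            // a concurrent reader looks here
        bytesOnDisk = received;
    }

    // Bytes that are advertised to readers but not yet readable.
    long missingBytes() {
        return Math.max(0, bytesOnDisk - disk.size());
    }

    // Gap seen by a reader when the length is published before the flush.
    static long gapPublishFirst(int packetSize) {
        VisibleLengthSketch s = new VisibleLengthSketch();
        long[] gap = new long[1];
        try {
            s.receivePublishFirst(new byte[packetSize], () -> gap[0] = s.missingBytes());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return gap[0];
    }

    // Gap seen by a reader when the flush happens before the length is published.
    static long gapFlushFirst(int packetSize) {
        VisibleLengthSketch s = new VisibleLengthSketch();
        long[] gap = new long[1];
        try {
            s.receiveFlushFirst(new byte[packetSize], () -> gap[0] = s.missingBytes());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return gap[0];
    }

    public static void main(String[] args) {
        System.out.println("publish-first gap: " + gapPublishFirst(512)); // 512
        System.out.println("flush-first gap:   " + gapFlushFirst(512));   // 0
    }
}
```

In this model the publish-first order leaves a window in which a reader is told about 512 bytes that it cannot yet read, which corresponds to the EOF or checksum failure in the report. With the flush-first order the reader only ever sees a length that is already backed by flushed data.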

Attachments

- conurrent-reader-patch-1.txt, 29/Apr/10 19:15, 30 kB, sam rash
- conurrent-reader-patch-2.txt, 01/May/10 04:53, 31 kB, sam rash
- conurrent-reader-patch-3.txt, 03/May/10 23:43, 34 kB, sam rash
- hdfs-1057-trunk-1.txt, 29/May/10 01:27, 27 kB, sam rash
- hdfs-1057-trunk-2.txt, 06/Jun/10 21:21, 26 kB, sam rash
- hdfs-1057-trunk-3.txt, 07/Jun/10 21:11, 25 kB, sam rash
- hdfs-1057-trunk-4.txt, 22/Jun/10 23:59, 29 kB, sam rash
- HDFS-1057-0.20-append.patch, 24/Jun/10 20:17, 35 kB, Nicolas Spiegelberg
- hdfs-1057-trunk-5.txt, 25/Jun/10 18:44, 30 kB, sam rash
- hdfs-1057-trunk-6.txt, 29/Jun/10 16:39, 29 kB, sam rash
- HDFS-1057.20-security.1.patch, 02/Sep/11 21:31, 34 kB, Jitendra Nath Pandey

Issue Links

is related to

HDFS-1401 TestFileConcurrentReader test case is still timing out / failing (Resolved)
HDFS-1103 Replica recovery doesn't distinguish between flushed-but-corrupted last chunk and unflushed last chunk (Resolved)
HDFS-1679 TestFileConcurrentReader fails intermittently (Resolved)
HDFS-1885 Recurring failures in TestFileConcurrentReader for > 12 days (Resolved)
HDFS-1310 TestFileConcurrentReader fails (Closed)

relates to

HDFS-3719 Re-enable append-related tests in TestFileConcurrentReader (Reopened)
HADOOP-7146 RPC server leaks file descriptors (Closed)

People

Assignee: sam rash
Reporter: Todd Lipcon
Votes: 1
Watchers: 12

Dates

Created: 22/Mar/10 00:11
Updated: 14/Aug/12 00:59
Resolved: 01/Jul/10 08:38