Description
Looks like ChecksumFileSystem fails to read a file when bytesPerChecksum is larger than io.file.buffer.size. The defaults for bytesPerChecksum and io.file.buffer.size are 512 and 4096 respectively, so the default config might not hit the problem.
I noticed this problem when I was testing block level CRCs with different configs.
How to reproduce with the latest trunk:
Copy a text file larger than 512 bytes to DFS: bin/hadoop fs -copyFromLocal ~/tmp/x.txt x.txt
Then set io.file.buffer.size to something smaller than 512 (say 53) and try to read the file:
bin/hadoop dfs -cat x.txt
This will print only the first 53 characters.
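For reference, the buffer size for the second step can be lowered in the site config (I put it in conf/hadoop-site.xml; 53 is just an arbitrary value below 512):

<property>
  <name>io.file.buffer.size</name>
  <value>53</value>
</property>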
The following code at ChecksumFileSystem.java:163 (and the comment in it) seems suspect, though I am not sure whether more changes are required:
public int read(byte b[], int off, int len) throws IOException {
  // make sure that it ends at a checksum boundary
  long curPos = getPos();
  long endPos = len+curPos/bytesPerSum*bytesPerSum;
  return readBuffer(b, off, (int)(endPos-curPos));
}
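My reading of why this goes wrong (a sketch of the arithmetic only, assuming the cat path issues reads of io.file.buffer.size bytes): since + binds more loosely than / and *, endPos is len plus curPos rounded down to a checksum boundary. With bytesPerSum = 512 and 53-byte reads, the first read at curPos = 0 returns 53 bytes, but the second read at curPos = 53 computes endPos = 53 + (53/512)*512 = 53, so readBuffer is asked for 0 bytes and the caller sees what looks like end of stream:

// Standalone demo of the length computation in the quoted read();
// the bytesPerSum, len and curPos values are taken from the repro above.
public class ReadLenDemo {
  public static void main(String[] args) {
    long bytesPerSum = 512;  // io.bytes.per.checksum default
    int len = 53;            // io.file.buffer.size from the repro
    long curPos = 53;        // position after the first 53-byte read
    long endPos = len + curPos / bytesPerSum * bytesPerSum;
    // prints "requested length = 0", so no bytes past the first 53 are read
    System.out.println("requested length = " + (endPos - curPos));
  }
}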
Issue Links
- is part of HADOOP-1470: Rework FSInputChecker and FSOutputSummer to support checksum code sharing between ChecksumFileSystem and block level crc dfs (Closed)