HADOOP-1285: ChecksumFileSystem: Can't read when io.file.buffer.size < bytesPerChecksum


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.12.3
    • Fix Version/s: 0.14.0
    • Component/s: fs
    • Labels: None

    Description

      It looks like ChecksumFileSystem fails to read a file when bytesPerChecksum is larger than io.file.buffer.size. The defaults for bytesPerChecksum and io.file.buffer.size are 512 and 4096 respectively, so the default config will not show the problem.
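
      For context, here is a minimal sketch of the two settings involved, using the Configuration API; the key io.bytes.per.checksum and the defaults 4096/512 are assumptions based on the standard Hadoop configuration:

          import org.apache.hadoop.conf.Configuration;

          // Hedged sketch: report whether a configuration is in the problematic
          // regime described above (stream buffer smaller than the checksum chunk).
          public class ChecksumConfigCheck {
            public static void main(String[] args) {
              Configuration conf = new Configuration();
              int bufferSize = conf.getInt("io.file.buffer.size", 4096);
              int bytesPerChecksum = conf.getInt("io.bytes.per.checksum", 512);
              if (bufferSize < bytesPerChecksum) {
                System.out.println("io.file.buffer.size (" + bufferSize
                    + ") is smaller than bytes per checksum (" + bytesPerChecksum
                    + "): reads through ChecksumFileSystem can come up short.");
              } else {
                System.out.println("Buffer covers a full checksum chunk; problem not triggered.");
              }
            }
          }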

      I noticed this problem when I was testing block level CRCs with different configs.

      How to reproduce with the latest trunk:
      Copy a text file larger than 512 bytes to DFS: bin/hadoop fs -copyFromLocal ~/tmp/x.txt x.txt
      Then set io.file.buffer.size to something smaller than 512 (say 53) and try to read the file:

      bin/hadoop dfs -cat x.txt

      This will print only the first 53 characters.
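
      The same reproduction in code, as a hedged sketch: it assumes a running DFS reachable through the default FileSystem, and the path x.txt and the value 53 are just the example above.

          import org.apache.hadoop.conf.Configuration;
          import org.apache.hadoop.fs.FSDataInputStream;
          import org.apache.hadoop.fs.FileSystem;
          import org.apache.hadoop.fs.Path;

          public class SmallBufferReadRepro {
            public static void main(String[] args) throws Exception {
              Configuration conf = new Configuration();
              // Force the stream buffer below bytesPerChecksum, as in the steps above.
              conf.setInt("io.file.buffer.size", 53);
              FileSystem fs = FileSystem.get(conf);

              FSDataInputStream in = fs.open(new Path("x.txt"));
              byte[] buf = new byte[4096];
              long total = 0;
              int n;
              while ((n = in.read(buf, 0, buf.length)) > 0) {
                total += n;
              }
              in.close();
              // With the bug, total stops well short of the actual file length.
              System.out.println("bytes read: " + total);
            }
          }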

      The following code (or its comment) at ChecksumFileSystem.java:163 seems suspect, but I am not sure whether more changes are required:

          public int read(byte b[], int off, int len) throws IOException {
            // make sure that it ends at a checksum boundary
            long curPos = getPos();
            long endPos = len+curPos/bytesPerSum*bytesPerSum;
            return readBuffer(b, off, (int)(endPos-curPos));
          }
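
      For illustration only (this is one reading of the comment above, not the committed fix): because of operator precedence the expression evaluates as len + (curPos / bytesPerSum) * bytesPerSum, so the requested length is trimmed by the offset into the current checksum chunk rather than ended at a checksum boundary. A version that matches the comment might look like:

          public int read(byte b[], int off, int len) throws IOException {
            // Sketch only, not the committed fix for HADOOP-1285.
            // End the read at a checksum boundary, unless the whole request
            // fits inside the current checksum chunk.
            long curPos = getPos();
            long endPos = (curPos + len) / bytesPerSum * bytesPerSum;
            if (endPos <= curPos) {
              endPos = curPos + len;
            }
            return readBuffer(b, off, (int)(endPos - curPos));
          }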
      


People

    Assignee: Unassigned
    Reporter: Raghu Angadi (rangadi)
    Votes: 0
    Watchers: 0
