Hadoop Common
HADOOP-7623

S3FileSystem reports block-size as length of File if file length is less than a block

    Details

    • Type: Bug
    • Status: Patch Available
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.20.1
    • Fix Version/s: None
    • Component/s: fs/s3
    • Labels: None
    • Tags: s3

      Description

      In S3FileSystem, create a file with a block size of 67108864.
      Write some data of size 2048 to the file (less than 67108864).
      Assert the block size of the file: the block size reported will be 2048 rather than 67108864.

      This behavior is not in line with HDFS.
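
      A minimal repro sketch, assuming Hadoop's generic FileSystem API against an
      s3:// URI with credentials already configured; the bucket name and path below
      are hypothetical.

          import java.net.URI;

          import org.apache.hadoop.conf.Configuration;
          import org.apache.hadoop.fs.FSDataOutputStream;
          import org.apache.hadoop.fs.FileStatus;
          import org.apache.hadoop.fs.FileSystem;
          import org.apache.hadoop.fs.Path;

          public class BlockSizeRepro {
            public static void main(String[] args) throws Exception {
              Configuration conf = new Configuration();
              FileSystem fs = FileSystem.get(URI.create("s3://example-bucket/"), conf);

              Path file = new Path("/tmp/blocksize-repro");
              long requestedBlockSize = 67108864L; // 64 MB

              // create(path, overwrite, bufferSize, replication, blockSize)
              FSDataOutputStream out =
                  fs.create(file, true, 4096, (short) 1, requestedBlockSize);
              out.write(new byte[2048]); // write 2048 bytes, far less than one block
              out.close();

              FileStatus status = fs.getFileStatus(file);
              // On S3FileSystem this prints 2048 instead of the requested 67108864.
              System.out.println("block size = " + status.getBlockSize());
            }
          }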

        Attachments

        1. HADOOP-7623.patch
           9 kB
           Subroto Sanyal

        Activity

        Subroto Sanyal added a comment -

        The proposed solution would be to serialize the blockSize with the INode in S3.
        Along the same lines, while reading we can de-serialize it accordingly.

        But this solution leads to a backward-compatibility problem.

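        A rough sketch of what serializing the block size alongside the INode could
        look like; the class shape and stream layout here are illustrative, not the
        actual org.apache.hadoop.fs.s3.INode format.

            import java.io.ByteArrayOutputStream;
            import java.io.DataOutputStream;
            import java.io.IOException;

            // Illustrative only: a simplified inode that writes its blockSize into
            // the serialized form so it can be recovered on read.
            class SketchINode {
              private final long blockSize;
              private final long[] blockLengths;

              SketchINode(long blockSize, long[] blockLengths) {
                this.blockSize = blockSize;
                this.blockLengths = blockLengths;
              }

              byte[] serialize() throws IOException {
                ByteArrayOutputStream bytes = new ByteArrayOutputStream();
                DataOutputStream out = new DataOutputStream(bytes);
                out.writeLong(blockSize);          // new field: configured block size
                out.writeInt(blockLengths.length); // existing data: the block list
                for (long len : blockLengths) {
                  out.writeLong(len);
                }
                out.close();
                return bytes.toByteArray();
              }
            }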
        Uma Maheswara Rao G added a comment -

        It looks like S3 is getting the block size directly from the actual size of the first block:

            private static long findBlocksize(INode inode) {
              final Block[] ret = inode.getBlocks();
              return ret == null ? 0L : ret[0].getLength();
            }


        I also think we should serialize the block size into the INode, as DFS does. Let's
        see whether there is any specific reason to do it this way for S3 alone. Can someone
        who was involved in the initial S3FileSystem implementation clarify?

        The problem here is that until the first block completes, we will not know the exact
        block size, if I am not wrong. Is this your issue?

        Thanks
        Uma

        Subroto Sanyal added a comment -

        Attaching the patch for the issue, with the corresponding source code and test
        code modifications.

        The change doesn't break compatibility; the existing files can be read in the
        same manner.
        The patch adds one more field to the INode - blockSize.
        It is serialized and de-serialized as required.

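        As a hedged illustration of how such a change can stay backward compatible, the
        sketch below falls back to the first block's length when the serialized INode
        predates the new field. The stream layout is assumed, not taken from the patch.

            import java.io.DataInputStream;
            import java.io.IOException;

            // Hedged sketch: recover blockSize from a serialized inode when present,
            // otherwise fall back to the pre-patch behaviour for older files. Assumes
            // the new field, if any, trails the old fields and that this method is
            // called after the block list has been read.
            final class INodeBlockSize {
              static long readBlockSize(DataInputStream in, long[] blockLengths)
                  throws IOException {
                if (in.available() >= 8) {
                  // Newer format: the configured block size was serialized explicitly.
                  return in.readLong();
                }
                // Older format: best effort, as the pre-patch code did.
                return blockLengths.length > 0 ? blockLengths[0] : 0L;
              }
            }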
        Subroto Sanyal added a comment -

        The fix is provided on trunk.

        Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12495024/HADOOP-7623.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 8 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in .

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/200//testReport/
        Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/200//console

        This message is automatically generated.


          People

          • Assignee: Unassigned
          • Reporter: Subroto Sanyal
          • Votes: 0
          • Watchers: 4
