Hadoop HDFS / HDFS-6641

[HDFS File Concat] Concat will fail when target file has one block which is not full

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Not a Problem
    • Affects Version/s: 2.4.1
    • Fix Version/s: None
    • Component/s: namenode
    • Labels:
      None

      Description

      Usually we can't ensure the last block is always full. Please let me know the purpose of the following check:

      long blockSize = trgInode.getPreferredBlockSize();

      // check the end block to be full
      final BlockInfo last = trgInode.getLastBlock();
      if (blockSize != last.getNumBytes()) {
        throw new HadoopIllegalArgumentException("The last block in " + target
            + " is not full; last block size = " + last.getNumBytes()
            + " but file block size = " + blockSize);
      }
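The check can be exercised in isolation with a small standalone sketch (plain Java, no HDFS classes; `checkLastBlockFull` is a hypothetical helper, and the 14-byte last block vs. 134217728-byte block size come from the trace below):

```java
// Standalone sketch of the last-block-full check; not HDFS code, just the
// same length comparison with the values seen in the reported trace.
public class ConcatCheckSketch {
    static void checkLastBlockFull(String target, long blockSize, long lastBlockBytes) {
        if (blockSize != lastBlockBytes) {
            throw new IllegalArgumentException("The last block in " + target
                + " is not full; last block size = " + lastBlockBytes
                + " but file block size = " + blockSize);
        }
    }

    public static void main(String[] args) {
        long blockSize = 128L * 1024 * 1024;  // 134217728, the default dfs.blocksize
        try {
            checkLastBlockFull("/Test.txt", blockSize, 14L);
        } catch (IllegalArgumentException e) {
            // prints: The last block in /Test.txt is not full; last block size = 14 but file block size = 134217728
            System.out.println(e.getMessage());
        }
    }
}
```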

      If it is an issue, I'll file a JIRA.

      Following is the stack trace:

      Exception in thread "main" org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.HadoopIllegalArgumentException): The last block in /Test.txt is not full; last block size = 14 but file block size = 134217728
      at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.concatInternal(FSNamesystem.java:1887)
      at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.concatInt(FSNamesystem.java:1833)
      at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.concat(FSNamesystem.java:1795)
      at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.concat(NameNodeRpcServer.java:704)
      at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.concat(ClientNamenodeProtocolServerSideTranslatorPB.java:512)
      at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)

        Issue Links

          Activity

          Chris Nauroth added a comment -

          Hello, Brahma Reddy Battula. There is some discussion of this topic on the original concat issue: HDFS-222. The concat destination file must still maintain the invariant that all blocks have the same length, except for possibly the last block, which may be partially filled. If this invariant were not maintained, then it could cause unpredictable behavior later when a client attempts to read that file.

          I'm resolving this issue as Not a Problem, because I believe this is all working as designed.

          Brahma Reddy Battula added a comment -

          Hi Chris Nauroth

          The concat destination file must still maintain the invariant that all blocks have the same length, except for possibly the last block, which may be partially filled. If this invariant were not maintained, then it could cause unpredictable behavior later when a client attempts to read that file.
          I'm resolving this issue as Not a Problem, because I believe this is all working as designed.

          You mean the last block should be full when we go for concat (like a pre-condition)? I feel this can be addressed; otherwise, given a reason, we can close this JIRA. Please correct me if I am wrong.

          Chris Nauroth added a comment -

          Yes, the pre-conditions of HDFS concat right now are:

          1. All source files must be in the same directory.
          2. Replication and block size must be the same for all source files.
          3. All blocks must be full in all source files except the last source file.
          4. In the last source file, all blocks must be full except the last block.
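Pre-conditions 3 and 4 above can be sketched as a standalone validator over per-file block-length lists (plain Java, hypothetical helper name; the same-directory and replication/block-size checks are omitted since they are simple equality comparisons):

```java
// Hypothetical sketch of concat pre-conditions 3 and 4: every source block must
// be full except, possibly, the last block of the last source file.
// Each inner array holds the byte lengths of one source file's blocks, in order.
public class ConcatPreconditions {
    static void checkSourceBlocks(long blockSize, long[][] sources) {
        for (int f = 0; f < sources.length; f++) {
            boolean lastFile = (f == sources.length - 1);
            for (int b = 0; b < sources[f].length; b++) {
                boolean lastBlockOfLastFile = lastFile && b == sources[f].length - 1;
                if (!lastBlockOfLastFile && sources[f][b] != blockSize) {
                    throw new IllegalArgumentException(
                        "block " + b + " of source file " + f + " is not full");
                }
            }
        }
    }
}
```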
          Brahma Reddy Battula added a comment -

          So what if the source file has only one block which is not full (which is what happened in my case, hence I raised this JIRA)?

          Chris Nauroth added a comment -

          The concat method takes 2 arguments. The first is a single Path, and this is called the target. The second is an array of multiple Path instances, and these are called the sources. All of the sources will get appended to the target. The Path referred to as target must have all of its blocks full. Once again, this is intended to maintain the invariant that an HDFS file has a consistent block size, and all blocks are fully populated except the last one.
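This target/sources contract can be modeled with a toy block-length bookkeeping sketch (plain Java, not the HDFS implementation; `concat` here just validates the target and splices the sources' block lists onto it):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model of concat(target, sources): the target's blocks must all be full,
// then each source's blocks are appended in order. Block replicas never move;
// only the block-to-file mapping changes, which is what the lists stand in for.
public class ConcatModel {
    static List<Long> concat(long blockSize, List<Long> target, List<List<Long>> sources) {
        for (long len : target) {
            if (len != blockSize) {
                throw new IllegalArgumentException(
                    "target has a partial block of " + len + " bytes");
            }
        }
        List<Long> result = new ArrayList<>(target);
        for (List<Long> src : sources) {
            result.addAll(src);  // source block lists are spliced onto the target
        }
        return result;
    }
}
```

The one-block, 14-byte /Test.txt from the description fails this model in the same way as the real check: its single partial block makes it an invalid target.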


            People

             • Assignee:
               Unassigned
             • Reporter:
               Brahma Reddy Battula
             • Votes:
               0
             • Watchers:
               3

              Dates

               • Created:
               • Updated:
               • Resolved:
