Hadoop HDFS: HDFS-222

Support for concatenating of files into a single file

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.21.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      An API to concatenate files of the same size and replication factor on HDFS into a single larger file.
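
      For orientation, a minimal sketch of how the feature is invoked from a client. The paths and the class name ConcatExample are hypothetical; the concat(Path, Path[]) signature matches the API discussed in the comments below.

        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.fs.FileSystem;
        import org.apache.hadoop.fs.Path;
        import org.apache.hadoop.hdfs.DistributedFileSystem;

        public class ConcatExample {
          public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            if (!(fs instanceof DistributedFileSystem)) {
              throw new IllegalStateException("concat is HDFS-specific");
            }
            DistributedFileSystem dfs = (DistributedFileSystem) fs;
            Path target = new Path("/data/part-0");   // hypothetical paths
            Path[] srcs = { new Path("/data/part-1"), new Path("/data/part-2") };
            // Moves the sources' blocks onto the target and deletes the sources.
            dfs.concat(target, srcs);
          }
        }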

      Attachments

      1. HDFS-222-10.patch (38 kB), Boris Shkolnik
      2. HDFS-222-10.patch (38 kB), Boris Shkolnik
      3. HDFS-222-9.patch (37 kB), Boris Shkolnik
      4. HDFS-222-9.patch (46 kB), Boris Shkolnik
      5. HDFS-222-8.patch (46 kB), Boris Shkolnik
      6. HDFS-222-7.patch (27 kB), Boris Shkolnik
      7. HDFS-222-6.patch (27 kB), Boris Shkolnik
      8. HDFS-222-5.patch (27 kB), Boris Shkolnik
      9. HDFS-222-4.patch (41 kB), Boris Shkolnik
      10. HDFS-222-3.patch (40 kB), Boris Shkolnik
      11. HDFS-222-2.patch (39 kB), Boris Shkolnik
      12. HDFS-222-1.patch (38 kB), Boris Shkolnik
      13. HDFS-222.patch (43 kB), Boris Shkolnik


          Activity

          Hudson added a comment -

          Integrated in Hdfs-Patch-h2.grid.sp2.yahoo.net #72 (See http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/72/)

          Hudson added a comment -

          Integrated in Hdfs-Patch-h5.grid.sp2.yahoo.net #100 (See http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/100/)

          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk #124 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Hdfs-trunk/124/)
          HDFS-222. Support for concatenating of files into a single file without copying. Contributed by Boris Shkolnik.

          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk-Commit #86 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Hdfs-trunk-Commit/86/)
          HDFS-222. Support for concatenating of files into a single file without copying. Contributed by Boris Shkolnik.

          Hairong Kuang added a comment -

          I've just committed this. Thanks, Boris!

          Hairong Kuang added a comment -

          +1 The patch looks good.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12423380/HDFS-222-10.patch
          against trunk revision 830003.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 5 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/63/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/63/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/63/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/63/console

          This message is automatically generated.

          Boris Shkolnik added a comment -

          Optimization as mentioned in the review.

          Hairong Kuang added a comment -

          Hope that this is the last comment:
          1. FSNamesystem.java: no need for the isDirectory() check on INodeFile.
          2. FSDirectory.java: optimization: when removing the src inode, no need to traverse the src path to get all inodes on the path again, since the src inode and its ancestor inodes are already known.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12423359/HDFS-222-10.patch
          against trunk revision 830003.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 5 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/81/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/81/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/81/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/81/console

          This message is automatically generated.

          Boris Shkolnik added a comment -

          Updated OfflineImageViewer with the yet newer version (-22).

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12423268/HDFS-222-9.patch
          against trunk revision 830003.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 5 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/62/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/62/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/62/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/62/console

          This message is automatically generated.

          Boris Shkolnik added a comment -

          Review + merge with trunk.

          Hairong Kuang added a comment -

          Boris, sorry that there are more comments. Hope that this will be the last iteration:

          1. Examine all log statements at the info level to see if you can either remove them or change them to debug level.
          2. Remove the change to BlockManager.java since the code is already commented.
          3. FSNamesystem.java:
            • no need to verify quota since the target and sources are in the same directory
            • should use getINodeFile to get the inode of the target & sources
            • no need to check if srcInode is root
            • minor: concatInternal would be better as a synchronized method in FSDirectory
          4. FSDirectory.java:
            • no need to track & update disk space consumed in unprotectedConcat.
            • in deleteFileWithoutCommit, no need to update the modification time of the parent for every source & update disk space consumed.
            • you might be able to remove deleteFileWithoutCommit by taking advantage of the target & sources being under one parent.
            • minor: optimize the number of copies when concatenating blocks.
          5. FSEditLog.java: check the version # when handling OP_CONCAT_DELETE in loadEditLogs.
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12422829/HDFS-222-9.patch
          against trunk revision 828116.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 15 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/48/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/48/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/48/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/48/console

          This message is automatically generated.

          Boris Shkolnik added a comment -

          Updated OfflineImageViewer with the new LAYOUT version (may need to do it again if HDFS-654 goes in first).

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12422587/HDFS-222-8.patch
          against trunk revision 826149.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 15 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/36/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/36/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/36/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/36/console

          This message is automatically generated.

          Boris Shkolnik added a comment -

          Put in the restriction for trg and src to be in the same directory.

          Venkatesh Seetharam added a comment -

          > Perhaps we should restrict the operation to concat files in the same directory.
          Reasonable, but can it be recursive, including subdirectories as well?

          Hairong Kuang added a comment -

          > Perhaps we should restrict the operation to concat files in the same directory.
          +1. This is a reasonable restriction that makes the code much cleaner and much less error-prone.

          Sanjay Radia added a comment -

          Handling quotas correctly is going to be hard, especially given the new problems we saw in HDFS-677 with respect to renaming between directories when the quota is exhausted in the target.
          While this can be coded, I don't think the complexity is worth it.
          Perhaps we should restrict the operation to concat files in the same directory.

          Boris Shkolnik added a comment -

          Fixed some comments from the review, as follows:
          • check permissions in one place
          • sync on fsnamespace
          • update the LAYOUT version
          • check the preferred block size
          • ensure no quota updates if trg and src are in the same directory

          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12421704/HDFS-222-6.patch
          against trunk revision 822153.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/20/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/20/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/20/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/20/console

          This message is automatically generated.

          Boris Shkolnik added a comment -

          Fixed warnings.
          Changed the signature from concat(String, String...) to concat(Path, Path[]) to match other APIs.
          Added a few more tests.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12421595/HDFS-222-5.patch
          against trunk revision 822153.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          -1 javac. The patch appears to cause tar ant target to fail.

          -1 findbugs. The patch appears to cause Findbugs to fail.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          -1 contrib tests. The patch failed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/18/testReport/
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/18/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/18/console

          This message is automatically generated.

          Boris Shkolnik added a comment -

          dhruba borthakur wrote:

          Does it make it easier to use if this can be "bin/hadoop hdfs -concat fileA fileB"?

          That can be done, unless anyone objects. I will look into this.

          Boris Shkolnik added a comment -

          Implemented all the comments by Hairong.

          dhruba borthakur added a comment -

          This is a useful feature to have. I am guessing that there will be a few more HDFS-specific tools that we will develop going forward. This tool is currently invoked by "bin/hadoop jar org.apache.hdfs.tools.HDFSConcat fileA fileB". Does it make it easier to use if this can be "bin/hadoop hdfs -concat fileA fileB"?

          Hairong Kuang added a comment -

          Concat also needs to make sure that none of the files are under construction.

          Hairong Kuang added a comment -

          Some initial comments:

          • ClientProtocol.java:
            1. the protocol's version should be bumped;
            2. unnecessary changes to the "rename" signature.
          • FSNamesystem.java:
            1. I would suggest the following changes to the code organization so the method naming is consistent with existing namespace changes (a sketch follows this list):
              concat: an unsynchronized method which performs the non-inode-related checks on the input parameters, calls concatInternal, and syncs the edit log;
              concatInternal: a synchronized private method which does the real work;
              remove unprotectedConcat from FSNamesystem and add a method "concat" to FSDirectory which performs all inode-related checks and namespace changes.
            2. permission checking: I would prefer to perform permission checking on the target and srcs in one place. We need WRITE permission on the parent of the source node, not on the ancestor.
            3. block size checking could be simplified by requiring that all files have the same preferred block size and that each file's last block is full, except for the last file.
            4. INodeFile means the inode represents a file, so checking whether an inode is a directory should be performed before converting the inode to INodeFile.
          • FSEditLog.java: since the edit log has a new op, the on-disk layout version should be updated.
          • minor: all concat-related methods should have the same signatures. Some of them have "src" as the 2nd parameter. For the first parameter, I prefer "target" over "trg".
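
          A minimal sketch of the code organization suggested above, for illustration only: the method names follow the comment, but the bodies and helper names (verifyInputPaths, checkTargetAccess, dir.concat) are assumptions, not the committed patch.

            // FSNamesystem.java -- illustrative skeleton of the suggested structure
            void concat(String target, String[] srcs) throws IOException {
              // non-inode-related checks: null/empty arguments, safe mode, duplicates
              verifyInputPaths(target, srcs);   // hypothetical helper
              concatInternal(target, srcs);     // synchronized; does the real work
              getEditLog().logSync();           // sync the edit log afterwards
            }

            private synchronized void concatInternal(String target, String[] srcs)
                throws IOException {
              // check permissions on the target and all srcs in one place
              checkTargetAccess(target, srcs);  // hypothetical helper
              // inode-related checks and the namespace change live in FSDirectory
              dir.concat(target, srcs);
            }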
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12420606/HDFS-222-4.patch
          against trunk revision 818801.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 14 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/48/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/48/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/48/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/48/console

          This message is automatically generated.

          Boris Shkolnik added a comment -

          Fixed javac warnings.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12420476/HDFS-222-3.patch
          against trunk revision 818575.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 14 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          -1 javac. The applied patch generated 20 javac compiler warnings (more than the trunk's current 18 warnings).

          -1 findbugs. The patch appears to introduce 1 new Findbugs warnings.

          -1 release audit. The applied patch generated 106 release audit warnings (more than the trunk's current 105 warnings).

          -1 core tests. The patch failed core unit tests.

          -1 contrib tests. The patch failed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/47/testReport/
          Release audit warnings: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/47/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/47/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/47/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/47/console

          This message is automatically generated.

          Boris Shkolnik added a comment -

          Added a check that no two files are the same.

          Boris Shkolnik added a comment -

          Added a test for permissions.

          Boris Shkolnik added a comment -

          Reverted imports.

          Boris Shkolnik added a comment -

          At this point we require all the files to have full blocks (except the final one) and the same replication.

          I will revert the imports to make the merge easier.

          Konstantin Shvachko added a comment -

          Interesting that read just works with different lengths!
          I think you should combine the incomplete-block and full-block tests into one with somewhat randomized block lengths, so that your 10 files all have different lengths.
          It would also be interesting to test concatenation of files with different replication factors, so that the concat code takes care of removing or adding replicas of the blocks of the new file.
          Boris, could you please revert the massive refactoring of imports in FSNamesystem and submit it as a separate patch? We need to keep three branches in sync now, and this change will not belong in the other two.

          Boris Shkolnik added a comment -

          First draft.

          Doug Cutting added a comment -

          That sounds reasonable.

          Boris Shkolnik added a comment -

          To simplify (and to avoid the overwrite question) I suggest we concatenate the srcs' blocks TO the target file, i.e., if we have

          File1 {Block11, Block12}
          File2 {Block21, Block22}
          File3 {Block31, Block32}

          and we do concat(File1, File2, File3), we get

          File1 {Block11, Block12, Block21, Block22, Block31, Block32}

          and File2, File3 are deleted.

          To make things atomic we would need to introduce one new op, OP_CONCAT_DELETE, for the EditsLog, which will be recorded only when every block is moved and the source files deleted (we cannot just call FsDirectory.delete() for this reason).
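
          A sketch of the observable behavior described above. The paths and the demoConcat helper are hypothetical; dfs is assumed to be a DistributedFileSystem, and the files are assumed to meet the preconditions (full blocks, same replication):

            // Illustrative only: what a client observes after a successful concat.
            static void demoConcat(DistributedFileSystem dfs) throws IOException {
              Path f1 = new Path("/d/file1");
              Path f2 = new Path("/d/file2");
              Path f3 = new Path("/d/file3");
              long expected = dfs.getFileStatus(f1).getLen()
                            + dfs.getFileStatus(f2).getLen()
                            + dfs.getFileStatus(f3).getLen();
              dfs.concat(f1, new Path[] { f2, f3 }); // f2's and f3's blocks move onto f1
              assert dfs.getFileStatus(f1).getLen() == expected; // f1 holds all the blocks
              assert !dfs.exists(f2) && !dfs.exists(f3);         // sources deleted atomically
            }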

          Jakob Homan added a comment -

          Varargs is not appropriate here. I can't see a time when you'd know ahead of time which blocks to combine, outside of tests. An array is the better option.

          Doug Cutting added a comment -

          This sounds good to me.

          It might be simplest if the source were removed and the entire operation were atomic, e.g., if it succeeds then the source is gone, and if it fails then the destination is unchanged and the source is still present. There is no middle ground.

          Also, what happens if the target already exists? Should we add an overwrite option? If overwrite is unspecified and the target exists, then an error is thrown. If it is specified and the target is a plain file, then the old target's blocks are removed as part of the atomic operation. If the old target is a directory, an error is thrown regardless of overwrite.

          (Can you guess which issue I've been following?)

          Boris Shkolnik added a comment -

          Here is the tentative plan:

          1. An API void concat(String trg, String... srcs) will be added to DistributedFileSystem and DFSClient.
          2. The actual implementation will be in FSNamesystem.java and FSDirectory.java.
          3. The following prerequisites will be checked before the actual blocks are moved:
            • Files are not empty and not null
            • NameNode is not in safe mode
            • Permissions are valid:
              • Write permission for the target file
              • Read permission for the src files
              • Write permission in the source parent directory (for delete)
          4. All the blocks of all the files are of the same size and same replication level.

          Actions:

          1. Actual blocks moved to the target file
          2. Access/modification times updated for:
            • the target file
            • the src directory
          3. Quotas updated:
            • Target directory: NSQuota +0, DSQuota +Sum(block sizes)
            • Src directory: NSQuota -1, DSQuota -Sum(block sizes)

          Error handling uses exceptions.

          Note: should srcs be an array instead of varargs? Seems safer and easier to use. (A sketch of this proposed client API follows below.)
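
          A minimal sketch of the client-side entry point proposed in step 1, for illustration only: the argument checks and the namenode RPC stub are assumptions, and the signature was later changed to concat(Path, Path[]), per the comments above.

            // DFSClient.java -- illustrative sketch of the proposed API, not the final patch
            public void concat(String trg, String... srcs) throws IOException {
              if (trg == null || srcs == null || srcs.length == 0) {
                throw new IllegalArgumentException("target and sources must be non-empty");
              }
              // Delegate to the NameNode, which validates permissions, block sizes,
              // and replication before moving the blocks and deleting the sources.
              namenode.concat(trg, srcs);
            }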

          Doug Cutting added a comment -

          > add this to distributedFileSystem and not FileSystem and that distcp does a "class narrow" to use it if it is available

          +1 This sounds like a reasonable plan.

          Sanjay Radia added a comment -

          Clearly this is a hack to support parallel copies of large files in distcp. (It is an embarrassment that Hadoop does not support this.)
          The proper way to do this is to create a first-class abstraction for a "file as a container for blocks", but that is a long project.
          So the new concat method would be marked as limited-private.

          On the breaking-the-FileSystem-abstraction issue, I don't get it: all file system impls can support a concat of files, though most cannot do this atomically.
          Owen, are you proposing that we add this to distributedFileSystem and not FileSystem, and that distcp does a "class narrow" to use it if it is available?
          I am fine with that.

          Owen O'Malley added a comment -

          I've said it before, but just to reiterate: I think this is a bad idea.

          1. It would have to be HDFS-specific (i.e., not a change to FileSystem or the newer Files).
          2. A distcp dependence on HDFS is problematic.

          So to make use of this functionality, distcp would need to break the abstraction of FileSystem.

          On the other hand, an extension to the FileSystem interface that supports modification of HDFS files would solve the need and not cause this damage...

          Venkatesh Seetharam added a comment -

          Copying a set of files that includes a few large ones can result in long tails. This can help when processing chunks of a file, so we can get parts of a file in parallel and stitch 'em up at the destination.

          Doug Cutting added a comment -

          What's the use case? I'm guessing the end-goal is cross-version distcp again. Is that right? If so, I wonder if we should discuss that as a distinct issue and craft an end-to-end solution for it?

          Venkatesh Seetharam added a comment -

          This should work well, but 2 features would be desirable:
          1. varargs for the files to be appended: concat(File target, File... sourceFiles)
          2. deleting F2 would be awesome, since otherwise the client now needs to remove all the empty files

          Robert Chansler added a comment -

          The present preferred option is to do something like venkat(F1, F2), which moves the blocks of F2 to F1, leaving F2 empty. This will be allowed only if the block sizes of F1 and F2 are the same, and F1 is an integral number of full blocks. This would be an HDFS special, and not generally implemented for other file systems where block manipulation is ill-defined.

          Venkatesh Seetharam added a comment -

          It can always throw an exception if the replication factor or sizes do not match.

          Owen O'Malley added a comment -

          I don't think this is a good feature to expose. It has the unfortunate characteristic that it will perform very badly on most file systems and even in some non-trivial cases of HDFS. Take for instance the case where the files have different replication factors or block sizes.


            People

            • Assignee: Boris Shkolnik
            • Reporter: Venkatesh Seetharam
            • Votes: 0
            • Watchers: 18
