Hadoop HDFS
  1. Hadoop HDFS
  2. HDFS-2834

ByteBuffer-based read API for DFSInputStream

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.0.2-alpha
    • Component/s: hdfs-client, performance
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      The DFSInputStream read-path always copies bytes into a JVM-allocated byte[]. Although for many clients this is desired behaviour, in certain situations, such as native-reads through libhdfs, this imposes an extra copy penalty since the byte[] needs to be copied out again into a natively readable memory area.

      For these cases, it would be preferable to allow the client to supply its own buffer, wrapped in a ByteBuffer, to avoid that final copy overhead.

      1. HDFS-2834.10.patch
        48 kB
        Henry Robinson
      2. HDFS-2834.11.patch
        60 kB
        Henry Robinson
      3. HDFS-2834.3.patch
        63 kB
        Henry Robinson
      4. HDFS-2834.4.patch
        58 kB
        Henry Robinson
      5. HDFS-2834.5.patch
        59 kB
        Henry Robinson
      6. HDFS-2834.6.patch
        59 kB
        Henry Robinson
      7. HDFS-2834.7.patch
        61 kB
        Henry Robinson
      8. HDFS-2834.8.patch
        61 kB
        Henry Robinson
      9. HDFS-2834.9.patch
        49 kB
        Henry Robinson
      10. HDFS-2834.patch
        64 kB
        Henry Robinson
      11. HDFS-2834.patch
        66 kB
        Henry Robinson
      12. hdfs-2834-libhdfs-benchmark.png
        43 kB
        Henry Robinson
      13. HDFS-2834-no-common.patch
        60 kB
        Henry Robinson

        Issue Links

          Activity

          Hide
          Lars Hofhansl added a comment -

          Just for reference with many open files one can easily OOM on direct buffer memory. See: HBASE-8143.
          1MB seems to be a rather large default.

          Show
          Lars Hofhansl added a comment - Just for reference with many open files one can easily OOM on direct buffer memory. See: HBASE-8143 . 1MB seems to be a rather large default.
          Hide
          Todd Lipcon added a comment -

          Committed to branch-2 as well.

          Show
          Todd Lipcon added a comment - Committed to branch-2 as well.
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk #1027 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1027/)
          HDFS-2834. Add a ByteBuffer-based read API to DFSInputStream. Contributed by Henry Robinson. (Revision 1303474)

          Result = SUCCESS
          todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1303474
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReader.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/RemoteBlockReader.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/RemoteBlockReader2.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/BlockReaderTestUtil.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestBlockReaderLocal.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestParallelRead.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestParallelReadUtil.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestShortCircuitLocalRead.java
          Show
          Hudson added a comment - Integrated in Hadoop-Mapreduce-trunk #1027 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1027/ ) HDFS-2834 . Add a ByteBuffer-based read API to DFSInputStream. Contributed by Henry Robinson. (Revision 1303474) Result = SUCCESS todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1303474 Files : /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReader.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/RemoteBlockReader.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/RemoteBlockReader2.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/BlockReaderTestUtil.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestBlockReaderLocal.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestParallelRead.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestParallelReadUtil.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestShortCircuitLocalRead.java
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk #992 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/992/)
          HDFS-2834. Add a ByteBuffer-based read API to DFSInputStream. Contributed by Henry Robinson. (Revision 1303474)

          Result = FAILURE
          todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1303474
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReader.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/RemoteBlockReader.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/RemoteBlockReader2.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/BlockReaderTestUtil.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestBlockReaderLocal.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestParallelRead.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestParallelReadUtil.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestShortCircuitLocalRead.java
          Show
          Hudson added a comment - Integrated in Hadoop-Hdfs-trunk #992 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/992/ ) HDFS-2834 . Add a ByteBuffer-based read API to DFSInputStream. Contributed by Henry Robinson. (Revision 1303474) Result = FAILURE todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1303474 Files : /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReader.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/RemoteBlockReader.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/RemoteBlockReader2.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/BlockReaderTestUtil.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestBlockReaderLocal.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestParallelRead.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestParallelReadUtil.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestShortCircuitLocalRead.java
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk-Commit #1920 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/1920/)
          HDFS-2834. Add a ByteBuffer-based read API to DFSInputStream. Contributed by Henry Robinson. (Revision 1303474)

          Result = ABORTED
          todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1303474
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReader.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/RemoteBlockReader.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/RemoteBlockReader2.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/BlockReaderTestUtil.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestBlockReaderLocal.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestParallelRead.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestParallelReadUtil.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestShortCircuitLocalRead.java
          Show
          Hudson added a comment - Integrated in Hadoop-Mapreduce-trunk-Commit #1920 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/1920/ ) HDFS-2834 . Add a ByteBuffer-based read API to DFSInputStream. Contributed by Henry Robinson. (Revision 1303474) Result = ABORTED todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1303474 Files : /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReader.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/RemoteBlockReader.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/RemoteBlockReader2.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/BlockReaderTestUtil.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestBlockReaderLocal.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestParallelRead.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestParallelReadUtil.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestShortCircuitLocalRead.java
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Common-trunk-Commit #1911 (See https://builds.apache.org/job/Hadoop-Common-trunk-Commit/1911/)
          HDFS-2834. Add a ByteBuffer-based read API to DFSInputStream. Contributed by Henry Robinson. (Revision 1303474)

          Result = SUCCESS
          todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1303474
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReader.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/RemoteBlockReader.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/RemoteBlockReader2.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/BlockReaderTestUtil.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestBlockReaderLocal.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestParallelRead.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestParallelReadUtil.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestShortCircuitLocalRead.java
          Show
          Hudson added a comment - Integrated in Hadoop-Common-trunk-Commit #1911 (See https://builds.apache.org/job/Hadoop-Common-trunk-Commit/1911/ ) HDFS-2834 . Add a ByteBuffer-based read API to DFSInputStream. Contributed by Henry Robinson. (Revision 1303474) Result = SUCCESS todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1303474 Files : /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReader.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/RemoteBlockReader.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/RemoteBlockReader2.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/BlockReaderTestUtil.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestBlockReaderLocal.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestParallelRead.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestParallelReadUtil.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestShortCircuitLocalRead.java
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk-Commit #1985 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/1985/)
          HDFS-2834. Add a ByteBuffer-based read API to DFSInputStream. Contributed by Henry Robinson. (Revision 1303474)

          Result = SUCCESS
          todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1303474
          Files :

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReader.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/RemoteBlockReader.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/RemoteBlockReader2.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/BlockReaderTestUtil.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestBlockReaderLocal.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestParallelRead.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestParallelReadUtil.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestShortCircuitLocalRead.java
          Show
          Hudson added a comment - Integrated in Hadoop-Hdfs-trunk-Commit #1985 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/1985/ ) HDFS-2834 . Add a ByteBuffer-based read API to DFSInputStream. Contributed by Henry Robinson. (Revision 1303474) Result = SUCCESS todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1303474 Files : /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReader.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/RemoteBlockReader.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/RemoteBlockReader2.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/BlockReaderTestUtil.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestBlockReaderLocal.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestParallelRead.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestParallelReadUtil.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestShortCircuitLocalRead.java
          Hide
          Todd Lipcon added a comment -

          +1. Committed to trunk. Thanks for the nice contribution, Henry! Let me know when you have the libhdfs portion ready and I will take a look.

          Show
          Todd Lipcon added a comment - +1. Committed to trunk. Thanks for the nice contribution, Henry! Let me know when you have the libhdfs portion ready and I will take a look.
          Hide
          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12519181/HDFS-2834.11.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 15 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in .

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/2059//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2059//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12519181/HDFS-2834.11.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 15 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 eclipse:eclipse. The patch built with eclipse:eclipse. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed unit tests in . +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/2059//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2059//console This message is automatically generated.
          Hide
          Henry Robinson added a comment -

          The curious case of the magic disappearing TestParallelReadUtil.java

          Show
          Henry Robinson added a comment - The curious case of the magic disappearing TestParallelReadUtil.java
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12519092/HDFS-2834.10.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 12 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          -1 javac. The patch appears to cause tar ant target to fail.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          -1 findbugs. The patch appears to cause Findbugs (version 1.3.9) to fail.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed the unit tests build

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/2058//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2058//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12519092/HDFS-2834.10.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 12 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. -1 javac. The patch appears to cause tar ant target to fail. +1 eclipse:eclipse. The patch built with eclipse:eclipse. -1 findbugs. The patch appears to cause Findbugs (version 1.3.9) to fail. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed the unit tests build +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/2058//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2058//console This message is automatically generated.
          Hide
          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/4212/
          -----------------------------------------------------------

          (Updated 2012-03-20 16:29:56.616292)

          Review request for hadoop-hdfs and Todd Lipcon.

          Changes
          -------

          Review responses

          Summary
          -------

          New patch for HDFS-2834 (I can't update the old review request).

          This addresses bug HDFS-2834.
          http://issues.apache.org/jira/browse/HDFS-2834

          Diffs (updated)


          hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReader.java dfab730
          hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java cc61697
          hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java 4187f1c
          hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java 71c8a50
          hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/RemoteBlockReader.java b7da8d4
          hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/RemoteBlockReader2.java ea24777
          hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/BlockReaderTestUtil.java 9d4f4a2
          hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestBlockReaderLocal.java PRE-CREATION
          hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestParallelRead.java bbd0012
          hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestShortCircuitLocalRead.java f4052bb

          Diff: https://reviews.apache.org/r/4212/diff

          Testing
          -------

          Thanks,

          Henry

          Show
          jiraposter@reviews.apache.org added a comment - ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/4212/ ----------------------------------------------------------- (Updated 2012-03-20 16:29:56.616292) Review request for hadoop-hdfs and Todd Lipcon. Changes ------- Review responses Summary ------- New patch for HDFS-2834 (I can't update the old review request). This addresses bug HDFS-2834 . http://issues.apache.org/jira/browse/HDFS-2834 Diffs (updated) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReader.java dfab730 hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java cc61697 hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java 4187f1c hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java 71c8a50 hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/RemoteBlockReader.java b7da8d4 hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/RemoteBlockReader2.java ea24777 hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/BlockReaderTestUtil.java 9d4f4a2 hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestBlockReaderLocal.java PRE-CREATION hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestParallelRead.java bbd0012 hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestShortCircuitLocalRead.java f4052bb Diff: https://reviews.apache.org/r/4212/diff Testing ------- Thanks, Henry
          Hide
          Henry Robinson added a comment -

          Review comments.

          Show
          Henry Robinson added a comment - Review comments.
          Hide
          jiraposter@reviews.apache.org added a comment -

          On 2012-03-20 01:27:50, Todd Lipcon wrote:

          > hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestBlockReaderLocal.java, line 44

          > <https://reviews.apache.org/r/4212/diff/2/?file=90213#file90213line44>

          >

          > shouldn't this be true?

          Oops, yes. Thankfully the test still passes when it's testing the right path...

          On 2012-03-20 01:27:50, Todd Lipcon wrote:

          > hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestBlockReaderLocal.java, lines 81-82

          > <https://reviews.apache.org/r/4212/diff/2/?file=90213#file90213line81>

          >

          > no reason to use DFSClient here. Instead you can just use the filesystem, right? Then downcast the stream you get back?

          Good point - no need even to downcast since FSDataInputStream has the API.

          On 2012-03-20 01:27:50, Todd Lipcon wrote:

          > hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestBlockReaderLocal.java, line 104

          > <https://reviews.apache.org/r/4212/diff/2/?file=90213#file90213line104>

          >

          > don't you want an assert on sawException here? You can also use GenericTestUtils.assertExceptionContains() if you want to check the text of it

          Good catch. No particular need to assert the content of the exception - any checksum error is good enough here.

          On 2012-03-20 01:27:50, Todd Lipcon wrote:

          > hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java, lines 562-564

          > <https://reviews.apache.org/r/4212/diff/2/?file=90207#file90207line562>

          >

          > this comment seems like it's in the wrong spot, since the code that comes after it doesn't reference offsetFromChunkBoundary.

          I removed the comment, it's covered by the comment at line 549.

          • Henry

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/4212/#review6103
          -----------------------------------------------------------

          On 2012-03-09 00:47:24, Henry Robinson wrote:

          -----------------------------------------------------------

          This is an automatically generated e-mail. To reply, visit:

          https://reviews.apache.org/r/4212/

          -----------------------------------------------------------

          (Updated 2012-03-09 00:47:24)

          Review request for hadoop-hdfs and Todd Lipcon.

          Summary

          -------

          New patch for HDFS-2834 (I can't update the old review request).

          This addresses bug HDFS-2834.

          http://issues.apache.org/jira/browse/HDFS-2834

          Diffs

          -----

          hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReader.java dfab730

          hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java cc61697

          hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java 4187f1c

          hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java 2b817ff

          hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/RemoteBlockReader.java b7da8d4

          hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/RemoteBlockReader2.java ea24777

          hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/BlockReaderTestUtil.java 9d4f4a2

          hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestBlockReaderLocal.java PRE-CREATION

          hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestParallelRead.java bbd0012

          hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestShortCircuitLocalRead.java eb2a1d8

          Diff: https://reviews.apache.org/r/4212/diff

          Testing

          -------

          Thanks,

          Henry

          Show
          jiraposter@reviews.apache.org added a comment - On 2012-03-20 01:27:50, Todd Lipcon wrote: > hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestBlockReaderLocal.java, line 44 > < https://reviews.apache.org/r/4212/diff/2/?file=90213#file90213line44 > > > shouldn't this be true? Oops, yes. Thankfully the test still passes when it's testing the right path... On 2012-03-20 01:27:50, Todd Lipcon wrote: > hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestBlockReaderLocal.java, lines 81-82 > < https://reviews.apache.org/r/4212/diff/2/?file=90213#file90213line81 > > > no reason to use DFSClient here. Instead you can just use the filesystem, right? Then downcast the stream you get back? Good point - no need even to downcast since FSDataInputStream has the API. On 2012-03-20 01:27:50, Todd Lipcon wrote: > hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestBlockReaderLocal.java, line 104 > < https://reviews.apache.org/r/4212/diff/2/?file=90213#file90213line104 > > > don't you want an assert on sawException here? You can also use GenericTestUtils.assertExceptionContains() if you want to check the text of it Good catch. No particular need to assert the content of the exception - any checksum error is good enough here. On 2012-03-20 01:27:50, Todd Lipcon wrote: > hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java, lines 562-564 > < https://reviews.apache.org/r/4212/diff/2/?file=90207#file90207line562 > > > this comment seems like it's in the wrong spot, since the code that comes after it doesn't reference offsetFromChunkBoundary. I removed the comment, it's covered by the comment at line 549. Henry ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/4212/#review6103 ----------------------------------------------------------- On 2012-03-09 00:47:24, Henry Robinson wrote: ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/4212/ ----------------------------------------------------------- (Updated 2012-03-09 00:47:24) Review request for hadoop-hdfs and Todd Lipcon. Summary ------- New patch for HDFS-2834 (I can't update the old review request). This addresses bug HDFS-2834 . http://issues.apache.org/jira/browse/HDFS-2834 Diffs ----- hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReader.java dfab730 hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java cc61697 hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java 4187f1c hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java 2b817ff hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/RemoteBlockReader.java b7da8d4 hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/RemoteBlockReader2.java ea24777 hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/BlockReaderTestUtil.java 9d4f4a2 hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestBlockReaderLocal.java PRE-CREATION hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestParallelRead.java bbd0012 hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestShortCircuitLocalRead.java eb2a1d8 Diff: https://reviews.apache.org/r/4212/diff Testing ------- Thanks, Henry
          Hide
          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/4212/#review6103
          -----------------------------------------------------------

          Real close now!

          hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java
          <https://reviews.apache.org/r/4212/#comment13128>

          this comment seems like it's in the wrong spot, since the code that comes after it doesn't reference offsetFromChunkBoundary.

          hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestBlockReaderLocal.java
          <https://reviews.apache.org/r/4212/#comment13130>

          shouldn't this be true?

          hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestBlockReaderLocal.java
          <https://reviews.apache.org/r/4212/#comment13132>

          no reason to use DFSClient here. Instead you can just use the filesystem, right? Then downcast the stream you get back?

          hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestBlockReaderLocal.java
          <https://reviews.apache.org/r/4212/#comment13131>

          don't you want an assert on sawException here? You can also use GenericTestUtils.assertExceptionContains() if you want to check the text of it

          • Todd

          On 2012-03-09 00:47:24, Henry Robinson wrote:

          -----------------------------------------------------------

          This is an automatically generated e-mail. To reply, visit:

          https://reviews.apache.org/r/4212/

          -----------------------------------------------------------

          (Updated 2012-03-09 00:47:24)

          Review request for hadoop-hdfs and Todd Lipcon.

          Summary

          -------

          New patch for HDFS-2834 (I can't update the old review request).

          This addresses bug HDFS-2834.

          http://issues.apache.org/jira/browse/HDFS-2834

          Diffs

          -----

          hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReader.java dfab730

          hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java cc61697

          hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java 4187f1c

          hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java 2b817ff

          hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/RemoteBlockReader.java b7da8d4

          hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/RemoteBlockReader2.java ea24777

          hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/BlockReaderTestUtil.java 9d4f4a2

          hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestBlockReaderLocal.java PRE-CREATION

          hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestParallelRead.java bbd0012

          hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestShortCircuitLocalRead.java eb2a1d8

          Diff: https://reviews.apache.org/r/4212/diff

          Testing

          -------

          Thanks,

          Henry

          Show
          jiraposter@reviews.apache.org added a comment - ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/4212/#review6103 ----------------------------------------------------------- Real close now! hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java < https://reviews.apache.org/r/4212/#comment13128 > this comment seems like it's in the wrong spot, since the code that comes after it doesn't reference offsetFromChunkBoundary. hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestBlockReaderLocal.java < https://reviews.apache.org/r/4212/#comment13130 > shouldn't this be true? hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestBlockReaderLocal.java < https://reviews.apache.org/r/4212/#comment13132 > no reason to use DFSClient here. Instead you can just use the filesystem, right? Then downcast the stream you get back? hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestBlockReaderLocal.java < https://reviews.apache.org/r/4212/#comment13131 > don't you want an assert on sawException here? You can also use GenericTestUtils.assertExceptionContains() if you want to check the text of it Todd On 2012-03-09 00:47:24, Henry Robinson wrote: ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/4212/ ----------------------------------------------------------- (Updated 2012-03-09 00:47:24) Review request for hadoop-hdfs and Todd Lipcon. Summary ------- New patch for HDFS-2834 (I can't update the old review request). This addresses bug HDFS-2834 . http://issues.apache.org/jira/browse/HDFS-2834 Diffs ----- hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReader.java dfab730 hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java cc61697 hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java 4187f1c hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java 2b817ff hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/RemoteBlockReader.java b7da8d4 hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/RemoteBlockReader2.java ea24777 hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/BlockReaderTestUtil.java 9d4f4a2 hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestBlockReaderLocal.java PRE-CREATION hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestParallelRead.java bbd0012 hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestShortCircuitLocalRead.java eb2a1d8 Diff: https://reviews.apache.org/r/4212/diff Testing ------- Thanks, Henry
          Hide
          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/4212/
          -----------------------------------------------------------

          (Updated 2012-03-09 00:47:24.765130)

          Review request for hadoop-hdfs and Todd Lipcon.

          Summary
          -------

          New patch for HDFS-2834 (I can't update the old review request).

          This addresses bug HDFS-2834.
          http://issues.apache.org/jira/browse/HDFS-2834

          Diffs


          hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReader.java dfab730
          hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java cc61697
          hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java 4187f1c
          hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java 2b817ff
          hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/RemoteBlockReader.java b7da8d4
          hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/RemoteBlockReader2.java ea24777
          hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/BlockReaderTestUtil.java 9d4f4a2
          hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestBlockReaderLocal.java PRE-CREATION
          hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestParallelRead.java bbd0012
          hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestShortCircuitLocalRead.java eb2a1d8

          Diff: https://reviews.apache.org/r/4212/diff

          Testing
          -------

          Thanks,

          Henry

          Show
          jiraposter@reviews.apache.org added a comment - ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/4212/ ----------------------------------------------------------- (Updated 2012-03-09 00:47:24.765130) Review request for hadoop-hdfs and Todd Lipcon. Summary ------- New patch for HDFS-2834 (I can't update the old review request). This addresses bug HDFS-2834 . http://issues.apache.org/jira/browse/HDFS-2834 Diffs hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReader.java dfab730 hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java cc61697 hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java 4187f1c hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java 2b817ff hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/RemoteBlockReader.java b7da8d4 hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/RemoteBlockReader2.java ea24777 hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/BlockReaderTestUtil.java 9d4f4a2 hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestBlockReaderLocal.java PRE-CREATION hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestParallelRead.java bbd0012 hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestShortCircuitLocalRead.java eb2a1d8 Diff: https://reviews.apache.org/r/4212/diff Testing ------- Thanks, Henry
          Hide
          Henry Robinson added a comment -

          New patch addressing review comments. Note - I've left out the libhdfs work and plan to address that in a separate ticket since I think the changes we talked about might need to be reflected in the existing read path, making this patch more cumbersome.

          Show
          Henry Robinson added a comment - New patch addressing review comments. Note - I've left out the libhdfs work and plan to address that in a separate ticket since I think the changes we talked about might need to be reflected in the existing read path, making this patch more cumbersome.
          Hide
          jiraposter@reviews.apache.org added a comment -

          On 2012-03-07 00:11:46, Todd Lipcon wrote:

          > some stuff around error cases here – I think you'd run into these bugs if you hit a checksum error, since the "retry on different node" path would trigger in DFSInputStream, but the target buffer's position would be prematurely updated from prior attempts.

          >

          > But getting close!

          >

          > Is it possible to split out the test refactor change from this one? Hard to tell what's changed in the tests vs just refactored. If it's a pain in the butt I'll look more closely.

          It's not super easy, unfortunately - let me know if it's a big problem and I'll perform the git surgery required (or provide annotations on the review, if that would help).

          Sounds like I need to write tests against a file with a malformed checksum; I'll post another patch when I'm done.

          I'll respond to the libhdfs changes in a separate update.

          On 2012-03-07 00:11:46, Todd Lipcon wrote:

          > hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java, line 491

          > <https://reviews.apache.org/r/4212/diff/1/?file=88647#file88647line491>

          >

          > do you really need synchronization here and below? it seems like the strategy is being called from a synchronized context, so this is redundant. Removing this also allows the inner classes to be static which I think is preferable

          The inner classes depend on blockReader, which is why they're non-static. The synchronisation is (hopefully) to calm Findbugs since it gives a false positive when blockReader is perhaps accessed from a non-synchronized context (although the strategy classes are always called from a method which already has the lock).

          We can kill two birds with one stone by making the classes static and passing blockReader as a parameter either at construction or read-time; findBugs probably won't follow the aliasing and so won't try and understand the locks.

          On 2012-03-07 00:11:46, Todd Lipcon wrote:

          > hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java, lines 268-270

          > <https://reviews.apache.org/r/4212/diff/1/?file=88645#file88645line268>

          >

          > I envision some silly user setting the buffer size < bytesPerChecksum and getting screwed here. Worth a check for valid range here (throwing IOE or IllegalArgumentException or something)

          Hah, I was just going to let them have 0-byte reads Yes, throwing is a better idea.

          On 2012-03-07 00:11:46, Todd Lipcon wrote:

          > hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java, lines 123-133

          > <https://reviews.apache.org/r/4212/diff/1/?file=88645#file88645line123>

          >

          > please use javadoc style comments for these, rather than //s

          Done.

          On 2012-03-07 00:11:46, Todd Lipcon wrote:

          > hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReader.java, line 22

          > <https://reviews.apache.org/r/4212/diff/1/?file=88644#file88644line22>

          >

          > unused import?

          Good catch.

          On 2012-03-07 00:11:46, Todd Lipcon wrote:

          > hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java, line 357

          > <https://reviews.apache.org/r/4212/diff/1/?file=88645#file88645line357>

          >

          > should this line go in a finally{} clause? Otherwise if 'to' doesn't have enough space for the copy, we'd end up leaving 'from' with the modified limit

          The callers are supposed to guarantee that 'to' has enough space, but it does no harm to be defensive.

          On 2012-03-07 00:11:46, Todd Lipcon wrote:

          > hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java, line 397

          > <https://reviews.apache.org/r/4212/diff/1/?file=88645#file88645line397>

          >

          > in finally{} clause perhaps, so the limit isn't messed up by a failed read

          Done.

          On 2012-03-07 00:11:46, Todd Lipcon wrote:

          > hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java, line 459

          > <https://reviews.apache.org/r/4212/diff/1/?file=88645#file88645line459>

          >

          > the error conditions here don't quite work right.

          > For example, if a checksum error occurs, the buffer's position will still be updated.

          >

          > Above, there's a similar problem if "phase 1" succeeds but "phase 2" encounters an error – the position will change but the method will throw.

          I think it might make sense for the client of BlockReaderLocal to do the cleanup, since they might have to do some reset logic of their own anyhow. In particular, DFSInputStream seeks back to the beginning of the read. We might as well keep all the compensation logic in the same place, and DFSInputStream is best placed to know what it wants to do. Plus it's a bit easier to wrap a single read call in a try / catch block than to try and reason about individual failure modes inside the read method. Does that sound sensible?

          On 2012-03-07 00:11:46, Todd Lipcon wrote:

          > hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java, lines 300-301

          > <https://reviews.apache.org/r/4212/diff/1/?file=88645#file88645line300>

          >

          > This isn't your bug, but I just realized there's a potential leak here if the skip() calls below failed. I think we need to add a try..finally{} which returns these buffers to the pool if construction fails

          Done.

          On 2012-03-07 00:11:46, Todd Lipcon wrote:

          > hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java, lines 517-521

          > <https://reviews.apache.org/r/4212/diff/1/?file=88645#file88645line517>

          >

          > no '-'s needed in javadoc

          Done.

          • Henry

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/4212/#review5665
          -----------------------------------------------------------

          On 2012-03-06 23:14:07, Henry Robinson wrote:

          -----------------------------------------------------------

          This is an automatically generated e-mail. To reply, visit:

          https://reviews.apache.org/r/4212/

          -----------------------------------------------------------

          (Updated 2012-03-06 23:14:07)

          Review request for hadoop-hdfs and Todd Lipcon.

          Summary

          -------

          New patch for HDFS-2834 (I can't update the old review request).

          This addresses bug HDFS-2834.

          http://issues.apache.org/jira/browse/HDFS-2834

          Diffs

          -----

          hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReader.java dfab730

          hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java cc61697

          hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java 4187f1c

          hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java 2b817ff

          hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/RemoteBlockReader.java b7da8d4

          hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/RemoteBlockReader2.java ea24777

          hadoop-hdfs-project/hadoop-hdfs/src/main/native/hdfs.h 0ee29d5

          hadoop-hdfs-project/hadoop-hdfs/src/main/native/hdfs.c 7371caf

          hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/BlockReaderTestUtil.java 9d4f4a2

          hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestParallelLocalRead.java PRE-CREATION

          hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestParallelRead.java bbd0012

          hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestParallelReadUtil.java PRE-CREATION

          hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestShortCircuitLocalRead.java eb2a1d8

          Diff: https://reviews.apache.org/r/4212/diff

          Testing

          -------

          Thanks,

          Henry

          Show
          jiraposter@reviews.apache.org added a comment - On 2012-03-07 00:11:46, Todd Lipcon wrote: > some stuff around error cases here – I think you'd run into these bugs if you hit a checksum error, since the "retry on different node" path would trigger in DFSInputStream, but the target buffer's position would be prematurely updated from prior attempts. > > But getting close! > > Is it possible to split out the test refactor change from this one? Hard to tell what's changed in the tests vs just refactored. If it's a pain in the butt I'll look more closely. It's not super easy, unfortunately - let me know if it's a big problem and I'll perform the git surgery required (or provide annotations on the review, if that would help). Sounds like I need to write tests against a file with a malformed checksum; I'll post another patch when I'm done. I'll respond to the libhdfs changes in a separate update. On 2012-03-07 00:11:46, Todd Lipcon wrote: > hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java, line 491 > < https://reviews.apache.org/r/4212/diff/1/?file=88647#file88647line491 > > > do you really need synchronization here and below? it seems like the strategy is being called from a synchronized context, so this is redundant. Removing this also allows the inner classes to be static which I think is preferable The inner classes depend on blockReader, which is why they're non-static. The synchronisation is (hopefully) to calm Findbugs since it gives a false positive when blockReader is perhaps accessed from a non-synchronized context (although the strategy classes are always called from a method which already has the lock). We can kill two birds with one stone by making the classes static and passing blockReader as a parameter either at construction or read-time; findBugs probably won't follow the aliasing and so won't try and understand the locks. On 2012-03-07 00:11:46, Todd Lipcon wrote: > hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java, lines 268-270 > < https://reviews.apache.org/r/4212/diff/1/?file=88645#file88645line268 > > > I envision some silly user setting the buffer size < bytesPerChecksum and getting screwed here. Worth a check for valid range here (throwing IOE or IllegalArgumentException or something) Hah, I was just going to let them have 0-byte reads Yes, throwing is a better idea. On 2012-03-07 00:11:46, Todd Lipcon wrote: > hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java, lines 123-133 > < https://reviews.apache.org/r/4212/diff/1/?file=88645#file88645line123 > > > please use javadoc style comments for these, rather than //s Done. On 2012-03-07 00:11:46, Todd Lipcon wrote: > hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReader.java, line 22 > < https://reviews.apache.org/r/4212/diff/1/?file=88644#file88644line22 > > > unused import? Good catch. On 2012-03-07 00:11:46, Todd Lipcon wrote: > hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java, line 357 > < https://reviews.apache.org/r/4212/diff/1/?file=88645#file88645line357 > > > should this line go in a finally{} clause? Otherwise if 'to' doesn't have enough space for the copy, we'd end up leaving 'from' with the modified limit The callers are supposed to guarantee that 'to' has enough space, but it does no harm to be defensive. On 2012-03-07 00:11:46, Todd Lipcon wrote: > hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java, line 397 > < https://reviews.apache.org/r/4212/diff/1/?file=88645#file88645line397 > > > in finally{} clause perhaps, so the limit isn't messed up by a failed read Done. On 2012-03-07 00:11:46, Todd Lipcon wrote: > hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java, line 459 > < https://reviews.apache.org/r/4212/diff/1/?file=88645#file88645line459 > > > the error conditions here don't quite work right. > For example, if a checksum error occurs, the buffer's position will still be updated. > > Above, there's a similar problem if "phase 1" succeeds but "phase 2" encounters an error – the position will change but the method will throw. I think it might make sense for the client of BlockReaderLocal to do the cleanup, since they might have to do some reset logic of their own anyhow. In particular, DFSInputStream seeks back to the beginning of the read. We might as well keep all the compensation logic in the same place, and DFSInputStream is best placed to know what it wants to do. Plus it's a bit easier to wrap a single read call in a try / catch block than to try and reason about individual failure modes inside the read method. Does that sound sensible? On 2012-03-07 00:11:46, Todd Lipcon wrote: > hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java, lines 300-301 > < https://reviews.apache.org/r/4212/diff/1/?file=88645#file88645line300 > > > This isn't your bug, but I just realized there's a potential leak here if the skip() calls below failed. I think we need to add a try..finally{} which returns these buffers to the pool if construction fails Done. On 2012-03-07 00:11:46, Todd Lipcon wrote: > hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java, lines 517-521 > < https://reviews.apache.org/r/4212/diff/1/?file=88645#file88645line517 > > > no '-'s needed in javadoc Done. Henry ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/4212/#review5665 ----------------------------------------------------------- On 2012-03-06 23:14:07, Henry Robinson wrote: ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/4212/ ----------------------------------------------------------- (Updated 2012-03-06 23:14:07) Review request for hadoop-hdfs and Todd Lipcon. Summary ------- New patch for HDFS-2834 (I can't update the old review request). This addresses bug HDFS-2834 . http://issues.apache.org/jira/browse/HDFS-2834 Diffs ----- hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReader.java dfab730 hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java cc61697 hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java 4187f1c hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java 2b817ff hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/RemoteBlockReader.java b7da8d4 hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/RemoteBlockReader2.java ea24777 hadoop-hdfs-project/hadoop-hdfs/src/main/native/hdfs.h 0ee29d5 hadoop-hdfs-project/hadoop-hdfs/src/main/native/hdfs.c 7371caf hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/BlockReaderTestUtil.java 9d4f4a2 hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestParallelLocalRead.java PRE-CREATION hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestParallelRead.java bbd0012 hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestParallelReadUtil.java PRE-CREATION hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestShortCircuitLocalRead.java eb2a1d8 Diff: https://reviews.apache.org/r/4212/diff Testing ------- Thanks, Henry
          Hide
          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12517333/HDFS-2834.8.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 15 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in .

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/1959//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/1959//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12517333/HDFS-2834.8.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 15 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 eclipse:eclipse. The patch built with eclipse:eclipse. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed unit tests in . +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/1959//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/1959//console This message is automatically generated.
          Hide
          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/4212/#review5665
          -----------------------------------------------------------

          some stuff around error cases here – I think you'd run into these bugs if you hit a checksum error, since the "retry on different node" path would trigger in DFSInputStream, but the target buffer's position would be prematurely updated from prior attempts.

          But getting close!

          Is it possible to split out the test refactor change from this one? Hard to tell what's changed in the tests vs just refactored. If it's a pain in the butt I'll look more closely.

          hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReader.java
          <https://reviews.apache.org/r/4212/#comment12328>

          unused import?

          hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java
          <https://reviews.apache.org/r/4212/#comment12329>

          please use javadoc style comments for these, rather than //s

          hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java
          <https://reviews.apache.org/r/4212/#comment12330>

          I envision some silly user setting the buffer size < bytesPerChecksum and getting screwed here. Worth a check for valid range here (throwing IOE or IllegalArgumentException or something)

          hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java
          <https://reviews.apache.org/r/4212/#comment12331>

          This isn't your bug, but I just realized there's a potential leak here if the skip() calls below failed. I think we need to add a try..finally{} which returns these buffers to the pool if construction fails

          hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java
          <https://reviews.apache.org/r/4212/#comment12332>

          should this line go in a finally{} clause? Otherwise if 'to' doesn't have enough space for the copy, we'd end up leaving 'from' with the modified limit

          hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java
          <https://reviews.apache.org/r/4212/#comment12333>

          in finally{} clause perhaps, so the limit isn't messed up by a failed read

          hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java
          <https://reviews.apache.org/r/4212/#comment12337>

          the error conditions here don't quite work right.
          For example, if a checksum error occurs, the buffer's position will still be updated.

          Above, there's a similar problem if "phase 1" succeeds but "phase 2" encounters an error – the position will change but the method will throw.

          hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java
          <https://reviews.apache.org/r/4212/#comment12338>

          no '-'s needed in javadoc

          hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java
          <https://reviews.apache.org/r/4212/#comment12336>

          do you really need synchronization here and below? it seems like the strategy is being called from a synchronized context, so this is redundant. Removing this also allows the inner classes to be static which I think is preferable

          hadoop-hdfs-project/hadoop-hdfs/src/main/native/hdfs.c
          <https://reviews.apache.org/r/4212/#comment12342>

          worth printing the exception message

          hadoop-hdfs-project/hadoop-hdfs/src/main/native/hdfs.c
          <https://reviews.apache.org/r/4212/#comment12340>

          do you need to do something here to clear the JVM exception state? you'll probably have an OOME pending here. Probably good to use errNoFromException so that you get a reasonable error code (assume it does something good with OOME).

          hadoop-hdfs-project/hadoop-hdfs/src/main/native/hdfs.c
          <https://reviews.apache.org/r/4212/#comment12343>

          is there any logging setting for libhdfs? seems like you'd want to print the stack trace if some debug flag is set

          hadoop-hdfs-project/hadoop-hdfs/src/main/native/hdfs.c
          <https://reviews.apache.org/r/4212/#comment12344>

          style, } else {

          • Todd

          On 2012-03-06 23:14:07, Henry Robinson wrote:

          -----------------------------------------------------------

          This is an automatically generated e-mail. To reply, visit:

          https://reviews.apache.org/r/4212/

          -----------------------------------------------------------

          (Updated 2012-03-06 23:14:07)

          Review request for hadoop-hdfs and Todd Lipcon.

          Summary

          -------

          New patch for HDFS-2834 (I can't update the old review request).

          This addresses bug HDFS-2834.

          http://issues.apache.org/jira/browse/HDFS-2834

          Diffs

          -----

          hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReader.java dfab730

          hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java cc61697

          hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java 4187f1c

          hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java 2b817ff

          hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/RemoteBlockReader.java b7da8d4

          hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/RemoteBlockReader2.java ea24777

          hadoop-hdfs-project/hadoop-hdfs/src/main/native/hdfs.h 0ee29d5

          hadoop-hdfs-project/hadoop-hdfs/src/main/native/hdfs.c 7371caf

          hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/BlockReaderTestUtil.java 9d4f4a2

          hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestParallelLocalRead.java PRE-CREATION

          hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestParallelRead.java bbd0012

          hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestParallelReadUtil.java PRE-CREATION

          hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestShortCircuitLocalRead.java eb2a1d8

          Diff: https://reviews.apache.org/r/4212/diff

          Testing

          -------

          Thanks,

          Henry

          Show
          jiraposter@reviews.apache.org added a comment - ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/4212/#review5665 ----------------------------------------------------------- some stuff around error cases here – I think you'd run into these bugs if you hit a checksum error, since the "retry on different node" path would trigger in DFSInputStream, but the target buffer's position would be prematurely updated from prior attempts. But getting close! Is it possible to split out the test refactor change from this one? Hard to tell what's changed in the tests vs just refactored. If it's a pain in the butt I'll look more closely. hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReader.java < https://reviews.apache.org/r/4212/#comment12328 > unused import? hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java < https://reviews.apache.org/r/4212/#comment12329 > please use javadoc style comments for these, rather than //s hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java < https://reviews.apache.org/r/4212/#comment12330 > I envision some silly user setting the buffer size < bytesPerChecksum and getting screwed here. Worth a check for valid range here (throwing IOE or IllegalArgumentException or something) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java < https://reviews.apache.org/r/4212/#comment12331 > This isn't your bug, but I just realized there's a potential leak here if the skip() calls below failed. I think we need to add a try..finally{} which returns these buffers to the pool if construction fails hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java < https://reviews.apache.org/r/4212/#comment12332 > should this line go in a finally{} clause? Otherwise if 'to' doesn't have enough space for the copy, we'd end up leaving 'from' with the modified limit hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java < https://reviews.apache.org/r/4212/#comment12333 > in finally{} clause perhaps, so the limit isn't messed up by a failed read hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java < https://reviews.apache.org/r/4212/#comment12337 > the error conditions here don't quite work right. For example, if a checksum error occurs, the buffer's position will still be updated. Above, there's a similar problem if "phase 1" succeeds but "phase 2" encounters an error – the position will change but the method will throw. hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java < https://reviews.apache.org/r/4212/#comment12338 > no '-'s needed in javadoc hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java < https://reviews.apache.org/r/4212/#comment12336 > do you really need synchronization here and below? it seems like the strategy is being called from a synchronized context, so this is redundant. Removing this also allows the inner classes to be static which I think is preferable hadoop-hdfs-project/hadoop-hdfs/src/main/native/hdfs.c < https://reviews.apache.org/r/4212/#comment12342 > worth printing the exception message hadoop-hdfs-project/hadoop-hdfs/src/main/native/hdfs.c < https://reviews.apache.org/r/4212/#comment12340 > do you need to do something here to clear the JVM exception state? you'll probably have an OOME pending here. Probably good to use errNoFromException so that you get a reasonable error code (assume it does something good with OOME). hadoop-hdfs-project/hadoop-hdfs/src/main/native/hdfs.c < https://reviews.apache.org/r/4212/#comment12343 > is there any logging setting for libhdfs? seems like you'd want to print the stack trace if some debug flag is set hadoop-hdfs-project/hadoop-hdfs/src/main/native/hdfs.c < https://reviews.apache.org/r/4212/#comment12344 > style, } else { Todd On 2012-03-06 23:14:07, Henry Robinson wrote: ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/4212/ ----------------------------------------------------------- (Updated 2012-03-06 23:14:07) Review request for hadoop-hdfs and Todd Lipcon. Summary ------- New patch for HDFS-2834 (I can't update the old review request). This addresses bug HDFS-2834 . http://issues.apache.org/jira/browse/HDFS-2834 Diffs ----- hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReader.java dfab730 hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java cc61697 hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java 4187f1c hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java 2b817ff hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/RemoteBlockReader.java b7da8d4 hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/RemoteBlockReader2.java ea24777 hadoop-hdfs-project/hadoop-hdfs/src/main/native/hdfs.h 0ee29d5 hadoop-hdfs-project/hadoop-hdfs/src/main/native/hdfs.c 7371caf hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/BlockReaderTestUtil.java 9d4f4a2 hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestParallelLocalRead.java PRE-CREATION hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestParallelRead.java bbd0012 hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestParallelReadUtil.java PRE-CREATION hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestShortCircuitLocalRead.java eb2a1d8 Diff: https://reviews.apache.org/r/4212/diff Testing ------- Thanks, Henry
          Hide
          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/4212/
          -----------------------------------------------------------

          Review request for hadoop-hdfs and Todd Lipcon.

          Summary
          -------

          New patch for HDFS-2834 (I can't update the old review request).

          This addresses bug HDFS-2834.
          http://issues.apache.org/jira/browse/HDFS-2834

          Diffs


          hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReader.java dfab730
          hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java cc61697
          hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java 4187f1c
          hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java 2b817ff
          hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/RemoteBlockReader.java b7da8d4
          hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/RemoteBlockReader2.java ea24777
          hadoop-hdfs-project/hadoop-hdfs/src/main/native/hdfs.h 0ee29d5
          hadoop-hdfs-project/hadoop-hdfs/src/main/native/hdfs.c 7371caf
          hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/BlockReaderTestUtil.java 9d4f4a2
          hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestParallelLocalRead.java PRE-CREATION
          hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestParallelRead.java bbd0012
          hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestParallelReadUtil.java PRE-CREATION
          hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestShortCircuitLocalRead.java eb2a1d8

          Diff: https://reviews.apache.org/r/4212/diff

          Testing
          -------

          Thanks,

          Henry

          Show
          jiraposter@reviews.apache.org added a comment - ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/4212/ ----------------------------------------------------------- Review request for hadoop-hdfs and Todd Lipcon. Summary ------- New patch for HDFS-2834 (I can't update the old review request). This addresses bug HDFS-2834 . http://issues.apache.org/jira/browse/HDFS-2834 Diffs hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReader.java dfab730 hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java cc61697 hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java 4187f1c hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java 2b817ff hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/RemoteBlockReader.java b7da8d4 hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/RemoteBlockReader2.java ea24777 hadoop-hdfs-project/hadoop-hdfs/src/main/native/hdfs.h 0ee29d5 hadoop-hdfs-project/hadoop-hdfs/src/main/native/hdfs.c 7371caf hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/BlockReaderTestUtil.java 9d4f4a2 hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestParallelLocalRead.java PRE-CREATION hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestParallelRead.java bbd0012 hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestParallelReadUtil.java PRE-CREATION hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestShortCircuitLocalRead.java eb2a1d8 Diff: https://reviews.apache.org/r/4212/diff Testing ------- Thanks, Henry
          Hide
          Henry Robinson added a comment -

          This patch should fix the findbugs warning, plus removes some debugging output. It also includes changes per Todd's review.

          I ran TestParallelLocalRead for a couple of hours with no failures - I'll try for a longer run tonight.

          I also benchmarked with and without the closure objects in DFSInputStream. There's no significant difference in throughput, even for small reads (I measured with 32 bytes reads).

          Show
          Henry Robinson added a comment - This patch should fix the findbugs warning, plus removes some debugging output. It also includes changes per Todd's review. I ran TestParallelLocalRead for a couple of hours with no failures - I'll try for a longer run tonight. I also benchmarked with and without the closure objects in DFSInputStream. There's no significant difference in throughput, even for small reads (I measured with 32 bytes reads).
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12517210/HDFS-2834.7.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 15 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in .

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/1958//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/1958//artifact/trunk/hadoop-hdfs-project/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/1958//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12517210/HDFS-2834.7.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 15 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 eclipse:eclipse. The patch built with eclipse:eclipse. -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed unit tests in . +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/1958//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/1958//artifact/trunk/hadoop-hdfs-project/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/1958//console This message is automatically generated.
          Hide
          Henry Robinson added a comment -

          Patch with review comments addressed.

          I'm running a soak test overnight of TestParallelLocalRead with the 'mixed' workload; I'll report back on its success tomorrow.

          I'll also do some Java-side benchmarks to isolate the libhdfs overhead, and also benchmark the lambda-object concern that Todd raised.

          Show
          Henry Robinson added a comment - Patch with review comments addressed. I'm running a soak test overnight of TestParallelLocalRead with the 'mixed' workload; I'll report back on its success tomorrow. I'll also do some Java-side benchmarks to isolate the libhdfs overhead, and also benchmark the lambda-object concern that Todd raised.
          Hide
          Todd Lipcon added a comment -

          Thanks Dhruba. I opened HDFS-3053 for further discussion around zero-copy reads.

          Show
          Todd Lipcon added a comment - Thanks Dhruba. I opened HDFS-3053 for further discussion around zero-copy reads.
          Hide
          jiraposter@reviews.apache.org added a comment -

          On 2012-03-05 21:57:31, Todd Lipcon wrote:

          > first pass looks good. A few questions:

          > - do you have before/after benchmarks for the old TestParallelRead with this patch? ie make sure this patch doesn't regress performance of the "normal" array-based read path?

          > - have you run the various randomized tests overnight or so to get good coverage of all the cases?

          >

          > I didn't look carefully at the libhdfs code, since you said offline there's another rev of that coming

          Thanks for the review! I'll get a new patch on the ticket and here momentarily.

          On 2012-03-05 21:57:31, Todd Lipcon wrote:

          > hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java, line 123

          > <https://reviews.apache.org/r/4184/diff/1/?file=88331#file88331line123>

          >

          > can you add javadoc explaining this var (even though it's not new)?

          Done.

          On 2012-03-05 21:57:31, Todd Lipcon wrote:

          > hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java, line 341

          > <https://reviews.apache.org/r/4184/diff/1/?file=88331#file88331line341>

          >

          > I'd move this inside the if statement below. Then in the second "phase", re-declare it under that scope, since the "len" variable doesn't ever carry over between phases (and in fact becomes invalid)

          Good catch, done.

          On 2012-03-05 21:57:31, Todd Lipcon wrote:

          > hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java, lines 392-396

          > <https://reviews.apache.org/r/4184/diff/1/?file=88331#file88331line392>

          >

          > this code's very similar to line 358-362. Can we extract a utility function which we could optimize as I mentioned above to avoid object allocation?

          Done.

          On 2012-03-05 21:57:31, Todd Lipcon wrote:

          > hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java, line 358

          > <https://reviews.apache.org/r/4184/diff/1/?file=88331#file88331line358>

          >

          > Seems like this area could be optimized a bit to avoid the object creation here...

          >

          > In one case, slowReadBuff.remaining() <= len, in which case you can directly do buf.put(slowReadBuff).

          >

          > In the other case, you could probably save off the limit of slowReadBuff, lower the limit to len, do the read, then set the limit again.

          Done - I didn't bother making the special case, but have avoided the slice. I doubt the limit manipulation is costly.

          On 2012-03-05 21:57:31, Todd Lipcon wrote:

          > hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java, line 496

          > <https://reviews.apache.org/r/4184/diff/1/?file=88332#file88332line496>

          >

          > remove commented out code

          Done

          On 2012-03-05 21:57:31, Todd Lipcon wrote:

          > hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java, lines 412-413

          > <https://reviews.apache.org/r/4184/diff/1/?file=88331#file88331line412>

          >

          > I think it might simplify things if you used this function only for the case where verifyChecksum == true. Then above, the non-checksummed read case would just call fillBuffer directly. What do you think?

          >

          > My reasoning is that this function is called 'readChunksIntoByteBuffer' implying that it always reads a multiple of checksum chunk size, which isn't really a requirement when not verifying checksums

          I take your point. My only reservation is that there's some logic that we'd have to duplicate around the handling of offsetFromChunkBoundary and updating the buffer position on a successful read. Maybe renaming the method is the easiest way to avoid confusion? Maybe doByteBufferRead or something equally vague...

          On 2012-03-05 21:57:31, Todd Lipcon wrote:

          > hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java, lines 539-540

          > <https://reviews.apache.org/r/4184/diff/1/?file=88331#file88331line539>

          >

          > maybe TRACE is more appropriate here. Also LOG.trace() inside

          Done on both counts.

          On 2012-03-05 21:57:31, Todd Lipcon wrote:

          > hadoop-hdfs-project/hadoop-hdfs/src/main/native/hdfs.c, line 826

          > <https://reviews.apache.org/r/4184/diff/1/?file=88336#file88336line826>

          >

          > goofy empty clause. Other goofy indentation here.

          Fixed. Whitespace is all over the shop in hdfs.c. I've made this method a uniform 2-space indent.

          On 2012-03-05 21:57:31, Todd Lipcon wrote:

          > hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java, line 591

          > <https://reviews.apache.org/r/4184/diff/1/?file=88332#file88332line591>

          >

          > I think it's worth making these actual static inner classes with names instead of anonymous, for prettier stack traces.

          >

          > I remember we discussed a few weeks back whether the creation of these lambdas would harm performance for short reads. Do you have any results for TestParallelRead performance for the lambda-based approach here vs the uglier approach

          I've made them static. I'll measure the performance with and without, and report on the jira.

          • Henry

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/4184/#review5622
          -----------------------------------------------------------

          On 2012-03-05 21:00:55, Todd Lipcon wrote:

          -----------------------------------------------------------

          This is an automatically generated e-mail. To reply, visit:

          https://reviews.apache.org/r/4184/

          -----------------------------------------------------------

          (Updated 2012-03-05 21:00:55)

          Review request for hadoop-hdfs and Todd Lipcon.

          Summary

          -------

          Posting patch to RB on behalf of Henry.

          This addresses bug HDFS-2834.

          http://issues.apache.org/jira/browse/HDFS-2834

          Diffs

          -----

          hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReader.java dfab730

          hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java cc61697

          hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java 2b817ff

          hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/RemoteBlockReader.java b7da8d4

          hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/RemoteBlockReader2.java ea24777

          hadoop-hdfs-project/hadoop-hdfs/src/main/native/hdfs.h 0ee29d5

          hadoop-hdfs-project/hadoop-hdfs/src/main/native/hdfs.c 7371caf

          hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/BlockReaderTestUtil.java 9d4f4a2

          hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestParallelLocalRead.java PRE-CREATION

          hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestParallelRead.java bbd0012

          hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestParallelReadUtil.java PRE-CREATION

          hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestShortCircuitLocalRead.java eb2a1d8

          Diff: https://reviews.apache.org/r/4184/diff

          Testing

          -------

          Thanks,

          Todd

          Show
          jiraposter@reviews.apache.org added a comment - On 2012-03-05 21:57:31, Todd Lipcon wrote: > first pass looks good. A few questions: > - do you have before/after benchmarks for the old TestParallelRead with this patch? ie make sure this patch doesn't regress performance of the "normal" array-based read path? > - have you run the various randomized tests overnight or so to get good coverage of all the cases? > > I didn't look carefully at the libhdfs code, since you said offline there's another rev of that coming Thanks for the review! I'll get a new patch on the ticket and here momentarily. On 2012-03-05 21:57:31, Todd Lipcon wrote: > hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java, line 123 > < https://reviews.apache.org/r/4184/diff/1/?file=88331#file88331line123 > > > can you add javadoc explaining this var (even though it's not new)? Done. On 2012-03-05 21:57:31, Todd Lipcon wrote: > hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java, line 341 > < https://reviews.apache.org/r/4184/diff/1/?file=88331#file88331line341 > > > I'd move this inside the if statement below. Then in the second "phase", re-declare it under that scope, since the "len" variable doesn't ever carry over between phases (and in fact becomes invalid) Good catch, done. On 2012-03-05 21:57:31, Todd Lipcon wrote: > hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java, lines 392-396 > < https://reviews.apache.org/r/4184/diff/1/?file=88331#file88331line392 > > > this code's very similar to line 358-362. Can we extract a utility function which we could optimize as I mentioned above to avoid object allocation? Done. On 2012-03-05 21:57:31, Todd Lipcon wrote: > hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java, line 358 > < https://reviews.apache.org/r/4184/diff/1/?file=88331#file88331line358 > > > Seems like this area could be optimized a bit to avoid the object creation here... > > In one case, slowReadBuff.remaining() <= len, in which case you can directly do buf.put(slowReadBuff). > > In the other case, you could probably save off the limit of slowReadBuff, lower the limit to len, do the read, then set the limit again. Done - I didn't bother making the special case, but have avoided the slice. I doubt the limit manipulation is costly. On 2012-03-05 21:57:31, Todd Lipcon wrote: > hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java, line 496 > < https://reviews.apache.org/r/4184/diff/1/?file=88332#file88332line496 > > > remove commented out code Done On 2012-03-05 21:57:31, Todd Lipcon wrote: > hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java, lines 412-413 > < https://reviews.apache.org/r/4184/diff/1/?file=88331#file88331line412 > > > I think it might simplify things if you used this function only for the case where verifyChecksum == true. Then above, the non-checksummed read case would just call fillBuffer directly. What do you think? > > My reasoning is that this function is called 'readChunksIntoByteBuffer' implying that it always reads a multiple of checksum chunk size, which isn't really a requirement when not verifying checksums I take your point. My only reservation is that there's some logic that we'd have to duplicate around the handling of offsetFromChunkBoundary and updating the buffer position on a successful read. Maybe renaming the method is the easiest way to avoid confusion? Maybe doByteBufferRead or something equally vague... On 2012-03-05 21:57:31, Todd Lipcon wrote: > hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java, lines 539-540 > < https://reviews.apache.org/r/4184/diff/1/?file=88331#file88331line539 > > > maybe TRACE is more appropriate here. Also LOG.trace() inside Done on both counts. On 2012-03-05 21:57:31, Todd Lipcon wrote: > hadoop-hdfs-project/hadoop-hdfs/src/main/native/hdfs.c, line 826 > < https://reviews.apache.org/r/4184/diff/1/?file=88336#file88336line826 > > > goofy empty clause. Other goofy indentation here. Fixed. Whitespace is all over the shop in hdfs.c. I've made this method a uniform 2-space indent. On 2012-03-05 21:57:31, Todd Lipcon wrote: > hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java, line 591 > < https://reviews.apache.org/r/4184/diff/1/?file=88332#file88332line591 > > > I think it's worth making these actual static inner classes with names instead of anonymous, for prettier stack traces. > > I remember we discussed a few weeks back whether the creation of these lambdas would harm performance for short reads. Do you have any results for TestParallelRead performance for the lambda-based approach here vs the uglier approach I've made them static. I'll measure the performance with and without, and report on the jira. Henry ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/4184/#review5622 ----------------------------------------------------------- On 2012-03-05 21:00:55, Todd Lipcon wrote: ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/4184/ ----------------------------------------------------------- (Updated 2012-03-05 21:00:55) Review request for hadoop-hdfs and Todd Lipcon. Summary ------- Posting patch to RB on behalf of Henry. This addresses bug HDFS-2834 . http://issues.apache.org/jira/browse/HDFS-2834 Diffs ----- hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReader.java dfab730 hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java cc61697 hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java 2b817ff hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/RemoteBlockReader.java b7da8d4 hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/RemoteBlockReader2.java ea24777 hadoop-hdfs-project/hadoop-hdfs/src/main/native/hdfs.h 0ee29d5 hadoop-hdfs-project/hadoop-hdfs/src/main/native/hdfs.c 7371caf hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/BlockReaderTestUtil.java 9d4f4a2 hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestParallelLocalRead.java PRE-CREATION hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestParallelRead.java bbd0012 hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestParallelReadUtil.java PRE-CREATION hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestShortCircuitLocalRead.java eb2a1d8 Diff: https://reviews.apache.org/r/4184/diff Testing ------- Thanks, Todd
          Hide
          dhruba borthakur added a comment -

          Agreed Todd. Instead of subverting this jira, maybe we can open a new jira and discus how we can get scattergather into the FSDataInoutStream API.

          Also, it is better to do the mmap only once for the entire file instead of mmaping only the relevant offsets on demand. I will open a JIRA

          Show
          dhruba borthakur added a comment - Agreed Todd. Instead of subverting this jira, maybe we can open a new jira and discus how we can get scattergather into the FSDataInoutStream API. Also, it is better to do the mmap only once for the entire file instead of mmaping only the relevant offsets on demand. I will open a JIRA
          Hide
          Todd Lipcon added a comment -

          Hi Dhruba. Looks like your implementation provides for "zero copy" reads, since you can return mmapped memory directly. This is a neat trick, but a little different from this JIRA, where it still copies (just once) into a user-provided buffer. Also, looks like your implementation doesn't do any checksumming for this API, right? Other concerns with the mmap approach are:

          • it doesn't actually unmap when it goes "out of scope" as your comment indicates – you need to wait on an actual GC to call the finalizers, which can cause the process to run out of address space on a 32-bit JVM if there isn't actual pressure on the Java heap
          • the mmap() call will cause a TLB shootdown across all the threads - I'd be surprised if the API is actually faster for the case of smaller multi-threaded reads. See section 5.1 of this paper for more info: http://www.scribd.com/doc/59150636/C4-Continuously-Concurrent-Compacting-Collector
          Show
          Todd Lipcon added a comment - Hi Dhruba. Looks like your implementation provides for "zero copy" reads, since you can return mmapped memory directly. This is a neat trick, but a little different from this JIRA, where it still copies (just once) into a user-provided buffer. Also, looks like your implementation doesn't do any checksumming for this API, right? Other concerns with the mmap approach are: it doesn't actually unmap when it goes "out of scope" as your comment indicates – you need to wait on an actual GC to call the finalizers, which can cause the process to run out of address space on a 32-bit JVM if there isn't actual pressure on the Java heap the mmap() call will cause a TLB shootdown across all the threads - I'd be surprised if the API is actually faster for the case of smaller multi-threaded reads. See section 5.1 of this paper for more info: http://www.scribd.com/doc/59150636/C4-Continuously-Concurrent-Compacting-Collector
          Hide
          dhruba borthakur added a comment -

          In our own variant of hdfs, we have introduced a scatter-gather api in FSDataInputStream:

          
            public List<ByteBuffer> readFullyScatterGather(long position, int length)
              throws IOException {
              return ((PositionedReadable)in).readFullyScatterGather(position, length);
            }
          
          

          This allows HDFS to return ByteBuffers all the way to the application. The above api, in turn, invokes DFSClient.InputStream:

          
              /**
               * Read bytes starting from the specified position. This is optimized
               * for fast preads from an application with minimum of buffer copies.
               *
               * @param position start read from this position
               * @param length number of bytes to read
               *
               * @return A list of Byte Buffers that represent all the data that was
               * read from the underlying system.
               */
              @Override
              public List<ByteBuffer> readFullyScatterGather(long position, int length)
                throws IOException {
          
          

          Some details here: http://bit.ly/zyDF0h

          Show
          dhruba borthakur added a comment - In our own variant of hdfs, we have introduced a scatter-gather api in FSDataInputStream: public List<ByteBuffer> readFullyScatterGather( long position, int length) throws IOException { return ((PositionedReadable)in).readFullyScatterGather(position, length); } This allows HDFS to return ByteBuffers all the way to the application. The above api, in turn, invokes DFSClient.InputStream: /** * Read bytes starting from the specified position. This is optimized * for fast preads from an application with minimum of buffer copies. * * @param position start read from this position * @param length number of bytes to read * * @ return A list of Byte Buffers that represent all the data that was * read from the underlying system. */ @Override public List<ByteBuffer> readFullyScatterGather( long position, int length) throws IOException { Some details here: http://bit.ly/zyDF0h
          Hide
          Henry Robinson added a comment -

          Thanks for the review! Per your first two questions:

          • There's no significant difference in my benchmarks with the old copying path doing the same experiment:
              Native Checksums No Checksums Non-native Checksums Remote, Native Checksums
            Copying (MB/s) - 32k buffer and request size 2010.21 2290.50 721.52 1412.20
            Old copying path - 32k buffer and request size 2087.43 2232.67 708.67 1365.60
          • I've run the modified TestParallelRead tests for a couple of hours, but I plan to do a soak test overnight with the full suite before this gets committed.
          Show
          Henry Robinson added a comment - Thanks for the review! Per your first two questions: There's no significant difference in my benchmarks with the old copying path doing the same experiment:   Native Checksums No Checksums Non-native Checksums Remote, Native Checksums Copying (MB/s) - 32k buffer and request size 2010.21 2290.50 721.52 1412.20 Old copying path - 32k buffer and request size 2087.43 2232.67 708.67 1365.60 I've run the modified TestParallelRead tests for a couple of hours, but I plan to do a soak test overnight with the full suite before this gets committed.
          Hide
          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/4184/#review5622
          -----------------------------------------------------------

          Ship it!

          first pass looks good. A few questions:

          • do you have before/after benchmarks for the old TestParallelRead with this patch? ie make sure this patch doesn't regress performance of the "normal" array-based read path?
          • have you run the various randomized tests overnight or so to get good coverage of all the cases?

          I didn't look carefully at the libhdfs code, since you said offline there's another rev of that coming

          hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java
          <https://reviews.apache.org/r/4184/#comment12234>

          can you add javadoc explaining this var (even though it's not new)?

          hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java
          <https://reviews.apache.org/r/4184/#comment12233>

          I'd move this inside the if statement below. Then in the second "phase", re-declare it under that scope, since the "len" variable doesn't ever carry over between phases (and in fact becomes invalid)

          hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java
          <https://reviews.apache.org/r/4184/#comment12232>

          Seems like this area could be optimized a bit to avoid the object creation here...

          In one case, slowReadBuff.remaining() <= len, in which case you can directly do buf.put(slowReadBuff).

          In the other case, you could probably save off the limit of slowReadBuff, lower the limit to len, do the read, then set the limit again.

          hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java
          <https://reviews.apache.org/r/4184/#comment12235>

          this code's very similar to line 358-362. Can we extract a utility function which we could optimize as I mentioned above to avoid object allocation?

          hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java
          <https://reviews.apache.org/r/4184/#comment12236>

          I think it might simplify things if you used this function only for the case where verifyChecksum == true. Then above, the non-checksummed read case would just call fillBuffer directly. What do you think?

          My reasoning is that this function is called 'readChunksIntoByteBuffer' implying that it always reads a multiple of checksum chunk size, which isn't really a requirement when not verifying checksums

          hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java
          <https://reviews.apache.org/r/4184/#comment12237>

          maybe TRACE is more appropriate here. Also LOG.trace() inside

          hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java
          <https://reviews.apache.org/r/4184/#comment12238>

          remove commented out code

          hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java
          <https://reviews.apache.org/r/4184/#comment12239>

          I think it's worth making these actual static inner classes with names instead of anonymous, for prettier stack traces.

          I remember we discussed a few weeks back whether the creation of these lambdas would harm performance for short reads. Do you have any results for TestParallelRead performance for the lambda-based approach here vs the uglier approach

          hadoop-hdfs-project/hadoop-hdfs/src/main/native/hdfs.c
          <https://reviews.apache.org/r/4184/#comment12240>

          goofy empty clause. Other goofy indentation here.

          • Todd

          On 2012-03-05 21:00:55, Todd Lipcon wrote:

          -----------------------------------------------------------

          This is an automatically generated e-mail. To reply, visit:

          https://reviews.apache.org/r/4184/

          -----------------------------------------------------------

          (Updated 2012-03-05 21:00:55)

          Review request for hadoop-hdfs and Todd Lipcon.

          Summary

          -------

          Posting patch to RB on behalf of Henry.

          This addresses bug HDFS-2834.

          http://issues.apache.org/jira/browse/HDFS-2834

          Diffs

          -----

          hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReader.java dfab730

          hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java cc61697

          hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java 2b817ff

          hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/RemoteBlockReader.java b7da8d4

          hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/RemoteBlockReader2.java ea24777

          hadoop-hdfs-project/hadoop-hdfs/src/main/native/hdfs.h 0ee29d5

          hadoop-hdfs-project/hadoop-hdfs/src/main/native/hdfs.c 7371caf

          hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/BlockReaderTestUtil.java 9d4f4a2

          hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestParallelLocalRead.java PRE-CREATION

          hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestParallelRead.java bbd0012

          hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestParallelReadUtil.java PRE-CREATION

          hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestShortCircuitLocalRead.java eb2a1d8

          Diff: https://reviews.apache.org/r/4184/diff

          Testing

          -------

          Thanks,

          Todd

          Show
          jiraposter@reviews.apache.org added a comment - ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/4184/#review5622 ----------------------------------------------------------- Ship it! first pass looks good. A few questions: do you have before/after benchmarks for the old TestParallelRead with this patch? ie make sure this patch doesn't regress performance of the "normal" array-based read path? have you run the various randomized tests overnight or so to get good coverage of all the cases? I didn't look carefully at the libhdfs code, since you said offline there's another rev of that coming hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java < https://reviews.apache.org/r/4184/#comment12234 > can you add javadoc explaining this var (even though it's not new)? hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java < https://reviews.apache.org/r/4184/#comment12233 > I'd move this inside the if statement below. Then in the second "phase", re-declare it under that scope, since the "len" variable doesn't ever carry over between phases (and in fact becomes invalid) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java < https://reviews.apache.org/r/4184/#comment12232 > Seems like this area could be optimized a bit to avoid the object creation here... In one case, slowReadBuff.remaining() <= len, in which case you can directly do buf.put(slowReadBuff). In the other case, you could probably save off the limit of slowReadBuff, lower the limit to len, do the read, then set the limit again. hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java < https://reviews.apache.org/r/4184/#comment12235 > this code's very similar to line 358-362. Can we extract a utility function which we could optimize as I mentioned above to avoid object allocation? hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java < https://reviews.apache.org/r/4184/#comment12236 > I think it might simplify things if you used this function only for the case where verifyChecksum == true. Then above, the non-checksummed read case would just call fillBuffer directly. What do you think? My reasoning is that this function is called 'readChunksIntoByteBuffer' implying that it always reads a multiple of checksum chunk size, which isn't really a requirement when not verifying checksums hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java < https://reviews.apache.org/r/4184/#comment12237 > maybe TRACE is more appropriate here. Also LOG.trace() inside hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java < https://reviews.apache.org/r/4184/#comment12238 > remove commented out code hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java < https://reviews.apache.org/r/4184/#comment12239 > I think it's worth making these actual static inner classes with names instead of anonymous, for prettier stack traces. I remember we discussed a few weeks back whether the creation of these lambdas would harm performance for short reads. Do you have any results for TestParallelRead performance for the lambda-based approach here vs the uglier approach hadoop-hdfs-project/hadoop-hdfs/src/main/native/hdfs.c < https://reviews.apache.org/r/4184/#comment12240 > goofy empty clause. Other goofy indentation here. Todd On 2012-03-05 21:00:55, Todd Lipcon wrote: ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/4184/ ----------------------------------------------------------- (Updated 2012-03-05 21:00:55) Review request for hadoop-hdfs and Todd Lipcon. Summary ------- Posting patch to RB on behalf of Henry. This addresses bug HDFS-2834 . http://issues.apache.org/jira/browse/HDFS-2834 Diffs ----- hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReader.java dfab730 hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java cc61697 hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java 2b817ff hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/RemoteBlockReader.java b7da8d4 hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/RemoteBlockReader2.java ea24777 hadoop-hdfs-project/hadoop-hdfs/src/main/native/hdfs.h 0ee29d5 hadoop-hdfs-project/hadoop-hdfs/src/main/native/hdfs.c 7371caf hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/BlockReaderTestUtil.java 9d4f4a2 hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestParallelLocalRead.java PRE-CREATION hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestParallelRead.java bbd0012 hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestParallelReadUtil.java PRE-CREATION hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestShortCircuitLocalRead.java eb2a1d8 Diff: https://reviews.apache.org/r/4184/diff Testing ------- Thanks, Todd
          Hide
          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/4184/
          -----------------------------------------------------------

          Review request for hadoop-hdfs and Todd Lipcon.

          Summary
          -------

          Posting patch to RB on behalf of Henry.

          This addresses bug HDFS-2834.
          http://issues.apache.org/jira/browse/HDFS-2834

          Diffs


          hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReader.java dfab730
          hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java cc61697
          hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java 2b817ff
          hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/RemoteBlockReader.java b7da8d4
          hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/RemoteBlockReader2.java ea24777
          hadoop-hdfs-project/hadoop-hdfs/src/main/native/hdfs.h 0ee29d5
          hadoop-hdfs-project/hadoop-hdfs/src/main/native/hdfs.c 7371caf
          hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/BlockReaderTestUtil.java 9d4f4a2
          hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestParallelLocalRead.java PRE-CREATION
          hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestParallelRead.java bbd0012
          hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestParallelReadUtil.java PRE-CREATION
          hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestShortCircuitLocalRead.java eb2a1d8

          Diff: https://reviews.apache.org/r/4184/diff

          Testing
          -------

          Thanks,

          Todd

          Show
          jiraposter@reviews.apache.org added a comment - ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/4184/ ----------------------------------------------------------- Review request for hadoop-hdfs and Todd Lipcon. Summary ------- Posting patch to RB on behalf of Henry. This addresses bug HDFS-2834 . http://issues.apache.org/jira/browse/HDFS-2834 Diffs hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReader.java dfab730 hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java cc61697 hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java 2b817ff hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/RemoteBlockReader.java b7da8d4 hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/RemoteBlockReader2.java ea24777 hadoop-hdfs-project/hadoop-hdfs/src/main/native/hdfs.h 0ee29d5 hadoop-hdfs-project/hadoop-hdfs/src/main/native/hdfs.c 7371caf hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/BlockReaderTestUtil.java 9d4f4a2 hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestParallelLocalRead.java PRE-CREATION hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestParallelRead.java bbd0012 hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestParallelReadUtil.java PRE-CREATION hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestShortCircuitLocalRead.java eb2a1d8 Diff: https://reviews.apache.org/r/4184/diff Testing ------- Thanks, Todd
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12517054/hdfs-2834-libhdfs-benchmark.png
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/1949//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12517054/hdfs-2834-libhdfs-benchmark.png against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/1949//console This message is automatically generated.
          Hide
          Henry Robinson added a comment -

          Here are some initial benchmark numbers. They need a little explaining.

          I ran 16 experiments total, changing the read path (copying or direct), the kind of checksum used (native, non-native or none), the locality (shortcircuit or remote-to-same-machine), although not in all combinations. All measurements are through libhdfs, which will explain a couple of oddities in the performance. Once I get a little time, I'll try and do a native Java benchmark, but the relative results should be quite similar.

          Configuration

          The test reads the first 512MB of a 2GB file from a MiniDFSCluster running on the same machine. Each configuration was run 50 times, and the first 5 runs were discarded; the remaining runs were averaged. The file was read from buffer cache on a machine with 16GB RAM and 8 i7 cores. You can see the code here: https://gist.github.com/1977470

          Read sizes

          The size of each read requested is an important variable. When performing a checksum, the maximum read size is bounded by the number of checksums that BlockReadLocal can fit into its internal buffer. Prior to this patch, this fixed the maximum read size in one go to 32k. In a revision of this patch which I'll upload shortly, I've made this buffer size configurable.

          I ran all the experiments in two configurations - with a 32k read buffer, and a 1MB one. The size of the requested read was also fixed to 32k and 1MB respectively. When not performing checksums, but doing a shortcircuit read, there is no limit on the size of a single read, but for comparison these experiments were run with 32k and 1MB reads as well.

          Finally, remote reads are limited to 64k in size. Again, I ran the experiment with both read sizes. The 1MB / copying read performance is extremely slow when performing a remote read. This is because of the excessive amount of memory allocation happening inside libhdfs which will allocate a 1MB byte[] for each 64k read. This illustrates one of the confounding effects of measuring performance through libhdfs, and the dangers of not correctly matching your read size to the size of the read the BlockReader implementation is able to return.

          Results

          (All values are throughput measured in MB/s)

            Native Checksums No Checksums Non-native Checksums Remote, Native Checksums
          Direct (MB/s) - 1MB buffer and request size 3834.25 4665.05 867.06 2057.17
          Copying (MB/s) - 1MB buffer and request size 1976.09 1650.15 754.97 394.91
          Direct (MB/s) - 32k buffer and request size 2943.02 3695.37 816.22 1925.03
          Copying (MB/s) - 32k buffer and request size 2010.21 2290.50 721.52 1412.20

          ... and in pretty picture form:

          Show
          Henry Robinson added a comment - Here are some initial benchmark numbers. They need a little explaining. I ran 16 experiments total, changing the read path (copying or direct), the kind of checksum used (native, non-native or none), the locality (shortcircuit or remote-to-same-machine), although not in all combinations. All measurements are through libhdfs, which will explain a couple of oddities in the performance. Once I get a little time, I'll try and do a native Java benchmark, but the relative results should be quite similar. Configuration The test reads the first 512MB of a 2GB file from a MiniDFSCluster running on the same machine. Each configuration was run 50 times, and the first 5 runs were discarded; the remaining runs were averaged. The file was read from buffer cache on a machine with 16GB RAM and 8 i7 cores. You can see the code here: https://gist.github.com/1977470 Read sizes The size of each read requested is an important variable. When performing a checksum, the maximum read size is bounded by the number of checksums that BlockReadLocal can fit into its internal buffer. Prior to this patch, this fixed the maximum read size in one go to 32k. In a revision of this patch which I'll upload shortly, I've made this buffer size configurable. I ran all the experiments in two configurations - with a 32k read buffer, and a 1MB one. The size of the requested read was also fixed to 32k and 1MB respectively. When not performing checksums, but doing a shortcircuit read, there is no limit on the size of a single read, but for comparison these experiments were run with 32k and 1MB reads as well. Finally, remote reads are limited to 64k in size. Again, I ran the experiment with both read sizes. The 1MB / copying read performance is extremely slow when performing a remote read. This is because of the excessive amount of memory allocation happening inside libhdfs which will allocate a 1MB byte[] for each 64k read. This illustrates one of the confounding effects of measuring performance through libhdfs, and the dangers of not correctly matching your read size to the size of the read the BlockReader implementation is able to return. Results (All values are throughput measured in MB/s)   Native Checksums No Checksums Non-native Checksums Remote, Native Checksums Direct (MB/s) - 1MB buffer and request size 3834.25 4665.05 867.06 2057.17 Copying (MB/s) - 1MB buffer and request size 1976.09 1650.15 754.97 394.91 Direct (MB/s) - 32k buffer and request size 2943.02 3695.37 816.22 1925.03 Copying (MB/s) - 32k buffer and request size 2010.21 2290.50 721.52 1412.20 ... and in pretty picture form:
          Hide
          Todd Lipcon added a comment -

          Hey Henry. Thanks for getting test-patch clean. I'll look at the patch in detail this week.
          Meanwhile, some benchmark numbers would be great.

          Show
          Todd Lipcon added a comment - Hey Henry. Thanks for getting test-patch clean. I'll look at the patch in detail this week. Meanwhile, some benchmark numbers would be great.
          Hide
          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12516962/HDFS-2834.6.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 15 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in .

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/1946//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/1946//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12516962/HDFS-2834.6.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 15 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 eclipse:eclipse. The patch built with eclipse:eclipse. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed unit tests in . +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/1946//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/1946//console This message is automatically generated.
          Hide
          Henry Robinson added a comment -

          Patch with findbugs warnings hopefully fixed.

          Show
          Henry Robinson added a comment - Patch with findbugs warnings hopefully fixed.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12516959/HDFS-2834.5.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 15 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          -1 findbugs. The patch appears to introduce 4 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in .

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/1945//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/1945//artifact/trunk/hadoop-hdfs-project/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/1945//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12516959/HDFS-2834.5.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 15 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 eclipse:eclipse. The patch built with eclipse:eclipse. -1 findbugs. The patch appears to introduce 4 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed unit tests in . +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/1945//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/1945//artifact/trunk/hadoop-hdfs-project/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/1945//console This message is automatically generated.
          Hide
          Henry Robinson added a comment -

          Turns out a conditional around an assert was accidentally removed. Tests were passing for me in Eclipse which was ignoring the assert. All tests pass from the cli for me now.

          Show
          Henry Robinson added a comment - Turns out a conditional around an assert was accidentally removed. Tests were passing for me in Eclipse which was ignoring the assert. All tests pass from the cli for me now.
          Hide
          Henry Robinson added a comment -

          I'll take a look at the test failures, which are legit. Possible something got broken in the HA merge.

          Show
          Henry Robinson added a comment - I'll take a look at the test failures, which are legit. Possible something got broken in the HA merge.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12516932/HDFS-2834.4.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 15 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          -1 findbugs. The patch appears to introduce 4 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests:
          org.apache.hadoop.hdfs.TestShortCircuitLocalRead
          org.apache.hadoop.hdfs.server.namenode.TestValidateConfigurationSettings

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/1944//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/1944//artifact/trunk/hadoop-hdfs-project/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/1944//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12516932/HDFS-2834.4.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 15 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 eclipse:eclipse. The patch built with eclipse:eclipse. -1 findbugs. The patch appears to introduce 4 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hdfs.TestShortCircuitLocalRead org.apache.hadoop.hdfs.server.namenode.TestValidateConfigurationSettings +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/1944//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/1944//artifact/trunk/hadoop-hdfs-project/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/1944//console This message is automatically generated.
          Hide
          Henry Robinson added a comment -

          Resubmitting the patch now that HADOOP-8135 has been committed, so that test-patch can do its thing.

          Show
          Henry Robinson added a comment - Resubmitting the patch now that HADOOP-8135 has been committed, so that test-patch can do its thing.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12516919/HDFS-2834.3.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 15 new or modified tests.

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/1943//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12516919/HDFS-2834.3.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 15 new or modified tests. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/1943//console This message is automatically generated.
          Hide
          Henry Robinson added a comment -

          Here's a patch rebased after the HA merge.

          Show
          Henry Robinson added a comment - Here's a patch rebased after the HA merge.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12516893/HDFS-2834-no-common.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 15 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          -1 javac. The patch appears to cause tar ant target to fail.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          -1 findbugs. The patch appears to cause Findbugs (version 1.3.9) to fail.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed the unit tests build

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/1941//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/1941//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12516893/HDFS-2834-no-common.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 15 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. -1 javac. The patch appears to cause tar ant target to fail. +1 eclipse:eclipse. The patch built with eclipse:eclipse. -1 findbugs. The patch appears to cause Findbugs (version 1.3.9) to fail. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed the unit tests build +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/1941//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/1941//console This message is automatically generated.
          Hide
          Henry Robinson added a comment -

          Thanks Aaron! Should have known to keep common and hdfs patches separate.

          I've put the common-specific changes (the ByteBufferReadable interface) on HADOOP-8135. This new patch, therefore, won't compile on Jenkins without HADOOP-8135 being committed, so it would be great if reviewers could consider them together.

          Show
          Henry Robinson added a comment - Thanks Aaron! Should have known to keep common and hdfs patches separate. I've put the common-specific changes (the ByteBufferReadable interface) on HADOOP-8135 . This new patch, therefore, won't compile on Jenkins without HADOOP-8135 being committed, so it would be great if reviewers could consider them together.
          Hide
          Aaron T. Myers added a comment -

          It's because the patch includes changes across hadoop-common-project and hadoop-hdfs-project. test-patch can't handle cross-project patches.

          Show
          Aaron T. Myers added a comment - It's because the patch includes changes across hadoop-common-project and hadoop-hdfs-project. test-patch can't handle cross-project patches.
          Hide
          Henry Robinson added a comment -

          Weird, patch applies with p0 for me and a colleague against trunk. I'll try and figure out what's going on here.

          Show
          Henry Robinson added a comment - Weird, patch applies with p0 for me and a colleague against trunk. I'll try and figure out what's going on here.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12516767/HDFS-2834.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 15 new or modified tests.

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/1940//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12516767/HDFS-2834.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 15 new or modified tests. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/1940//console This message is automatically generated.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12516767/HDFS-2834.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 15 new or modified tests.

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/1939//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12516767/HDFS-2834.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 15 new or modified tests. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/1939//console This message is automatically generated.
          Hide
          Henry Robinson added a comment -

          Patch without the duplicate ByteBufferReadable interface, and with less @Deprecated addition-spam.

          Show
          Henry Robinson added a comment - Patch without the duplicate ByteBufferReadable interface, and with less @Deprecated addition-spam.
          Hide
          Henry Robinson added a comment -

          Here's a patch against trunk for this ticket. Tests (including new ones) pass for me locally.

          What I've done here is:

          a) add a 'ByteBufferReadable' interface with a single read(ByteBuffer) API, and made FSDataInputStream implement it
          b) Implemented this interface in BlockReaderLocal (this is the most invasive change)
          c) Implemented the interface in RemoteBlockReader2 (same number of copies as byte[] path), and stubbed it out in RemoteBlockeReader
          d) Added tests to TestShortCircuitLocalRead to exercise the BlockReaderLocal path
          e) Split TestsParallelRead into a driver class plus a couple of test suites to exercise remote or local rads
          f) Added support to libhdfs for the direct read path.

          I'll circle back with benchmark numbers when I'm able (the ones that I've already run look promising).

          Show
          Henry Robinson added a comment - Here's a patch against trunk for this ticket. Tests (including new ones) pass for me locally. What I've done here is: a) add a 'ByteBufferReadable' interface with a single read(ByteBuffer) API, and made FSDataInputStream implement it b) Implemented this interface in BlockReaderLocal (this is the most invasive change) c) Implemented the interface in RemoteBlockReader2 (same number of copies as byte[] path), and stubbed it out in RemoteBlockeReader d) Added tests to TestShortCircuitLocalRead to exercise the BlockReaderLocal path e) Split TestsParallelRead into a driver class plus a couple of test suites to exercise remote or local rads f) Added support to libhdfs for the direct read path. I'll circle back with benchmark numbers when I'm able (the ones that I've already run look promising).
          Hide
          Todd Lipcon added a comment -

          This is also useful whenever a native decompression codec is being used. In those cases, we generally have the following copies:

          1) Socket -> DirectByteBuffer (in SocketChannel implementation)
          2) DirectByteBuffer -> byte[] (in SocketInputStream)
          3) byte[] -> Native buffer (set up for decompression)
          4*) decompression to a different native buffer (not really a copy - decompression necessarily rewrites)
          5) native buffer -> byte[]

          with the proposed improvement we can hopefully eliminate #2,#3 for all applications, and #2,#3,and #5 for libhdfs.

          Show
          Todd Lipcon added a comment - This is also useful whenever a native decompression codec is being used. In those cases, we generally have the following copies: 1) Socket -> DirectByteBuffer (in SocketChannel implementation) 2) DirectByteBuffer -> byte[] (in SocketInputStream) 3) byte[] -> Native buffer (set up for decompression) 4*) decompression to a different native buffer (not really a copy - decompression necessarily rewrites) 5) native buffer -> byte[] with the proposed improvement we can hopefully eliminate #2,#3 for all applications, and #2,#3,and #5 for libhdfs.

            People

            • Assignee:
              Henry Robinson
              Reporter:
              Henry Robinson
            • Votes:
              1 Vote for this issue
              Watchers:
              32 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development