Hadoop Common / HADOOP-2342

Create a micro-benchmark to measure local-file versus HDFS reads

    Details

    • Type: Test
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.16.0
    • Component/s: None
    • Labels: None

      Description

      We should have a benchmark that measures reading a 10 GB file from HDFS and from local disk.

      1. 2342-1.patch
        9 kB
        Owen O'Malley
      2. throughput.patch
        8 kB
        Owen O'Malley

          Activity

          Owen O'Malley added a comment -

          The numbers I'm seeing for reads and writes on 10GB are:

          Raw Local FS write: 152 read: 79
          Local FS write: 203 read: 78
          mini-DFS write: 269 read: 134

          which suggests that writing checksums is pretty expensive and that HADOOP-2144 is reproducible.

          Raghu Angadi added a comment -

          Doesn't LocalFS do Checksums too? Read on Local FS is as fast as Raw Local.

          Doug Cutting added a comment -

          > which suggests that writing checksums is pretty expensive

          But reading checksums does not seem to be too expensive, which is nice to see. However, HDFS reads are much slower than local reads, which is worrisome. That seems to be the biggest outlier in your data: checksums add ~25%, while non-local reads add ~90%.
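As a rough check on this comparison, the relative overheads can be computed directly from the figures posted in the first comment. The sketch below assumes those figures are elapsed times (larger is slower) and measures each overhead as a percentage of the faster case. The class and method names are made up for illustration, and percentages taken against a different baseline will come out differently from these.

```java
/** Hypothetical helper for comparing the timings posted in this thread. */
class OverheadSketch {
    /** Extra time of `slower` relative to `baseline`, as a percentage of the baseline. */
    static double overheadPercent(double baseline, double slower) {
        return (slower - baseline) / baseline * 100.0;
    }

    public static void main(String[] args) {
        // Figures from the first comment; assumed to be elapsed times.
        double rawWrite = 152, rawRead = 79;       // Raw Local FS (no checksums)
        double localWrite = 203, localRead = 78;   // Local FS (checksummed)
        double dfsWrite = 269, dfsRead = 134;      // mini-DFS

        System.out.printf("Local FS vs raw, write:  %.1f%%%n",
            overheadPercent(rawWrite, localWrite));
        System.out.printf("Local FS vs raw, read:   %.1f%%%n",
            overheadPercent(rawRead, localRead));
        System.out.printf("mini-DFS vs Local FS, write: %.1f%%%n",
            overheadPercent(localWrite, dfsWrite));
        System.out.printf("mini-DFS vs Local FS, read:  %.1f%%%n",
            overheadPercent(localRead, dfsRead));
    }
}
```

A negative result simply means the second configuration was slightly faster than the baseline in that run.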

          Owen O'Malley added a comment -

          This benchmark reads and writes files using java.io, RawLocalFileSystem, LocalFileSystem, and HDFS and reports the time.
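To illustrate the kind of measurement described here, the following is a minimal sketch of the plain java.io case only. It is not the code from the attached patch: the class name, method names, and sizes are assumptions chosen for illustration, and the Hadoop file systems are left out so that the example stays self-contained.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;

/** Hypothetical sketch: time a sequential write and read of one local file. */
class LocalIoTimingSketch {
    /** Writes totalBytes of zeros in bufferSize chunks; returns bytes written. */
    static long writeFile(File file, long totalBytes, int bufferSize) throws IOException {
        byte[] buffer = new byte[bufferSize];
        long remaining = totalBytes;
        try (OutputStream out = new FileOutputStream(file)) {
            while (remaining > 0) {
                int n = (int) Math.min(buffer.length, remaining);
                out.write(buffer, 0, n);
                remaining -= n;
            }
        }
        return totalBytes - remaining;
    }

    /** Reads the whole file in bufferSize chunks; returns bytes read. */
    static long readFile(File file, int bufferSize) throws IOException {
        byte[] buffer = new byte[bufferSize];
        long total = 0;
        try (InputStream in = new FileInputStream(file)) {
            int n;
            while ((n = in.read(buffer)) != -1) {
                total += n;
            }
        }
        return total;
    }

    /** Runs one write pass and one read pass on a temp file; returns bytes read back. */
    static long runOnce(long totalBytes, int bufferSize) {
        try {
            File file = File.createTempFile("throughput-sketch", ".dat");
            try {
                long start = System.nanoTime();
                writeFile(file, totalBytes, bufferSize);
                long writeMs = (System.nanoTime() - start) / 1_000_000;

                start = System.nanoTime(); // restart the clock before the read phase
                long read = readFile(file, bufferSize);
                long readMs = (System.nanoTime() - start) / 1_000_000;

                System.out.println("Writing local time (ms): " + writeMs);
                System.out.println("Reading local time (ms): " + readMs);
                return read;
            } finally {
                file.delete();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        runOnce(64L * 1024 * 1024, 64 * 1024); // 64 MB file, 64 KB buffer (arbitrary)
    }
}
```

A file this small will most likely be served from the operating system's page cache on the read pass, so the sketch shows the structure of the measurement rather than realistic disk throughput.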

          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12371063/throughput.patch
          against trunk revision r601491.

          @author +1. The patch does not contain any @author tags.

          javadoc +1. The javadoc tool did not generate any warning messages.

          javac +1. The applied patch does not generate any new compiler warnings.

          findbugs +1. The patch does not introduce any new Findbugs warnings.

          core tests +1. The patch passed core unit tests.

          contrib tests +1. The patch passed contrib unit tests.

          Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1274/testReport/
          Findbugs warnings: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1274/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1274/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1274/console

          This message is automatically generated.

          Owen O'Malley added a comment -

          I forgot to restart the timer for the local read and write.

          Owen O'Malley added a comment -

          Needs to be re-reviewed by QA.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12371075/throughput.patch
          against trunk revision r601518.

          @author +1. The patch does not contain any @author tags.

          javadoc +1. The javadoc tool did not generate any warning messages.

          javac +1. The applied patch does not generate any new compiler warnings.

          findbugs +1. The patch does not introduce any new Findbugs warnings.

          core tests +1. The patch passed core unit tests.

          contrib tests -1. The patch failed contrib unit tests.

          Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1276/testReport/
          Findbugs warnings: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1276/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1276/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1276/console

          This message is automatically generated.

          Tom White added a comment -

          I built a distribution with this patch and got the following error. It looks like MiniDFSCluster and NameNode are loaded by different classloaders, which causes the package-level access to createNameNode to fail.

          >bin/hadoop jar hadoop-0.16.0-dev-test.jar dfsthroughput
          Local = /tmp/hadoop-tom/mapred/temp
          Writing local time: 246
          Reading local time: 219
          Writing raw time: 225
          Reading raw time: 216
          Writing checked time: 219
          Reading checked time: 238
          java.lang.IllegalAccessError: tried to access method org.apache.hadoop.dfs.NameNode.createNameNode([Ljava/lang/String;Lorg/apache/hadoop/conf/Configuration;)Lorg/apache/hadoop/dfs/NameNode; from class org.apache.hadoop.dfs.MiniDFSCluster
          at org.apache.hadoop.dfs.MiniDFSCluster.<init>(MiniDFSCluster.java:179)
          at org.apache.hadoop.dfs.MiniDFSCluster.<init>(MiniDFSCluster.java:118)
          at org.apache.hadoop.dfs.MiniDFSCluster.<init>(MiniDFSCluster.java:90)
          at org.apache.hadoop.dfs.BenchmarkThroughput.main(BenchmarkThroughput.java:190)
          at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
          at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
          at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
          at java.lang.reflect.Method.invoke(Method.java:585)
          at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
          at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
          at org.apache.hadoop.test.AllTestDriver.main(AllTestDriver.java:75)
          at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
          at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
          at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
          at java.lang.reflect.Method.invoke(Method.java:585)
          at org.apache.hadoop.util.RunJar.main(RunJar.java:155)

          Tom White added a comment -

          I got the same error running the same distribution on Linux too. One fix would be to make the createNameNode method public, but I'd rather find another way if possible.

          Chris Douglas added a comment -

          I ran this on MacOS (jdk1.5.0_13), Linux (jdk1.5.0_08 and jdk1.6.0), and Windows (jdk1.6.0) using the latest trunk and cannot reproduce this case.

          Tom: Are you still seeing this issue?

          Chris Douglas added a comment -

          The patch no longer applies to trunk (AllTestDriver has new items).

          • Using ToolBase/ToolRunner would pick up its functionality, including:
            • It would be useful for the file size and buffer size to be configurable through the config/generic options
            • Noting the rep param in the usage might also be helpful
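The suggestion above relies on Hadoop's generic option handling, where a tool launched through ToolRunner receives `-D name=value` settings in its configuration. As a rough, standard-library-only illustration of that idea, the sketch below parses such options by hand and falls back to defaults. It does not use the Hadoop API, and the property names and the 4096-byte default buffer size are hypothetical; only the 10 GB default file size comes from the issue description.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of making file size and buffer size configurable via -D options. */
class BenchmarkOptionsSketch {
    // Property names invented for this example; they are not taken from the patch.
    static final String FILE_SIZE_KEY = "sketch.throughput.file.size";
    static final String BUFFER_SIZE_KEY = "sketch.throughput.buffer.size";

    /** Collects "-D name=value" and "-Dname=value" arguments into a map. */
    static Map<String, String> parseDefines(String[] args) {
        Map<String, String> props = new HashMap<>();
        for (int i = 0; i < args.length; i++) {
            String def = null;
            if (args[i].equals("-D") && i + 1 < args.length) {
                def = args[++i];              // value is the next argument
            } else if (args[i].startsWith("-D") && args[i].length() > 2) {
                def = args[i].substring(2);   // value is attached to the flag
            }
            if (def != null) {
                int eq = def.indexOf('=');
                if (eq > 0) {
                    props.put(def.substring(0, eq), def.substring(eq + 1));
                }
            }
        }
        return props;
    }

    /** Returns the parsed value for key, or defaultValue when the key is absent. */
    static long getLong(Map<String, String> props, String key, long defaultValue) {
        String value = props.get(key);
        return value == null ? defaultValue : Long.parseLong(value.trim());
    }

    public static void main(String[] args) {
        Map<String, String> props = parseDefines(args);
        long fileSize = getLong(props, FILE_SIZE_KEY, 10L * 1024 * 1024 * 1024);
        int bufferSize = (int) getLong(props, BUFFER_SIZE_KEY, 4096);
        System.out.println("file size = " + fileSize + " bytes, buffer size = "
            + bufferSize + " bytes");
    }
}
```

With a real Tool implementation the parsing step would be handled by the framework, and the benchmark would only read the values from its configuration with suitable defaults.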
          Owen O'Malley added a comment -

          Here is a patch updated to trunk that switches to using Tool. Thanks, Chris!

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12374004/2342-1.patch
          against trunk revision 614721.

          @author +1. The patch does not contain any @author tags.

          javadoc +1. The javadoc tool did not generate any warning messages.

          javac +1. The applied patch does not generate any new compiler warnings.

          findbugs +1. The patch does not introduce any new Findbugs warnings.

          core tests -1. The patch failed core unit tests.

          contrib tests -1. The patch failed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1678/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1678/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1678/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1678/console

          This message is automatically generated.

          Nigel Daley added a comment -

          Test failures were irrelevant to this patch. I just committed this. Thanks Owen!

          Hudson added a comment -

          Integrated in Hadoop-trunk #380 (see http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/380/).

            People

            • Assignee: Owen O'Malley
            • Reporter: Owen O'Malley
            • Votes: 0
            • Watchers: 1
