Hadoop Map/Reduce / MAPREDUCE-2023

TestDFSIO read test may not read specified bytes.

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.22.0
    • Component/s: benchmarks
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      TestDFSIO's read test may read fewer bytes than specified when reading large files.

      1. TestFsRead.java
        2 kB
        Hong Tang
      2. mr-2023-yahoo-hadoop-20.1xx.patch
        1 kB
        Hong Tang
      3. mr-2023-20100902.patch
        0.8 kB
        Hong Tang
      4. mr-2023-20100826.patch
        0.8 kB
        Hong Tang

          Activity

          Hong Tang added a comment -

          The problem is caused by the following code segment:

            public static class ReadMapper extends IOStatMapper<Long> {
          
              public ReadMapper() { 
              }
          
              public Long doIO(Reporter reporter, 
                                 String name, 
                                 long totalSize // in bytes
                               ) throws IOException {
                // open file
                DataInputStream in = fs.open(new Path(getDataDir(getConf()), name));
                long actualSize = 0;
                try {
                  for(int curSize = bufferSize;
                          curSize == bufferSize && actualSize < totalSize;) { // <-- HERE
                    curSize = in.read(buffer, 0, bufferSize);
                    if(curSize < 0) break;
                    actualSize += curSize;
                    reporter.setStatus("reading " + name + "@" + 
                                       actualSize + "/" + totalSize 
                                       + " ::host = " + hostName);
                  }
                } finally {
                  in.close();
                }
                return Long.valueOf(actualSize);
              }
            }
          

          The for-loop exits as soon as a read fails to fill the entire buffer, even if the end of the file has not been reached. The fix is simple: drop the curSize == bufferSize condition from the loop:

                  for(int curSize = bufferSize; actualSize < totalSize;) {
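
The difference between the two loop conditions can be reproduced without a cluster. The following is a minimal sketch using only java.io; ShortReadDemo, ChunkedStream, readBuggy, readFixed and demo are hypothetical names made up for this illustration and are not part of TestDFSIO or the attached patch. ChunkedStream simulates a stream that returns fewer bytes than requested before the end of the data.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

class ShortReadDemo {
  /** Hypothetical stream that returns at most maxChunk bytes per read call. */
  static class ChunkedStream extends InputStream {
    private final InputStream in;
    private final int maxChunk;

    ChunkedStream(InputStream in, int maxChunk) {
      this.in = in;
      this.maxChunk = maxChunk;
    }

    @Override
    public int read() throws IOException {
      return in.read();
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
      // Deliberately hand back a short read whenever len exceeds maxChunk.
      return in.read(b, off, Math.min(len, maxChunk));
    }
  }

  /** Original loop shape: stops after the first read that does not fill the buffer. */
  static long readBuggy(InputStream in, int bufferSize, long totalSize) throws IOException {
    byte[] buffer = new byte[bufferSize];
    long actualSize = 0;
    for (int curSize = bufferSize; curSize == bufferSize && actualSize < totalSize;) {
      curSize = in.read(buffer, 0, bufferSize);
      if (curSize < 0) break;
      actualSize += curSize;
    }
    return actualSize;
  }

  /** Fixed loop shape: keeps reading until totalSize is reached or the stream ends. */
  static long readFixed(InputStream in, int bufferSize, long totalSize) throws IOException {
    byte[] buffer = new byte[bufferSize];
    long actualSize = 0;
    while (actualSize < totalSize) {
      int curSize = in.read(buffer, 0, bufferSize);
      if (curSize < 0) break;
      actualSize += curSize;
    }
    return actualSize;
  }

  /** Runs both loops over 10000 bytes with a 1000-byte buffer and 700-byte short reads. */
  static long[] demo() {
    byte[] data = new byte[10000];
    try {
      long buggy = readBuggy(
          new ChunkedStream(new ByteArrayInputStream(data), 700), 1000, data.length);
      long fixed = readFixed(
          new ChunkedStream(new ByteArrayInputStream(data), 700), 1000, data.length);
      return new long[] {buggy, fixed};
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    long[] r = demo();
    System.out.println("buggy loop read " + r[0] + " bytes"); // 700
    System.out.println("fixed loop read " + r[1] + " bytes"); // 10000
  }
}
```

With the original condition, the very first 700-byte read ends the loop; with the fixed condition, the loop continues until all 10000 bytes have been consumed.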
          
          Hong Tang added a comment -

          To confirm that DFS may indeed return fewer bytes than requested even before reaching the end of the file, I wrote a test program (attached). The output of a sample run looks as follows:

          hadoop dfs -ls /user/gridperf/gridmix3/part-m-00332/segment-0
          Found 1 items
          -rw-rw-rw-   3 gridperf hdfs 1073741824 2010-08-20 08:22 /user/gridperf/gridmix3/part-m-00332/segment-0
          
          hadoop org.apache.hadoop.fs.TestFsRead /user/gridperf/gridmix3/part-m-00332/segment-0 1000000
          10995954 bytes read
          21199983 bytes read
          32106261 bytes read
          42209617 bytes read
          52456131 bytes read
          63551911 bytes read
          73836262 bytes read
          84369397 bytes read
          95182878 bytes read
          105047397 bytes read
          115740295 bytes read
          126323360 bytes read
          137166764 bytes read
          147066000 bytes read
          157744477 bytes read
          168319334 bytes read
          178856592 bytes read
          188884554 bytes read
          199324045 bytes read
          209995098 bytes read
          220916802 bytes read
          231218738 bytes read
          241772291 bytes read
          251883835 bytes read
          262306687 bytes read
          Fail to read a full buffer before reaching the end: pos=267640189, expected=994623, actual=795267.
          272862612 bytes read
          283737254 bytes read
          293851212 bytes read
          304525446 bytes read
          314766024 bytes read
          325604342 bytes read
          335604768 bytes read
          346475397 bytes read
          357311830 bytes read
          367574920 bytes read
          377834612 bytes read
          388029682 bytes read
          398728223 bytes read
          408966064 bytes read
          419626247 bytes read
          430260987 bytes read
          440440647 bytes read
          451030835 bytes read
          461808645 bytes read
          471996795 bytes read
          482529325 bytes read
          493106417 bytes read
          503960340 bytes read
          514155195 bytes read
          524460261 bytes read
          534955349 bytes read
          Fail to read a full buffer before reaching the end: pos=536250423, expected=999458, actual=620489.
          545734170 bytes read
          556326582 bytes read
          567046173 bytes read
          577480068 bytes read
          587338410 bytes read
          598115745 bytes read
          608759717 bytes read
          619418792 bytes read
          629597629 bytes read
          639906390 bytes read
          650264871 bytes read
          661414262 bytes read
          671205472 bytes read
          681856772 bytes read
          692394138 bytes read
          702803762 bytes read
          713182701 bytes read
          723720128 bytes read
          734531251 bytes read
          745188960 bytes read
          755814801 bytes read
          765670009 bytes read
          776047213 bytes read
          786592324 bytes read
          797786600 bytes read
          Fail to read a full buffer before reaching the end: pos=804788073, expected=613320, actual=518295.
          808158276 bytes read
          818373817 bytes read
          828549794 bytes read
          838915719 bytes read
          850189376 bytes read
          860102547 bytes read
          870902116 bytes read
          881206170 bytes read
          891441081 bytes read
          902119052 bytes read
          912394977 bytes read
          923010497 bytes read
          933330792 bytes read
          944216276 bytes read
          954226049 bytes read
          965371734 bytes read
          975663038 bytes read
          986215681 bytes read
          996274088 bytes read
          1006954729 bytes read
          1017375248 bytes read
          1027801749 bytes read
          1038384467 bytes read
          1049383853 bytes read
          1059662742 bytes read
          1070106760 bytes read
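
The attached TestFsRead.java is not reproduced on this page. As a rough illustration of the kind of check that produces the "Fail to read a full buffer" lines above, here is a sketch that uses only java.io rather than the Hadoop FileSystem API. ShortReadDetector, FlakyStream, scan and demo are hypothetical names, the message format is invented for this sketch, and FlakyStream merely simulates a source that occasionally returns a short read.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class ShortReadDetector {
  /** Hypothetical stream: every third read call returns about half of what was requested. */
  static class FlakyStream extends ByteArrayInputStream {
    private int calls = 0;

    FlakyStream(byte[] data) {
      super(data);
    }

    @Override
    public synchronized int read(byte[] b, int off, int len) {
      calls++;
      int cap = (calls % 3 == 0) ? Math.max(1, len / 2) : len;
      return super.read(b, off, cap);
    }
  }

  /**
   * Reads the stream to fileLength and records every read that returned
   * fewer bytes than requested although the end had not been reached.
   */
  static List<String> scan(InputStream in, long fileLength, int bufferSize) throws IOException {
    byte[] buffer = new byte[bufferSize];
    List<String> reports = new ArrayList<>();
    long pos = 0;
    while (pos < fileLength) {
      // Never request more than what is left, so a smaller result is a genuine short read.
      int expected = (int) Math.min(bufferSize, fileLength - pos);
      int actual = in.read(buffer, 0, expected);
      if (actual < 0) break; // unexpected end of stream
      if (actual < expected) {
        reports.add("short read: pos=" + pos + ", expected=" + expected + ", actual=" + actual);
      }
      pos += actual;
    }
    return reports;
  }

  /** Scans 5000 bytes from the simulated flaky stream with a 1000-byte buffer. */
  static List<String> demo() {
    byte[] data = new byte[5000];
    try {
      return scan(new FlakyStream(data), data.length, 1000);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    for (String line : demo()) {
      System.out.println(line);
    }
  }
}
```

The loop keeps going after a short read, which is why a check like this can still account for every byte of the file while flagging the reads that came back incomplete.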
          
          Hong Tang added a comment -

          test program

          Hong Tang added a comment -

          Trivial patch: the for-loop is replaced with a while-loop.

          Tsz Wo Nicholas Sze added a comment -

          +1 patch looks good.

          Hong Tang added a comment -

          The earlier patch was not generated with --no-prefix. Will upload a new one.

          Hong Tang added a comment -

          test-patch passed on my local machine:

               [exec] +1 overall.  
               [exec] 
               [exec]     +1 @author.  The patch does not contain any @author tags.
               [exec] 
               [exec]     +1 tests included.  The patch appears to include 3 new or modified tests.
               [exec] 
               [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
               [exec] 
               [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
               [exec] 
               [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
               [exec] 
               [exec]     +1 release audit.  The applied patch does not increase the total number of release audit warnings.
          
          Tsz Wo Nicholas Sze added a comment -

          I ran TestDFSIO with a similar patch earlier. Without the patch, the number of bytes read by TestDFSIO may be less than the number specified in the parameter. In such a case, the reported Throughput and Average IO rate could be > 1 GB/second, which is obviously incorrect.

          I have committed this. Thanks, Hong!

          Hong Tang added a comment -

          Patch for the Yahoo Hadoop 20.1xx branch. Not to be committed.

          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk-Commit #523 (See https://hudson.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/523/)


            People

            • Assignee:
              Hong Tang
              Reporter:
              Hong Tang
            • Votes:
              0
              Watchers:
              5
