Hadoop HDFS
  1. Hadoop HDFS
  2. HDFS-6614

shorten TestPread run time with a smaller retry timeout setting

    Details

    • Type: Test Test
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: 3.0.0, 2.5.0
    • Fix Version/s: 2.5.0
    • Component/s: test
    • Labels:
      None
    • Target Version/s:
    • Hadoop Flags:
      Reviewed

      Description

      Just notice logs like this from TestPread:
      DFS chooseDataNode: got # 3 IOException, will wait for 9909.622860072854 msec
      so i tried to set a smaller retry window value.

      Before patch:
      T E S T S
      -------------------------------------------------------
      Running org.apache.hadoop.hdfs.TestPread
      Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 154.812 sec - in org.apache.hadoop.hdfs.TestPread

      After the change:
      T E S T S
      -------------------------------------------------------
      Running org.apache.hadoop.hdfs.TestPread
      Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 131.724 sec - in org.apache.hadoop.hdfs.TestPread

      1. HDFS-6614-addmium.txt
        1 kB
        Liang Xie
      2. HDFS-6614.txt
        0.7 kB
        Liang Xie

        Issue Links

          Activity

          Hide
          Hudson added a comment -

          SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1827 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1827/)

          Show
          Hudson added a comment - SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1827 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1827/ )
          Hide
          Hudson added a comment -

          FAILURE: Integrated in Hadoop-Hdfs-trunk #1799 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1799/)
          HDFS-6614. Addendum patch to shorten TestPread run time with smaller retry timeout setting. Contributed by Liang Xie. (cnauroth: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1608846)

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestPread.java
          Show
          Hudson added a comment - FAILURE: Integrated in Hadoop-Hdfs-trunk #1799 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1799/ ) HDFS-6614 . Addendum patch to shorten TestPread run time with smaller retry timeout setting. Contributed by Liang Xie. (cnauroth: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1608846 ) /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestPread.java
          Hide
          Hudson added a comment -

          FAILURE: Integrated in Hadoop-Yarn-trunk #608 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/608/)
          HDFS-6614. Addendum patch to shorten TestPread run time with smaller retry timeout setting. Contributed by Liang Xie. (cnauroth: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1608846)

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestPread.java
          Show
          Hudson added a comment - FAILURE: Integrated in Hadoop-Yarn-trunk #608 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/608/ ) HDFS-6614 . Addendum patch to shorten TestPread run time with smaller retry timeout setting. Contributed by Liang Xie. (cnauroth: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1608846 ) /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestPread.java
          Hide
          Chris Nauroth added a comment -

          +1 for the addendum patch. I committed this to trunk, branch-2 and branch-2.5. Thanks again!

          Show
          Chris Nauroth added a comment - +1 for the addendum patch. I committed this to trunk, branch-2 and branch-2.5. Thanks again!
          Hide
          Hudson added a comment -

          SUCCESS: Integrated in Hadoop-trunk-Commit #5842 (See https://builds.apache.org/job/Hadoop-trunk-Commit/5842/)
          HDFS-6614. Addendum patch to shorten TestPread run time with smaller retry timeout setting. Contributed by Liang Xie. (cnauroth: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1608846)

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestPread.java
          Show
          Hudson added a comment - SUCCESS: Integrated in Hadoop-trunk-Commit #5842 (See https://builds.apache.org/job/Hadoop-trunk-Commit/5842/ ) HDFS-6614 . Addendum patch to shorten TestPread run time with smaller retry timeout setting. Contributed by Liang Xie. (cnauroth: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1608846 ) /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestPread.java
          Hide
          Liang Xie added a comment -

          Filed HDFS-6638, since this tuning tips could optimize other cases also. Chris Nauroth, mind taking a look at it if possible? thank you!

          Show
          Liang Xie added a comment - Filed HDFS-6638 , since this tuning tips could optimize other cases also. Chris Nauroth , mind taking a look at it if possible? thank you!
          Hide
          Liang Xie added a comment -

          Chris Nauroth you are right, i remade an addendum patch, removed the unnecessary DFSClient stuff. the reordering of Mockito just avoid the risk that closing stream failure, it's not very related with current jira though...

          Show
          Liang Xie added a comment - Chris Nauroth you are right, i remade an addendum patch, removed the unnecessary DFSClient stuff. the reordering of Mockito just avoid the risk that closing stream failure, it's not very related with current jira though...
          Hide
          Chris Nauroth added a comment -

          Thanks for the addendum patch. In my environments, I found that it was only necessary to set DFS_CLIENT_RETRY_WINDOW_BASE to 0 (one-line patch). The other changes for creating a new DFSClient and reordering the Mockito#reset call seem to be unnecessary. Can we remove those changes and commit just the one-line addendum patch, or are you finding that those changes are required in your environment?

          Did your Windows VM run on a SSD or similar fast storage ?

          Actually, it was the opposite. This is very slow virtualized storage on top of HDD, so I/O generally is slower for me when I'm working on my Windows VM. Another potential factor is timer resolution on Windows. I filed HDFS-6631 to track fixing this part separately.

          Show
          Chris Nauroth added a comment - Thanks for the addendum patch. In my environments, I found that it was only necessary to set DFS_CLIENT_RETRY_WINDOW_BASE to 0 (one-line patch). The other changes for creating a new DFSClient and reordering the Mockito#reset call seem to be unnecessary. Can we remove those changes and commit just the one-line addendum patch, or are you finding that those changes are required in your environment? Did your Windows VM run on a SSD or similar fast storage ? Actually, it was the opposite. This is very slow virtualized storage on top of HDD, so I/O generally is slower for me when I'm working on my Windows VM. Another potential factor is timer resolution on Windows. I filed HDFS-6631 to track fixing this part separately.
          Hide
          Liang Xie added a comment -

          Chris Nauroth, i found still had several non-zero waiting inside chooseDataNode from TestRead log:
          >grep "will wait for" hadoop-hdfs-project/hadoop-hdfs/target/surefire-reports/org.apache.hadoop.hdfs.TestPread-output.txt |grep -v "will wait for 0.0 msec"
          2014-07-04 17:27:49,259 WARN hdfs.DFSClient (DFSInputStream.java:chooseDataNode(905)) - DFS chooseDataNode: got # 1 IOException, will wait for 429.349844455576 msec.
          2014-07-04 17:27:49,692 WARN hdfs.DFSClient (DFSInputStream.java:chooseDataNode(905)) - DFS chooseDataNode: got # 2 IOException, will wait for 8001.653000168143 msec.
          2014-07-04 17:27:57,697 WARN hdfs.DFSClient (DFSInputStream.java:chooseDataNode(905)) - DFS chooseDataNode: got # 3 IOException, will wait for 6062.193740931271 msec.

          I uploaded an minor patch to remove those unnecessary waiting. After applied HDFS-6614-addmium.txt, the result:
          >grep "will wait for" hadoop-hdfs-project/hadoop-hdfs/target/surefire-reports/org.apache.hadoop.hdfs.TestPread-output.txt|wc -l
          11
          >grep "will wait for" hadoop-hdfs-project/hadoop-hdfs/target/surefire-reports/org.apache.hadoop.hdfs.TestPread-output.txt |grep -v "will wait for 0.0 msec"|wc -l
          0

          Show
          Liang Xie added a comment - Chris Nauroth , i found still had several non-zero waiting inside chooseDataNode from TestRead log: >grep "will wait for" hadoop-hdfs-project/hadoop-hdfs/target/surefire-reports/org.apache.hadoop.hdfs.TestPread-output.txt |grep -v "will wait for 0.0 msec" 2014-07-04 17:27:49,259 WARN hdfs.DFSClient (DFSInputStream.java:chooseDataNode(905)) - DFS chooseDataNode: got # 1 IOException, will wait for 429.349844455576 msec. 2014-07-04 17:27:49,692 WARN hdfs.DFSClient (DFSInputStream.java:chooseDataNode(905)) - DFS chooseDataNode: got # 2 IOException, will wait for 8001.653000168143 msec. 2014-07-04 17:27:57,697 WARN hdfs.DFSClient (DFSInputStream.java:chooseDataNode(905)) - DFS chooseDataNode: got # 3 IOException, will wait for 6062.193740931271 msec. I uploaded an minor patch to remove those unnecessary waiting. After applied HDFS-6614 -addmium.txt, the result: >grep "will wait for" hadoop-hdfs-project/hadoop-hdfs/target/surefire-reports/org.apache.hadoop.hdfs.TestPread-output.txt|wc -l 11 >grep "will wait for" hadoop-hdfs-project/hadoop-hdfs/target/surefire-reports/org.apache.hadoop.hdfs.TestPread-output.txt |grep -v "will wait for 0.0 msec"|wc -l 0
          Hide
          Hudson added a comment -

          FAILURE: Integrated in Hadoop-Mapreduce-trunk #1820 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1820/)
          HDFS-6614. shorten TestPread run time with a smaller retry timeout setting. Contributed by Liang Xie. (cnauroth: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1607447)

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestPread.java
          Show
          Hudson added a comment - FAILURE: Integrated in Hadoop-Mapreduce-trunk #1820 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1820/ ) HDFS-6614 . shorten TestPread run time with a smaller retry timeout setting. Contributed by Liang Xie. (cnauroth: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1607447 ) /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestPread.java
          Hide
          Hudson added a comment -

          FAILURE: Integrated in Hadoop-Hdfs-trunk #1793 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1793/)
          HDFS-6614. shorten TestPread run time with a smaller retry timeout setting. Contributed by Liang Xie. (cnauroth: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1607447)

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestPread.java
          Show
          Hudson added a comment - FAILURE: Integrated in Hadoop-Hdfs-trunk #1793 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1793/ ) HDFS-6614 . shorten TestPread run time with a smaller retry timeout setting. Contributed by Liang Xie. (cnauroth: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1607447 ) /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestPread.java
          Hide
          Hudson added a comment -

          SUCCESS: Integrated in Hadoop-Yarn-trunk #602 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/602/)
          HDFS-6614. shorten TestPread run time with a smaller retry timeout setting. Contributed by Liang Xie. (cnauroth: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1607447)

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestPread.java
          Show
          Hudson added a comment - SUCCESS: Integrated in Hadoop-Yarn-trunk #602 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/602/ ) HDFS-6614 . shorten TestPread run time with a smaller retry timeout setting. Contributed by Liang Xie. (cnauroth: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1607447 ) /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestPread.java
          Hide
          Liang Xie added a comment -

          I did discover that the new testHedgedReadLoopTooManyTimes test is failing on my Windows VM on the assertion for the loop count

          Did your Windows VM run on a SSD or similar fast storage ?
          I think there's possibility that if "the second read + the second creating block reader" took less then 1ms, then our asserted loopTimeNum could be 2, not 3. Anyway, i assume changing "assertEquals(3, input.getHedgedReadOpsLoopNumForTesting());" with "assertTrue(input.getHedgedReadOpsLoopNumForTesting() <=3 && input.getHedgedReadOpsLoopNumForTesting() >=2);", should be better.

          Show
          Liang Xie added a comment - I did discover that the new testHedgedReadLoopTooManyTimes test is failing on my Windows VM on the assertion for the loop count Did your Windows VM run on a SSD or similar fast storage ? I think there's possibility that if "the second read + the second creating block reader" took less then 1ms, then our asserted loopTimeNum could be 2, not 3. Anyway, i assume changing "assertEquals(3, input.getHedgedReadOpsLoopNumForTesting());" with "assertTrue(input.getHedgedReadOpsLoopNumForTesting() <=3 && input.getHedgedReadOpsLoopNumForTesting() >=2);", should be better.
          Hide
          Hudson added a comment -

          SUCCESS: Integrated in Hadoop-trunk-Commit #5809 (See https://builds.apache.org/job/Hadoop-trunk-Commit/5809/)
          HDFS-6614. shorten TestPread run time with a smaller retry timeout setting. Contributed by Liang Xie. (cnauroth: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1607447)

          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestPread.java
          Show
          Hudson added a comment - SUCCESS: Integrated in Hadoop-trunk-Commit #5809 (See https://builds.apache.org/job/Hadoop-trunk-Commit/5809/ ) HDFS-6614 . shorten TestPread run time with a smaller retry timeout setting. Contributed by Liang Xie. (cnauroth: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1607447 ) /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestPread.java
          Hide
          Chris Nauroth added a comment -

          I committed this to trunk and branch-2. Thanks for the contribution!

          Show
          Chris Nauroth added a comment - I committed this to trunk and branch-2. Thanks for the contribution!
          Hide
          Chris Nauroth added a comment -

          +1 for the patch. I ran this multiple times, and it passed. I'll commit this.

          BTW, I did discover that the new testHedgedReadLoopTooManyTimes test is failing on my Windows VM on the assertion for the loop count. I suspect this isn't a Windows problem, but rather a race condition still lingering in the code. Windows tends to schedule threads just differently enough that we see the effects of race conditions that weren't visible under the Linux scheduler. We can fix this as a separate issue.

          FAILURE! - in org.apache.hadoop.hdfs.TestPread
          testHedgedReadLoopTooManyTimes(org.apache.hadoop.hdfs.TestPread)  Time elapsed:
          0.918 sec  <<< FAILURE!
          java.lang.AssertionError: expected:<3> but was:<2>
                  at org.junit.Assert.fail(Assert.java:88)
                  at org.junit.Assert.failNotEquals(Assert.java:743)
                  at org.junit.Assert.assertEquals(Assert.java:118)
                  at org.junit.Assert.assertEquals(Assert.java:555)
                  at org.junit.Assert.assertEquals(Assert.java:542)
                  at org.apache.hadoop.hdfs.TestPread.testHedgedReadLoopTooManyTimes(TestPread.java:328)
          
          Show
          Chris Nauroth added a comment - +1 for the patch. I ran this multiple times, and it passed. I'll commit this. BTW, I did discover that the new testHedgedReadLoopTooManyTimes test is failing on my Windows VM on the assertion for the loop count. I suspect this isn't a Windows problem, but rather a race condition still lingering in the code. Windows tends to schedule threads just differently enough that we see the effects of race conditions that weren't visible under the Linux scheduler. We can fix this as a separate issue. FAILURE! - in org.apache.hadoop.hdfs.TestPread testHedgedReadLoopTooManyTimes(org.apache.hadoop.hdfs.TestPread) Time elapsed: 0.918 sec <<< FAILURE! java.lang.AssertionError: expected:<3> but was:<2> at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:555) at org.junit.Assert.assertEquals(Assert.java:542) at org.apache.hadoop.hdfs.TestPread.testHedgedReadLoopTooManyTimes(TestPread.java:328)
          Hide
          Liang Xie added a comment -

          The failed case is not related with current change, filed HDFS-6617 to track it.

          Show
          Liang Xie added a comment - The failed case is not related with current change, filed HDFS-6617 to track it.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12653333/HDFS-6614.txt
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 1 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:

          org.apache.hadoop.hdfs.server.namenode.ha.TestDFSZKFailoverController

          The following test timeouts occurred in hadoop-hdfs-project/hadoop-hdfs:

          org.apache.hadoop.hdfs.server.blockmanagement.TestPendingReplication
          org.apache.hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7259//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7259//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12653333/HDFS-6614.txt against trunk revision . +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 1 new or modified test files. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . There were no new javadoc warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. -1 core tests . The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.ha.TestDFSZKFailoverController The following test timeouts occurred in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.blockmanagement.TestPendingReplication org.apache.hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7259//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7259//console This message is automatically generated.

            People

            • Assignee:
              Liang Xie
              Reporter:
              Liang Xie
            • Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development