Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.0.2-alpha
    • Component/s: datanode
    • Labels:
      None
    • Target Version/s:
    • Hadoop Flags:
      Reviewed

      Description

      Similar to HDFS-3170 on the write side, we should improve the metrics that are generated on the DN for read latency. We should have separate metrics for the time spent in transferTo vs waitWritable so that it's easy to distinguish slow local disks from slow readers on the other end of the socket.

      1. hdfs-3343.patch
        5 kB
        Andrew Wang
      2. hdfs-3343-2.patch
        11 kB
        Andrew Wang
      3. hdfs-3343-3.patch
        10 kB
        Andrew Wang
      4. hdfs-3343-4.patch
        10 kB
        Andrew Wang

        Issue Links

          Activity

          Hide
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk #1127 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1127/)
          HDFS-3343. Improve metrics for DN read latency. Contributed by Andrew Wang. (Revision 1356928)

          Result = SUCCESS
          todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1356928
          Files :

          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/net/SocketOutputStream.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockSender.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/metrics/DataNodeMetrics.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataNodeMetrics.java
          Show
          Hudson added a comment - Integrated in Hadoop-Mapreduce-trunk #1127 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1127/ ) HDFS-3343 . Improve metrics for DN read latency. Contributed by Andrew Wang. (Revision 1356928) Result = SUCCESS todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1356928 Files : /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/net/SocketOutputStream.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockSender.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/metrics/DataNodeMetrics.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataNodeMetrics.java
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk #1094 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1094/)
          HDFS-3343. Improve metrics for DN read latency. Contributed by Andrew Wang. (Revision 1356928)

          Result = FAILURE
          todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1356928
          Files :

          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/net/SocketOutputStream.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockSender.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/metrics/DataNodeMetrics.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataNodeMetrics.java
          Show
          Hudson added a comment - Integrated in Hadoop-Hdfs-trunk #1094 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1094/ ) HDFS-3343 . Improve metrics for DN read latency. Contributed by Andrew Wang. (Revision 1356928) Result = FAILURE todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1356928 Files : /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/net/SocketOutputStream.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockSender.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/metrics/DataNodeMetrics.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataNodeMetrics.java
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk-Commit #2439 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/2439/)
          HDFS-3343. Improve metrics for DN read latency. Contributed by Andrew Wang. (Revision 1356928)

          Result = FAILURE
          todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1356928
          Files :

          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/net/SocketOutputStream.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockSender.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/metrics/DataNodeMetrics.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataNodeMetrics.java
          Show
          Hudson added a comment - Integrated in Hadoop-Mapreduce-trunk-Commit #2439 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/2439/ ) HDFS-3343 . Improve metrics for DN read latency. Contributed by Andrew Wang. (Revision 1356928) Result = FAILURE todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1356928 Files : /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/net/SocketOutputStream.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockSender.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/metrics/DataNodeMetrics.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataNodeMetrics.java
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk-Commit #2490 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/2490/)
          HDFS-3343. Improve metrics for DN read latency. Contributed by Andrew Wang. (Revision 1356928)

          Result = SUCCESS
          todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1356928
          Files :

          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/net/SocketOutputStream.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockSender.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/metrics/DataNodeMetrics.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataNodeMetrics.java
          Show
          Hudson added a comment - Integrated in Hadoop-Hdfs-trunk-Commit #2490 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/2490/ ) HDFS-3343 . Improve metrics for DN read latency. Contributed by Andrew Wang. (Revision 1356928) Result = SUCCESS todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1356928 Files : /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/net/SocketOutputStream.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockSender.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/metrics/DataNodeMetrics.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataNodeMetrics.java
          Hide
          Hudson added a comment -

          Integrated in Hadoop-Common-trunk-Commit #2422 (See https://builds.apache.org/job/Hadoop-Common-trunk-Commit/2422/)
          HDFS-3343. Improve metrics for DN read latency. Contributed by Andrew Wang. (Revision 1356928)

          Result = SUCCESS
          todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1356928
          Files :

          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/net/SocketOutputStream.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockSender.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/metrics/DataNodeMetrics.java
          • /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataNodeMetrics.java
          Show
          Hudson added a comment - Integrated in Hadoop-Common-trunk-Commit #2422 (See https://builds.apache.org/job/Hadoop-Common-trunk-Commit/2422/ ) HDFS-3343 . Improve metrics for DN read latency. Contributed by Andrew Wang. (Revision 1356928) Result = SUCCESS todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1356928 Files : /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/net/SocketOutputStream.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockSender.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/metrics/DataNodeMetrics.java /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataNodeMetrics.java
          Hide
          Todd Lipcon added a comment -

          Committed to branch-2 and trunk, since it's a low-risk improvement. Thanks a lot, Andrew.

          Show
          Todd Lipcon added a comment - Committed to branch-2 and trunk, since it's a low-risk improvement. Thanks a lot, Andrew.
          Hide
          Andrew Wang added a comment -

          Of the four listed failed unit tests, the first three are almost certainly unrelated and passed for me locally, and the 4th I think is just flaky. I tried it on trunk, and it still failed erratically.

          Show
          Andrew Wang added a comment - Of the four listed failed unit tests, the first three are almost certainly unrelated and passed for me locally, and the 4th I think is just flaky. I tried it on trunk, and it still failed erratically.
          Hide
          Todd Lipcon added a comment -

          Tests seem unrelated, so I'll commit this. Thanks, Andrew.

          Show
          Todd Lipcon added a comment - Tests seem unrelated, so I'll commit this. Thanks, Andrew.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12534487/hdfs-3343-4.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 1 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs:

          org.apache.hadoop.fs.viewfs.TestViewFsTrash
          org.apache.hadoop.ha.TestZKFailoverController
          org.apache.hadoop.hdfs.server.datanode.TestBPOfferService
          org.apache.hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/2736//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2736//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12534487/hdfs-3343-4.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 1 new or modified test files. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 javadoc. The javadoc tool did not generate any warning messages. +1 eclipse:eclipse. The patch built with eclipse:eclipse. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.fs.viewfs.TestViewFsTrash org.apache.hadoop.ha.TestZKFailoverController org.apache.hadoop.hdfs.server.datanode.TestBPOfferService org.apache.hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/2736//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2736//console This message is automatically generated.
          Hide
          Todd Lipcon added a comment -

          +1 pending Jenkins test-patch results

          Show
          Todd Lipcon added a comment - +1 pending Jenkins test-patch results
          Hide
          Todd Lipcon added a comment -

          Marking patch available so the jenkins bot runs.

          Show
          Todd Lipcon added a comment - Marking patch available so the jenkins bot runs.
          Hide
          Andrew Wang added a comment -

          Realized I was silly about the public thing, and just changed it to accessible through a getter.

          Show
          Andrew Wang added a comment - Realized I was silly about the public thing, and just changed it to accessible through a getter.
          Hide
          Andrew Wang added a comment -

          I removed the timing checks from the test, and some other dead code I missed.

          I think those two metrics need to be public since we pass them to SocketOutputStream, which then calls add() to update. The alternative is to make SOS.transferToFully() return the timing values, and then do the add() in BlockSender.

          Show
          Andrew Wang added a comment - I removed the timing checks from the test, and some other dead code I missed. I think those two metrics need to be public since we pass them to SocketOutputStream, which then calls add() to update. The alternative is to make SOS.transferToFully() return the timing values, and then do the add() in BlockSender.
          Hide
          Todd Lipcon added a comment -

          Looks pretty good. Small comments:

          • Do the new metrics have to be marked public in DataNodeMetrics.java?
          • In the test case, is it safe to assert >0 nanos? eg waitWritable might take "0" nanos on a system with insufficiently fine-grained clock. I think we can leave it at just counting the NumOps to avoid a potentially flaky test.
          Show
          Todd Lipcon added a comment - Looks pretty good. Small comments: Do the new metrics have to be marked public in DataNodeMetrics.java? In the test case, is it safe to assert >0 nanos? eg waitWritable might take "0" nanos on a system with insufficiently fine-grained clock. I think we can leave it at just counting the NumOps to avoid a potentially flaky test.
          Hide
          Andrew Wang added a comment -

          Made suggested changes, with a few differences.

          -Since transferTo both reads from disk and writes to the network, I called that metric "SendDataPacketTransferNanos". Other one is "SendDataPacketBlockedOnNetworkNanos".
          -Used System.nanoTime() for doing more accurate timing, so it's nano rather than microseconds.

          Show
          Andrew Wang added a comment - Made suggested changes, with a few differences. -Since transferTo both reads from disk and writes to the network, I called that metric "SendDataPacketTransferNanos". Other one is "SendDataPacketBlockedOnNetworkNanos". -Used System.nanoTime() for doing more accurate timing, so it's nano rather than microseconds.
          Hide
          Todd Lipcon added a comment -

          Few quick comments:

          • Now that I see again how complicated transferToFully is, I think I disagree with my earlier idea that we should copy-paste it. It seems like instead we should add an API to SocketOutputStream like:

          void transferToFully(FileChannel ch, int pos, int len, MutableCounterLong transferTime, MutableCounterLong waitTime)

          (and have the old call delegate to that and pass null for the metrics)

          • The new metrics in DataNode need better names (eg "readDataPacketFromDiskMillis" and "sendDataPacketToNetworkMillis" or something?), and I think they should be MutableRates instead of counters, right? ie you need to count the number of ops in addition to the sum time, or else the sum time is uninterpretable.
          • I think the counter increments should be summed inside the loop locally, and then only added to the metric at the end of each packet. Otherwise it will skew the averages
          • It seems like we can add a simple unit test (or just a new assertion to an existing test like TestPRead) that these counters have non-zero values.
          • Maybe the unit for these should be microseconds instead of milliseconds? Given a lot of reads should hit buffer cache, having more precision seems useful.
          Show
          Todd Lipcon added a comment - Few quick comments: Now that I see again how complicated transferToFully is, I think I disagree with my earlier idea that we should copy-paste it. It seems like instead we should add an API to SocketOutputStream like: void transferToFully(FileChannel ch, int pos, int len, MutableCounterLong transferTime, MutableCounterLong waitTime) (and have the old call delegate to that and pass null for the metrics) The new metrics in DataNode need better names (eg "readDataPacketFromDiskMillis" and "sendDataPacketToNetworkMillis" or something?), and I think they should be MutableRates instead of counters, right? ie you need to count the number of ops in addition to the sum time, or else the sum time is uninterpretable. I think the counter increments should be summed inside the loop locally, and then only added to the metric at the end of each packet. Otherwise it will skew the averages It seems like we can add a simple unit test (or just a new assertion to an existing test like TestPRead) that these counters have non-zero values. Maybe the unit for these should be microseconds instead of milliseconds? Given a lot of reads should hit buffer cache, having more precision seems useful.
          Hide
          Andrew Wang added a comment -

          Added the two metrics. No test cases, but I verified by starting up a 1-node cluster and doing some reads to make sure the counters incremented.

          Show
          Andrew Wang added a comment - Added the two metrics. No test cases, but I verified by starting up a 1-node cluster and doing some reads to make sure the counters incremented.
          Hide
          Todd Lipcon added a comment -

          I was thinking the latter, most likely.

          Show
          Todd Lipcon added a comment - I was thinking the latter, most likely.
          Hide
          Andrew Wang added a comment -

          Hmm, on closer examination, I realize that SocketOutputStream is in common, so you have to pull timing info out of the call to transferToFully. Thoughts? I could wrap transferToFully and return the timing info, or expose transferTo and have the DN do the transferToFully logic itself.

          Show
          Andrew Wang added a comment - Hmm, on closer examination, I realize that SocketOutputStream is in common, so you have to pull timing info out of the call to transferToFully . Thoughts? I could wrap transferToFully and return the timing info, or expose transferTo and have the DN do the transferToFully logic itself.
          Hide
          Andrew Wang added a comment -

          I'd like to take a hack at this. To confirm, we're talking about SocketOutputStream, and timing the transferTo and waitForWritable calls in transferToFully.

          Show
          Andrew Wang added a comment - I'd like to take a hack at this. To confirm, we're talking about SocketOutputStream, and timing the transferTo and waitForWritable calls in transferToFully .

            People

            • Assignee:
              Andrew Wang
              Reporter:
              Todd Lipcon
            • Votes:
              0 Vote for this issue
              Watchers:
              13 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development