Uploaded image for project: 'Hadoop HDFS'
  1. Hadoop HDFS
  2. HDFS-7579

Improve log reporting during block report rpc failure

    Details

    • Target Version/s:
    • Hadoop Flags:
      Reviewed

      Description

      During block reporting, if the block report RPC fails, for example because it exceeded the max rpc len, we should still produce some sort of LOG.info output to help with debugging.

      1. HDFS-7579.001.patch
        5 kB
        Charles Lamb
      2. HDFS-7579.000.patch
        5 kB
        Charles Lamb

        Issue Links

          Activity

          Hide
          clamb Charles Lamb added a comment -

          The attached patch modifies BPServiceActor so that even if one or more of the block report rpcs fails, a LOG.info message will still be displayed. This will help diagnose cases where the RPC throws an exception. This patch also adds a toString() to ServerCommand so that the LOG.info message displays something reasonable for the commands that it received back rather than just Object.toString().

          Show
          clamb Charles Lamb added a comment - The attached patch modifies BPServiceActor so that even if one or more of the block report rpcs fails, a LOG.info message will still be displayed. This will help diagnose cases where the RPC throws an exception. This patch also adds a toString() to ServerCommand so that the LOG.info message displays something reasonable for the commands that it received back rather than just Object.toString().
          Hide
          clamb Charles Lamb added a comment -

          Also, since this is just a remodularization of the LOG.info call, I did not add a unit test.

          Show
          clamb Charles Lamb added a comment - Also, since this is just a remodularization of the LOG.info call, I did not add a unit test.
          Hide
          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12690360/HDFS-7579.000.patch
          against trunk revision 4cd66f7.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:

          org.apache.hadoop.hdfs.qjournal.TestNNWithQJM

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/9148//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9148//console

          This message is automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12690360/HDFS-7579.000.patch against trunk revision 4cd66f7. +1 @author . The patch does not contain any @author tags. -1 tests included . The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . There were no new javadoc warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 2.0.3) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. -1 core tests . The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.qjournal.TestNNWithQJM Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/9148//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9148//console This message is automatically generated.
          Hide
          clamb Charles Lamb added a comment -

          The test failure is a timeout and appears to be unrelated. I reran it myself on my local machine with the patch applied and it passed.

          Show
          clamb Charles Lamb added a comment - The test failure is a timeout and appears to be unrelated. I reran it myself on my local machine with the patch applied and it passed.
          Hide
          cnauroth Chris Nauroth added a comment -

          Hi Charles Lamb. This is a good idea. Thanks for the patch.

          One difference I see in this patch is that the logged value of numReportsSent is set to the length of the per-volume report list on both the "single message" and "split RPC per report" code paths. Previously, this would have been 1 for the "single message" case. I think the intent of this counter was to give an indication of the actual number of RPCs, so we'd want to keep 1 for the "single message" case. Cc'ing Arpit Agarwal for a second opinion.

          Show
          cnauroth Chris Nauroth added a comment - Hi Charles Lamb . This is a good idea. Thanks for the patch. One difference I see in this patch is that the logged value of numReportsSent is set to the length of the per-volume report list on both the "single message" and "split RPC per report" code paths. Previously, this would have been 1 for the "single message" case. I think the intent of this counter was to give an indication of the actual number of RPCs, so we'd want to keep 1 for the "single message" case. Cc'ing Arpit Agarwal for a second opinion.
          Hide
          arpitagarwal Arpit Agarwal added a comment -

          Thank you for the heads up Chris.

          I agree that numReportsSent should be the number of RPCs and not reports.length. We could rename it to be clearer.

          Couple of additional comments:

          1. final int nCmds = cmds.size(); should check for cmds being null.
          2. The logged message should give some indication whether or not all RPCs succeeded. Perhaps we can log something like successfully completed x of y RPC calls. For the usual case (no split), y would be 1. For the split case it would be reports.length.
          Show
          arpitagarwal Arpit Agarwal added a comment - Thank you for the heads up Chris. I agree that numReportsSent should be the number of RPCs and not reports.length. We could rename it to be clearer. Couple of additional comments: final int nCmds = cmds.size(); should check for cmds being null. The logged message should give some indication whether or not all RPCs succeeded. Perhaps we can log something like successfully completed x of y RPC calls . For the usual case (no split), y would be 1. For the split case it would be reports.length .
          Hide
          clamb Charles Lamb added a comment -

          Chris Nauroth, Arpit Agarwal,

          Thanks for the review. I've uploaded a new patch which addresses your comments.

          An example message looks like this:

          2015-01-08 11:41:05,277 INFO datanode.DataNode (BPServiceActor.java:blockReport(525)) - Successfully sent 2 of 2 blockreports for 5 total blocks using 2 RPCs. This took 1 msec to generate and 7 msecs for RPC and NN processing. Got back 2 commands: FinalizeCommand/5; FinalizeCommand/5.

          Rather than just display the number of reports I now display the number of RPCs and the number of reports (as well the number of actual reports sent). Regarding success/non-success, it's funny you should mention that. My initial version of the patch did exactly that, but then I took it out before posting it since I figured it could be derived from the fact that not all the reports would have been sent. But being explicit is probably better so I have added it back in for the .001 version. cmds can not be null, but I added a final to its declaration & initial assignment anyway just to make sure. I think FB might have warned about an unnecessary null check if I were to add a check for nullness when assigning nCmds.

          Show
          clamb Charles Lamb added a comment - Chris Nauroth , Arpit Agarwal , Thanks for the review. I've uploaded a new patch which addresses your comments. An example message looks like this: 2015-01-08 11:41:05,277 INFO datanode.DataNode (BPServiceActor.java:blockReport(525)) - Successfully sent 2 of 2 blockreports for 5 total blocks using 2 RPCs. This took 1 msec to generate and 7 msecs for RPC and NN processing. Got back 2 commands: FinalizeCommand/5; FinalizeCommand/5. Rather than just display the number of reports I now display the number of RPCs and the number of reports (as well the number of actual reports sent). Regarding success/non-success, it's funny you should mention that. My initial version of the patch did exactly that, but then I took it out before posting it since I figured it could be derived from the fact that not all the reports would have been sent. But being explicit is probably better so I have added it back in for the .001 version. cmds can not be null, but I added a final to its declaration & initial assignment anyway just to make sure. I think FB might have warned about an unnecessary null check if I were to add a check for nullness when assigning nCmds.
          Hide
          arpitagarwal Arpit Agarwal added a comment -

          You are right - it cannot be null. I confused it with cmd.

          +1 for the .001 patch. I will hold off committing until the evening just in case Chris has comments.

          Thanks!

          Show
          arpitagarwal Arpit Agarwal added a comment - You are right - it cannot be null. I confused it with cmd. +1 for the .001 patch. I will hold off committing until the evening just in case Chris has comments. Thanks!
          Hide
          cnauroth Chris Nauroth added a comment -

          +1 from me too, pending Jenkins run on the .001 patch. Thanks for responding to the code review feedback, Charles.

          Show
          cnauroth Chris Nauroth added a comment - +1 from me too, pending Jenkins run on the .001 patch. Thanks for responding to the code review feedback, Charles.
          Hide
          clamb Charles Lamb added a comment -

          Thanks for the review Arpit and Chris. We'll see what Mr. Jenkins has to say.

          Show
          clamb Charles Lamb added a comment - Thanks for the review Arpit and Chris. We'll see what Mr. Jenkins has to say.
          Hide
          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12690884/HDFS-7579.001.patch
          against trunk revision a6ed489.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. There were no new javadoc warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:

          org.apache.hadoop.hdfs.TestDecommission

          Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/9155//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9155//console

          This message is automatically generated.

          Show
          hadoopqa Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12690884/HDFS-7579.001.patch against trunk revision a6ed489. +1 @author . The patch does not contain any @author tags. -1 tests included . The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . There were no new javadoc warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 2.0.3) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. -1 core tests . The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.TestDecommission Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/9155//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9155//console This message is automatically generated.
          Hide
          cnauroth Chris Nauroth added a comment -

          The test failure is unrelated. I have committed this to trunk and branch-2. Charles, thank you for contributing the patch. Arpit, thank you for help with the code review.

          Show
          cnauroth Chris Nauroth added a comment - The test failure is unrelated. I have committed this to trunk and branch-2. Charles, thank you for contributing the patch. Arpit, thank you for help with the code review.
          Hide
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-trunk-Commit #6834 (See https://builds.apache.org/job/Hadoop-trunk-Commit/6834/)
          HDFS-7579. Improve log reporting during block report rpc failure. Contributed by Charles Lamb. (cnauroth: rev 7e2d9a32426d04b5f08c2835f61882b053612a20)

          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/ServerCommand.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPServiceActor.java
          • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          Show
          hudson Hudson added a comment - FAILURE: Integrated in Hadoop-trunk-Commit #6834 (See https://builds.apache.org/job/Hadoop-trunk-Commit/6834/ ) HDFS-7579 . Improve log reporting during block report rpc failure. Contributed by Charles Lamb. (cnauroth: rev 7e2d9a32426d04b5f08c2835f61882b053612a20) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/ServerCommand.java hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPServiceActor.java hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          Hide
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #68 (See https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/68/)
          HDFS-7579. Improve log reporting during block report rpc failure. Contributed by Charles Lamb. (cnauroth: rev 7e2d9a32426d04b5f08c2835f61882b053612a20)

          • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/ServerCommand.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPServiceActor.java
          Show
          hudson Hudson added a comment - FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #68 (See https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/68/ ) HDFS-7579 . Improve log reporting during block report rpc failure. Contributed by Charles Lamb. (cnauroth: rev 7e2d9a32426d04b5f08c2835f61882b053612a20) hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/ServerCommand.java hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPServiceActor.java
          Hide
          hudson Hudson added a comment -

          SUCCESS: Integrated in Hadoop-Yarn-trunk #802 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/802/)
          HDFS-7579. Improve log reporting during block report rpc failure. Contributed by Charles Lamb. (cnauroth: rev 7e2d9a32426d04b5f08c2835f61882b053612a20)

          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPServiceActor.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/ServerCommand.java
          • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          Show
          hudson Hudson added a comment - SUCCESS: Integrated in Hadoop-Yarn-trunk #802 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/802/ ) HDFS-7579 . Improve log reporting during block report rpc failure. Contributed by Charles Lamb. (cnauroth: rev 7e2d9a32426d04b5f08c2835f61882b053612a20) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPServiceActor.java hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/ServerCommand.java hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          Hide
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #65 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/65/)
          HDFS-7579. Improve log reporting during block report rpc failure. Contributed by Charles Lamb. (cnauroth: rev 7e2d9a32426d04b5f08c2835f61882b053612a20)

          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/ServerCommand.java
          • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPServiceActor.java
          Show
          hudson Hudson added a comment - FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #65 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/65/ ) HDFS-7579 . Improve log reporting during block report rpc failure. Contributed by Charles Lamb. (cnauroth: rev 7e2d9a32426d04b5f08c2835f61882b053612a20) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/ServerCommand.java hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPServiceActor.java
          Hide
          hudson Hudson added a comment -

          SUCCESS: Integrated in Hadoop-Hdfs-trunk #2000 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/2000/)
          HDFS-7579. Improve log reporting during block report rpc failure. Contributed by Charles Lamb. (cnauroth: rev 7e2d9a32426d04b5f08c2835f61882b053612a20)

          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPServiceActor.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/ServerCommand.java
          • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          Show
          hudson Hudson added a comment - SUCCESS: Integrated in Hadoop-Hdfs-trunk #2000 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/2000/ ) HDFS-7579 . Improve log reporting during block report rpc failure. Contributed by Charles Lamb. (cnauroth: rev 7e2d9a32426d04b5f08c2835f61882b053612a20) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPServiceActor.java hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/ServerCommand.java hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          Hide
          hudson Hudson added a comment -

          FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #69 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/69/)
          HDFS-7579. Improve log reporting during block report rpc failure. Contributed by Charles Lamb. (cnauroth: rev 7e2d9a32426d04b5f08c2835f61882b053612a20)

          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPServiceActor.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/ServerCommand.java
          • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          Show
          hudson Hudson added a comment - FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #69 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/69/ ) HDFS-7579 . Improve log reporting during block report rpc failure. Contributed by Charles Lamb. (cnauroth: rev 7e2d9a32426d04b5f08c2835f61882b053612a20) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPServiceActor.java hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/ServerCommand.java hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          Hide
          hudson Hudson added a comment -

          SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2019 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2019/)
          HDFS-7579. Improve log reporting during block report rpc failure. Contributed by Charles Lamb. (cnauroth: rev 7e2d9a32426d04b5f08c2835f61882b053612a20)

          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/ServerCommand.java
          • hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPServiceActor.java
          • hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          Show
          hudson Hudson added a comment - SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2019 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2019/ ) HDFS-7579 . Improve log reporting during block report rpc failure. Contributed by Charles Lamb. (cnauroth: rev 7e2d9a32426d04b5f08c2835f61882b053612a20) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/ServerCommand.java hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPServiceActor.java hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
          Hide
          vinodkv Vinod Kumar Vavilapalli added a comment -

          Sangjin Lee backported this to 2.6.1. I just pushed the commit after running compilation.

          Show
          vinodkv Vinod Kumar Vavilapalli added a comment - Sangjin Lee backported this to 2.6.1. I just pushed the commit after running compilation.

            People

            • Assignee:
              clamb Charles Lamb
              Reporter:
              clamb Charles Lamb
            • Votes:
              0 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development