  1. Hadoop Common
  2. HADOOP-11328

ZKFailoverController does not log Exception when doRun raises errors

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.5.1
    • Fix Version/s: 2.8.0, 3.0.0-alpha1
    • Component/s: ha
    • Labels:
      None
    • Target Version/s:
    • Hadoop Flags:
      Reviewed

      Description

      In ZKFailoverController.java, the Exception caught by the run() method is never logged. As a result, the underlying problem stays latent and only manifests during failover.

      The problem we encountered

      An Exception is thrown from the doRun() method during initHM() (caused by a configuration error). To reproduce, set
      "ha.health-monitor.connect-retry-interval.ms" to any nonsensical value.

      ZKFailoverController.java
        private int doRun(String[] args) {
          ...
          initRPC();
          initHM();   // the Exception is thrown here when the configuration is invalid
          startRPC();
          ...
        }
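
For illustration only, here is a minimal sketch of how a non-numeric value for "ha.health-monitor.connect-retry-interval.ms" can surface as an exception in an initialization step. It assumes the value is parsed as a long, uses java.util.Properties instead of Hadoop's Configuration class, and the class name, method name, and default value are hypothetical rather than taken from ZKFailoverController.

```java
import java.util.Properties;

// Hypothetical stand-in for a configuration lookup done during initialization.
class RetryIntervalSketch {
    static final String KEY = "ha.health-monitor.connect-retry-interval.ms";

    // Returns the configured interval, or defaultMs when the key is absent.
    // A non-numeric value makes Long.parseLong throw NumberFormatException.
    static long readRetryIntervalMs(Properties conf, long defaultMs) {
        String raw = conf.getProperty(KEY);
        if (raw == null) {
            return defaultMs;
        }
        return Long.parseLong(raw.trim());
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty(KEY, "not-a-number"); // nonsensical value, as in the report
        try {
            readRetryIntervalMs(conf, 1000L);  // 1000L is an arbitrary default for the sketch
        } catch (NumberFormatException e) {
            System.out.println("init failed: " + e);
        }
    }
}
```

The point of the sketch is only that the failure happens before the controller is fully started, so whatever catches it is the last chance to record why startup failed.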
      

      The Exception is caught in the run() method, as follows:

      ZKFailoverController.java
        public int run(final String[] args) throws Exception {
          ...
          try {
            ...
              @Override
              public Integer run() {
                try {
                  return doRun(args);
                } catch (Exception t) {
                  throw new RuntimeException(t);
                } finally {
                  if (elector != null) {
                    elector.terminateConnection();
                  }
                }
              }
            });
          } catch (RuntimeException rte) {
            throw (Exception)rte.getCause();
          }
        }
      

      Unfortunately, the Exception (which causes the process to shut down) is not logged at all. The resulting error stays latent and only manifests during failover (because ZKFC is dead). The tricky part is that everything looks perfectly fine: the jps command shows a running DFSZKFailoverController process, and the two NameNodes (active and standby) work fine.
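
The wrap-and-unwrap pattern described above can be sketched in isolation. This is a simplified stand-in, not the actual ZKFailoverController code: it uses java.util.function.Supplier in place of the privileged action, and the class, field, and message names are hypothetical.

```java
import java.util.function.Supplier;

class WrapUnwrapSketch {
    // Records that the finally block ran (stand-in for elector.terminateConnection()).
    static boolean cleanedUp = false;

    // Stand-in for doRun(): fails with a checked exception during initialization.
    static int doRun() throws Exception {
        throw new Exception("bad configuration");
    }

    // Same shape as run(): wrap inside the action, unwrap outside, log nothing.
    static int run() throws Exception {
        try {
            Supplier<Integer> action = () -> {
                try {
                    return doRun();
                } catch (Exception e) {
                    // Supplier.get() cannot throw checked exceptions, so wrap.
                    throw new RuntimeException(e);
                } finally {
                    cleanedUp = true;
                }
            };
            return action.get();
        } catch (RuntimeException rte) {
            // No log statement here: the cause is rethrown silently.
            throw (Exception) rte.getCause();
        }
    }

    public static void main(String[] args) {
        try {
            run();
        } catch (Exception e) {
            System.out.println("caller received: " + e.getMessage());
        }
    }
}
```

In this sketch the original exception does reach the caller, but nothing along the way records it. Unless the caller logs it, the reason for the failure is lost.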

      Patch

      We strongly suggest adding an error log for the caught exception, for example:

      --- hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/ZKFailoverController.java (revision 1641307)
      +++ hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/ZKFailoverController.java (working copy)

      @@ -178,6 +178,7 @@
               }
             });
           } catch (RuntimeException rte) {
      +      LOG.fatal("The failover controller encounters runtime error: " + rte);
             throw (Exception)rte.getCause();
           }
         }
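
As a side note on the log call in the patch: concatenating the exception into the message records only its toString() text. The sketch below contrasts that with attaching the throwable itself, which lets a log handler print the stack trace and cause chain. It uses java.util.logging.LogRecord purely for illustration; that is not the logging API behind LOG in the patch, and the class and method names are hypothetical.

```java
import java.util.logging.Level;
import java.util.logging.LogRecord;

class LogThrowableSketch {
    // Message-only record: the exception is flattened to its toString() text.
    static LogRecord concatenated(RuntimeException rte) {
        return new LogRecord(Level.SEVERE,
            "The failover controller encounters runtime error: " + rte);
    }

    // Record that also carries the throwable, so the cause chain stays available.
    static LogRecord withThrowable(RuntimeException rte) {
        LogRecord record = new LogRecord(Level.SEVERE,
            "The failover controller encounters runtime error");
        record.setThrown(rte);
        return record;
    }

    public static void main(String[] args) {
        RuntimeException rte = new RuntimeException(new Exception("bad configuration"));
        System.out.println(concatenated(rte).getMessage());
        System.out.println(withThrowable(rte).getThrown().getCause().getMessage());
    }
}
```

Either form makes the failure visible in the log, which is the goal of this issue. Attaching the throwable is just one possible refinement.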
      

      Thanks!

        Activity

        tianyin Tianyin Xu added a comment -

        Patch that adds an error message for RuntimeException

        hadoopqa Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12683258/ZKFailoverController.log.exception.1.patch
        against trunk revision a4df9ee.

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. There were no new javadoc warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in hadoop-common-project/hadoop-common.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/5115//testReport/
        Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/5115//console

        This message is automatically generated.

        schu Stephen Chu added a comment -

        Thanks, Tianyin. I agree the log will be helpful. +1 (non-binding)

        hadoopqa Hadoop QA added a comment -



        -1 overall

        Vote  Subsystem        Runtime  Comment
        0     pre-patch        14m 37s  Pre-patch trunk compilation is healthy.
        +1    @author          0m 0s    The patch does not contain any @author tags.
        -1    tests included   0m 0s    The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
        +1    javac            7m 31s   There were no new javac warning messages.
        +1    javadoc          9m 32s   There were no new javadoc warning messages.
        +1    release audit    0m 22s   The applied patch does not increase the total number of release audit warnings.
        +1    checkstyle       1m 5s    There were no new checkstyle issues.
        +1    whitespace       0m 0s    The patch has no lines that end in whitespace.
        +1    install          1m 33s   mvn install still works.
        +1    eclipse:eclipse  0m 32s   The patch built with eclipse:eclipse.
        +1    findbugs         1m 40s   The patch does not introduce any new Findbugs (version 2.0.3) warnings.
        +1    common tests     23m 8s   Tests passed in hadoop-common.
                               60m 4s



        Subsystem               Report/Notes
        Patch URL               http://issues.apache.org/jira/secure/attachment/12683258/ZKFailoverController.log.exception.1.patch
        Optional Tests          javadoc javac unit findbugs checkstyle
        git revision            trunk / e8d0ee5
        hadoop-common test log  https://builds.apache.org/job/PreCommit-HADOOP-Build/6451/artifact/patchprocess/testrun_hadoop-common.txt
        Test Results            https://builds.apache.org/job/PreCommit-HADOOP-Build/6451/testReport/
        Java                    1.7.0_55
        uname                   Linux asf901.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
        Console output          https://builds.apache.org/job/PreCommit-HADOOP-Build/6451/console

        This message was automatically generated.

        ozawa Tsuyoshi Ozawa added a comment -

        +1

        hudson Hudson added a comment -

        FAILURE: Integrated in Hadoop-trunk-Commit #7722 (See https://builds.apache.org/job/Hadoop-trunk-Commit/7722/)
        HADOOP-11328. ZKFailoverController does not log Exception when doRun raises errors. Contributed by Tianyin Xu. (ozawa: rev bb9ddef0e7603b60d25250bb53a7ae9f147cd3cd)

        • hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/ZKFailoverController.java
        • hadoop-common-project/hadoop-common/CHANGES.txt
        hudson Hudson added a comment -

        FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #183 (See https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/183/)
        HADOOP-11328. ZKFailoverController does not log Exception when doRun raises errors. Contributed by Tianyin Xu. (ozawa: rev bb9ddef0e7603b60d25250bb53a7ae9f147cd3cd)

        • hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/ZKFailoverController.java
        • hadoop-common-project/hadoop-common/CHANGES.txt
        hudson Hudson added a comment -

        FAILURE: Integrated in Hadoop-Yarn-trunk #917 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/917/)
        HADOOP-11328. ZKFailoverController does not log Exception when doRun raises errors. Contributed by Tianyin Xu. (ozawa: rev bb9ddef0e7603b60d25250bb53a7ae9f147cd3cd)

        • hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/ZKFailoverController.java
        • hadoop-common-project/hadoop-common/CHANGES.txt
        hudson Hudson added a comment -

        FAILURE: Integrated in Hadoop-Hdfs-trunk #2115 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/2115/)
        HADOOP-11328. ZKFailoverController does not log Exception when doRun raises errors. Contributed by Tianyin Xu. (ozawa: rev bb9ddef0e7603b60d25250bb53a7ae9f147cd3cd)

        • hadoop-common-project/hadoop-common/CHANGES.txt
        • hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/ZKFailoverController.java
        hudson Hudson added a comment -

        SUCCESS: Integrated in Hadoop-Hdfs-trunk-Java8 #174 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/174/)
        HADOOP-11328. ZKFailoverController does not log Exception when doRun raises errors. Contributed by Tianyin Xu. (ozawa: rev bb9ddef0e7603b60d25250bb53a7ae9f147cd3cd)

        • hadoop-common-project/hadoop-common/CHANGES.txt
        • hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/ZKFailoverController.java
        hudson Hudson added a comment -

        FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #184 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/184/)
        HADOOP-11328. ZKFailoverController does not log Exception when doRun raises errors. Contributed by Tianyin Xu. (ozawa: rev bb9ddef0e7603b60d25250bb53a7ae9f147cd3cd)

        • hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/ZKFailoverController.java
        • hadoop-common-project/hadoop-common/CHANGES.txt
        hudson Hudson added a comment -

        FAILURE: Integrated in Hadoop-Mapreduce-trunk #2133 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2133/)
        HADOOP-11328. ZKFailoverController does not log Exception when doRun raises errors. Contributed by Tianyin Xu. (ozawa: rev bb9ddef0e7603b60d25250bb53a7ae9f147cd3cd)

        • hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/ZKFailoverController.java
        • hadoop-common-project/hadoop-common/CHANGES.txt
        ozawa Tsuyoshi Ozawa added a comment -

        Committed this to trunk and branch-2. Thanks Tianyin Xu for your contribution and thanks Stephen Chu for your comment.


          People

          • Assignee: tianyin Tianyin Xu
          • Reporter: tianyin Tianyin Xu
          • Votes: 0
          • Watchers: 6
