Uploaded image for project: 'Tajo'
  1. Tajo
  2. TAJO-1586

TajoMaster HA startup failure on Yarn.

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.10.0
    • Fix Version/s: 0.11.0, 0.10.1
    • Component/s: TajoMaster
    • Labels:
      None

      Description

      I tried to deploy Tajo on YARN with Slider. But I couldn't deploy Tajo because of TajoMaster HA failure. TajoWorker failed to load TajoMaster address as follows.

      2015-04-28 04:52:22,266 INFO org.apache.hadoop.service.AbstractService: Service org.apache.tajo.worker.TajoWorker failed in state STARTED; cause: org.apache.tajo.service.ServiceTrackerException: org.apache.tajo.service.ServiceTrackerException: No active master entry
      org.apache.tajo.service.ServiceTrackerException: org.apache.tajo.service.ServiceTrackerException: No active master entry
      	at org.apache.tajo.ha.HdfsServiceTracker.getAddressElements(HdfsServiceTracker.java:441)
      	at org.apache.tajo.ha.HdfsServiceTracker.getUmbilicalAddress(HdfsServiceTracker.java:348)
      	at org.apache.tajo.worker.TajoWorker.serviceStart(TajoWorker.java:318)
      	at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
      	at org.apache.tajo.worker.TajoWorker.startWorker(TajoWorker.java:141)
      	at org.apache.tajo.worker.TajoWorker.main(TajoWorker.java:627)
      Caused by: org.apache.tajo.service.ServiceTrackerException: No active master entry
      	at org.apache.tajo.ha.HdfsServiceTracker.getAddressElements(HdfsServiceTracker.java:413)
      	... 5 more
      2015-04-28 04:52:22,307 INFO org.apache.hadoop.service.AbstractService: Service WorkerHeartbeatService failed in state STOPPED; cause: java.lang.NullPointerException
      java.lang.NullPointerException
      	at org.apache.tajo.worker.WorkerHeartbeatService$WorkerHeartbeatThread.access$000(WorkerHeartbeatService.java:101)
      	at org.apache.tajo.worker.WorkerHeartbeatService.serviceStop(WorkerHeartbeatService.java:90)
      	at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
      	at org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
      	at org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
      	at org.apache.hadoop.service.CompositeService.stop(CompositeService.java:157)
      	at org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:131)
      	at org.apache.tajo.worker.TajoWorker.serviceStop(TajoWorker.java:375)
      	at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
      	at org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
      	at org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
      	at org.apache.hadoop.service.AbstractService.start(AbstractService.java:203)
      	at org.apache.tajo.worker.TajoWorker.startWorker(TajoWorker.java:141)
      	at org.apache.tajo.worker.TajoWorker.main(TajoWorker.java:627)

      I think that the cause of this failure is time difference between TajoMaster and TajoWorker.

      1. TAJO-1586_3.patch
        58 kB
        Jaehwa Jung
      2. TAJO-1586_2.patch
        58 kB
        Jaehwa Jung
      3. TAJO-1586.patch
        15 kB
        Jaehwa Jung

        Activity

        Hide
        blrunner Jaehwa Jung added a comment -

        Jinho Kim

        Thanks for your review.
        I've committed this to the master branch and the 0.10.1 branch.

        Show
        blrunner Jaehwa Jung added a comment - Jinho Kim Thanks for your review. I've committed this to the master branch and the 0.10.1 branch.
        Hide
        hudson Hudson added a comment -

        FAILURE: Integrated in Tajo-master-CODEGEN-build #344 (See https://builds.apache.org/job/Tajo-master-CODEGEN-build/344/)
        TAJO-1586: TajoMaster HA startup failure on Yarn. (jaehwa) (blrunner: rev 31c4630d5d3ce0dee1df35491c557eefad15deeb)

        • tajo-common/src/main/java/org/apache/tajo/ha/HAServiceUtil.java
        • tajo-core/src/main/java/org/apache/tajo/ha/HdfsServiceTracker.java
        • tajo-core/src/main/resources/webapps/admin/catalogview.jsp
        • CHANGES
        • tajo-common/src/main/java/org/apache/tajo/service/ServiceTracker.java
        • tajo-core/src/main/resources/webapps/admin/query.jsp
        • tajo-core/src/main/java/org/apache/tajo/worker/TajoWorker.java
        • tajo-core/src/main/resources/webapps/admin/index.jsp
        • tajo-cli/src/main/java/org/apache/tajo/cli/tools/TajoAdmin.java
        • tajo-common/src/main/java/org/apache/tajo/service/BaseServiceTracker.java
        • tajo-cli/src/main/java/org/apache/tajo/cli/tools/TajoHAAdmin.java
        • tajo-client/src/main/java/org/apache/tajo/client/DummyServiceTracker.java
        • tajo-common/src/main/java/org/apache/tajo/ha/HAConstants.java
        • tajo-core/src/main/resources/webapps/admin/query_executor.jsp
        • tajo-common/src/main/java/org/apache/tajo/service/HAServiceTracker.java
        • tajo-core/src/main/java/org/apache/tajo/util/JSPUtil.java
        • tajo-core/src/main/java/org/apache/tajo/master/TajoMaster.java
        • tajo-core/src/test/java/org/apache/tajo/ha/TestHAServiceHDFSImpl.java
        • tajo-core/src/main/resources/webapps/admin/cluster.jsp
        • tajo-common/src/main/java/org/apache/tajo/conf/TajoConf.java
        Show
        hudson Hudson added a comment - FAILURE: Integrated in Tajo-master-CODEGEN-build #344 (See https://builds.apache.org/job/Tajo-master-CODEGEN-build/344/ ) TAJO-1586 : TajoMaster HA startup failure on Yarn. (jaehwa) (blrunner: rev 31c4630d5d3ce0dee1df35491c557eefad15deeb) tajo-common/src/main/java/org/apache/tajo/ha/HAServiceUtil.java tajo-core/src/main/java/org/apache/tajo/ha/HdfsServiceTracker.java tajo-core/src/main/resources/webapps/admin/catalogview.jsp CHANGES tajo-common/src/main/java/org/apache/tajo/service/ServiceTracker.java tajo-core/src/main/resources/webapps/admin/query.jsp tajo-core/src/main/java/org/apache/tajo/worker/TajoWorker.java tajo-core/src/main/resources/webapps/admin/index.jsp tajo-cli/src/main/java/org/apache/tajo/cli/tools/TajoAdmin.java tajo-common/src/main/java/org/apache/tajo/service/BaseServiceTracker.java tajo-cli/src/main/java/org/apache/tajo/cli/tools/TajoHAAdmin.java tajo-client/src/main/java/org/apache/tajo/client/DummyServiceTracker.java tajo-common/src/main/java/org/apache/tajo/ha/HAConstants.java tajo-core/src/main/resources/webapps/admin/query_executor.jsp tajo-common/src/main/java/org/apache/tajo/service/HAServiceTracker.java tajo-core/src/main/java/org/apache/tajo/util/JSPUtil.java tajo-core/src/main/java/org/apache/tajo/master/TajoMaster.java tajo-core/src/test/java/org/apache/tajo/ha/TestHAServiceHDFSImpl.java tajo-core/src/main/resources/webapps/admin/cluster.jsp tajo-common/src/main/java/org/apache/tajo/conf/TajoConf.java
        Hide
        hudson Hudson added a comment -

        FAILURE: Integrated in Tajo-master-build #704 (See https://builds.apache.org/job/Tajo-master-build/704/)
        TAJO-1586: TajoMaster HA startup failure on Yarn. (jaehwa) (blrunner: rev 31c4630d5d3ce0dee1df35491c557eefad15deeb)

        • tajo-core/src/main/resources/webapps/admin/query.jsp
        • tajo-common/src/main/java/org/apache/tajo/conf/TajoConf.java
        • tajo-common/src/main/java/org/apache/tajo/service/ServiceTracker.java
        • tajo-core/src/main/resources/webapps/admin/cluster.jsp
        • tajo-common/src/main/java/org/apache/tajo/ha/HAServiceUtil.java
        • tajo-core/src/main/java/org/apache/tajo/util/JSPUtil.java
        • tajo-common/src/main/java/org/apache/tajo/service/BaseServiceTracker.java
        • tajo-core/src/main/java/org/apache/tajo/ha/HdfsServiceTracker.java
        • tajo-core/src/main/java/org/apache/tajo/master/TajoMaster.java
        • CHANGES
        • tajo-core/src/main/resources/webapps/admin/query_executor.jsp
        • tajo-core/src/main/java/org/apache/tajo/worker/TajoWorker.java
        • tajo-core/src/main/resources/webapps/admin/catalogview.jsp
        • tajo-core/src/test/java/org/apache/tajo/ha/TestHAServiceHDFSImpl.java
        • tajo-common/src/main/java/org/apache/tajo/service/HAServiceTracker.java
        • tajo-common/src/main/java/org/apache/tajo/ha/HAConstants.java
        • tajo-core/src/main/resources/webapps/admin/index.jsp
        • tajo-cli/src/main/java/org/apache/tajo/cli/tools/TajoAdmin.java
        • tajo-cli/src/main/java/org/apache/tajo/cli/tools/TajoHAAdmin.java
        • tajo-client/src/main/java/org/apache/tajo/client/DummyServiceTracker.java
        Show
        hudson Hudson added a comment - FAILURE: Integrated in Tajo-master-build #704 (See https://builds.apache.org/job/Tajo-master-build/704/ ) TAJO-1586 : TajoMaster HA startup failure on Yarn. (jaehwa) (blrunner: rev 31c4630d5d3ce0dee1df35491c557eefad15deeb) tajo-core/src/main/resources/webapps/admin/query.jsp tajo-common/src/main/java/org/apache/tajo/conf/TajoConf.java tajo-common/src/main/java/org/apache/tajo/service/ServiceTracker.java tajo-core/src/main/resources/webapps/admin/cluster.jsp tajo-common/src/main/java/org/apache/tajo/ha/HAServiceUtil.java tajo-core/src/main/java/org/apache/tajo/util/JSPUtil.java tajo-common/src/main/java/org/apache/tajo/service/BaseServiceTracker.java tajo-core/src/main/java/org/apache/tajo/ha/HdfsServiceTracker.java tajo-core/src/main/java/org/apache/tajo/master/TajoMaster.java CHANGES tajo-core/src/main/resources/webapps/admin/query_executor.jsp tajo-core/src/main/java/org/apache/tajo/worker/TajoWorker.java tajo-core/src/main/resources/webapps/admin/catalogview.jsp tajo-core/src/test/java/org/apache/tajo/ha/TestHAServiceHDFSImpl.java tajo-common/src/main/java/org/apache/tajo/service/HAServiceTracker.java tajo-common/src/main/java/org/apache/tajo/ha/HAConstants.java tajo-core/src/main/resources/webapps/admin/index.jsp tajo-cli/src/main/java/org/apache/tajo/cli/tools/TajoAdmin.java tajo-cli/src/main/java/org/apache/tajo/cli/tools/TajoHAAdmin.java tajo-client/src/main/java/org/apache/tajo/client/DummyServiceTracker.java
        Hide
        githubbot ASF GitHub Bot added a comment -

        Github user asfgit closed the pull request at:

        https://github.com/apache/tajo/pull/566

        Show
        githubbot ASF GitHub Bot added a comment - Github user asfgit closed the pull request at: https://github.com/apache/tajo/pull/566
        Hide
        githubbot ASF GitHub Bot added a comment -

        Github user jinossy commented on the pull request:

        https://github.com/apache/tajo/pull/566#issuecomment-101911095

        +1 LGTM!!

        Show
        githubbot ASF GitHub Bot added a comment - Github user jinossy commented on the pull request: https://github.com/apache/tajo/pull/566#issuecomment-101911095 +1 LGTM!!
        Hide
        blrunner Jaehwa Jung added a comment -

        Removed unnecessary codes.

        Show
        blrunner Jaehwa Jung added a comment - Removed unnecessary codes.
        Hide
        githubbot ASF GitHub Bot added a comment -

        Github user blrunner commented on the pull request:

        https://github.com/apache/tajo/pull/566#issuecomment-101888501

        @jinossy

        I fixed other bugs. Could you check this again?

        Show
        githubbot ASF GitHub Bot added a comment - Github user blrunner commented on the pull request: https://github.com/apache/tajo/pull/566#issuecomment-101888501 @jinossy I fixed other bugs. Could you check this again?
        Hide
        githubbot ASF GitHub Bot added a comment -

        Github user blrunner commented on the pull request:

        https://github.com/apache/tajo/pull/566#issuecomment-101847146

        @jinossy

        Thanks for your detailed review.
        I've just updated this patch just a while ago.

        Show
        githubbot ASF GitHub Bot added a comment - Github user blrunner commented on the pull request: https://github.com/apache/tajo/pull/566#issuecomment-101847146 @jinossy Thanks for your detailed review. I've just updated this patch just a while ago.
        Hide
        githubbot ASF GitHub Bot added a comment -

        Github user jinossy commented on a diff in the pull request:

        https://github.com/apache/tajo/pull/566#discussion_r30236952

        — Diff: tajo-core/src/main/java/org/apache/tajo/worker/TajoWorker.java —
        @@ -322,9 +323,33 @@ public void serviceStart() throws Exception {
        startJvmPauseMonitor();

        tajoMasterInfo = new TajoMasterInfo();
        +
        + // Most of the time, TajoWorker will start before TajoMaster. After TajoMaster non-HA, it doesn't matter.
        + // But in HA, TajoWorker will fail to start because TajoWorker couldn't find TajoMaster address on shared storage.
        + // Thus, TajoWorker need to try to find the address for a certain period of time.
        if (systemConf.getBoolVar(TajoConf.ConfVars.TAJO_MASTER_HA_ENABLE)) {

        • tajoMasterInfo.setTajoMasterAddress(serviceTracker.getUmbilicalAddress());
        • tajoMasterInfo.setWorkerResourceTrackerAddr(serviceTracker.getResourceTrackerAddress());
          + long retryWaitTime = systemConf.getLongVar(TajoConf.ConfVars.TAJO_MASTER_HA_CLIENT_RETRY_WAIT_TIME);
          + int retryMaxNum = systemConf.getIntVar(ConfVars.TAJO_MASTER_HA_CLIENT_RETRY_MAX_NUM);
          + int retryNum = 1;
          +
          + boolean done = false;
          +
          + while (!done && retryNum < retryMaxNum) {
          + try { + tajoMasterInfo.setTajoMasterAddress(serviceTracker.getUmbilicalAddress()); + tajoMasterInfo.setWorkerResourceTrackerAddr(serviceTracker.getResourceTrackerAddress()); + done = true; + LOG.info("Find a new TajoMaster (" + tajoMasterInfo.getTajoMasterAddress() + ")"); + }

          catch (ServiceTrackerException e)

          { + LOG.warn("Retry TajoMaster address (" + retryNum + ")"); + Thread.sleep(retryWaitTime); + }

          + retryNum++;
          + if (retryNum == retryMaxNum)

          { + LOG.error("ERROR: the maximum retry (" + retryNum + ") to read TajoMaster address"); + break; + }
            • End diff –

        Can you move the retry codes to HdfsServiceTracker ? because backup master is registered in PingChecker

        Show
        githubbot ASF GitHub Bot added a comment - Github user jinossy commented on a diff in the pull request: https://github.com/apache/tajo/pull/566#discussion_r30236952 — Diff: tajo-core/src/main/java/org/apache/tajo/worker/TajoWorker.java — @@ -322,9 +323,33 @@ public void serviceStart() throws Exception { startJvmPauseMonitor(); tajoMasterInfo = new TajoMasterInfo(); + + // Most of the time, TajoWorker will start before TajoMaster. After TajoMaster non-HA, it doesn't matter. + // But in HA, TajoWorker will fail to start because TajoWorker couldn't find TajoMaster address on shared storage. + // Thus, TajoWorker need to try to find the address for a certain period of time. if (systemConf.getBoolVar(TajoConf.ConfVars.TAJO_MASTER_HA_ENABLE)) { tajoMasterInfo.setTajoMasterAddress(serviceTracker.getUmbilicalAddress()); tajoMasterInfo.setWorkerResourceTrackerAddr(serviceTracker.getResourceTrackerAddress()); + long retryWaitTime = systemConf.getLongVar(TajoConf.ConfVars.TAJO_MASTER_HA_CLIENT_RETRY_WAIT_TIME); + int retryMaxNum = systemConf.getIntVar(ConfVars.TAJO_MASTER_HA_CLIENT_RETRY_MAX_NUM); + int retryNum = 1; + + boolean done = false; + + while (!done && retryNum < retryMaxNum) { + try { + tajoMasterInfo.setTajoMasterAddress(serviceTracker.getUmbilicalAddress()); + tajoMasterInfo.setWorkerResourceTrackerAddr(serviceTracker.getResourceTrackerAddress()); + done = true; + LOG.info("Find a new TajoMaster (" + tajoMasterInfo.getTajoMasterAddress() + ")"); + } catch (ServiceTrackerException e) { + LOG.warn("Retry TajoMaster address (" + retryNum + ")"); + Thread.sleep(retryWaitTime); + } + retryNum++; + if (retryNum == retryMaxNum) { + LOG.error("ERROR: the maximum retry (" + retryNum + ") to read TajoMaster address"); + break; + } End diff – Can you move the retry codes to HdfsServiceTracker ? because backup master is registered in PingChecker
        Hide
        githubbot ASF GitHub Bot added a comment -

        Github user jinossy commented on the pull request:

        https://github.com/apache/tajo/pull/566#issuecomment-101636514

        Looks good to me. I left trivial comments.
        Thank you for your contribution

        Show
        githubbot ASF GitHub Bot added a comment - Github user jinossy commented on the pull request: https://github.com/apache/tajo/pull/566#issuecomment-101636514 Looks good to me. I left trivial comments. Thank you for your contribution
        Hide
        githubbot ASF GitHub Bot added a comment -

        Github user jinossy commented on a diff in the pull request:

        https://github.com/apache/tajo/pull/566#discussion_r30222455

        — Diff: tajo-client/src/main/java/org/apache/tajo/client/DummyServiceTracker.java —
        @@ -65,6 +66,21 @@ public InetSocketAddress getMasterHttpInfo() throws ServiceTrackerException {
        }

        @Override
        + public int getState(String masterName, TajoConf conf) throws ServiceTrackerException

        { + return 0; + }
        +
        + @Override
        + public int formatHA(TajoConf conf) throws ServiceTrackerException { + return 0; + }

        +
        + @Override
        + public List<String> getMasters(TajoConf conf) throws ServiceTrackerException {
        + return null;
        — End diff –

        Can you change to empty list ?

        Show
        githubbot ASF GitHub Bot added a comment - Github user jinossy commented on a diff in the pull request: https://github.com/apache/tajo/pull/566#discussion_r30222455 — Diff: tajo-client/src/main/java/org/apache/tajo/client/DummyServiceTracker.java — @@ -65,6 +66,21 @@ public InetSocketAddress getMasterHttpInfo() throws ServiceTrackerException { } @Override + public int getState(String masterName, TajoConf conf) throws ServiceTrackerException { + return 0; + } + + @Override + public int formatHA(TajoConf conf) throws ServiceTrackerException { + return 0; + } + + @Override + public List<String> getMasters(TajoConf conf) throws ServiceTrackerException { + return null; — End diff – Can you change to empty list ?
        Hide
        githubbot ASF GitHub Bot added a comment -

        Github user jinossy commented on a diff in the pull request:

        https://github.com/apache/tajo/pull/566#discussion_r30221996

        — Diff: tajo-common/src/main/java/org/apache/tajo/service/ServiceTracker.java —
        @@ -36,9 +38,15 @@

        public abstract InetSocketAddress getMasterHttpInfo() throws ServiceTrackerException;

        + public abstract int getState(String masterName, TajoConf conf) throws ServiceTrackerException;
        +
        + public abstract int formatHA(TajoConf conf) throws ServiceTrackerException;
        +
        + public abstract List<String> getMasters(TajoConf conf) throws ServiceTrackerException;
        +
        — End diff –

        ServiceTracker is an interface. Can you remove ‘public”, ”abstract’ keywords ?

        Show
        githubbot ASF GitHub Bot added a comment - Github user jinossy commented on a diff in the pull request: https://github.com/apache/tajo/pull/566#discussion_r30221996 — Diff: tajo-common/src/main/java/org/apache/tajo/service/ServiceTracker.java — @@ -36,9 +38,15 @@ public abstract InetSocketAddress getMasterHttpInfo() throws ServiceTrackerException; + public abstract int getState(String masterName, TajoConf conf) throws ServiceTrackerException; + + public abstract int formatHA(TajoConf conf) throws ServiceTrackerException; + + public abstract List<String> getMasters(TajoConf conf) throws ServiceTrackerException; + — End diff – ServiceTracker is an interface. Can you remove ‘public”, ”abstract’ keywords ?
        Hide
        githubbot ASF GitHub Bot added a comment -

        GitHub user blrunner opened a pull request:

        https://github.com/apache/tajo/pull/566

        TAJO-1586: TajoMaster HA startup failure on Yarn.

        You can merge this pull request into a Git repository by running:

        $ git pull https://github.com/blrunner/tajo TAJO-1586

        Alternatively you can review and apply these changes as the patch at:

        https://github.com/apache/tajo/pull/566.patch

        To close this pull request, make a commit to your master/trunk branch
        with (at least) the following in the commit message:

        This closes #566


        commit 40fdee40d812369a59b9c6f48e9ebd4c2de2eccd
        Author: JaeHwa Jung <blrunner@apache.org>
        Date: 2015-05-11T08:14:15Z

        Rename active master file name

        commit 34ece4ce7f560257ffcb53b36d1273fbb9b88342
        Author: JaeHwa Jung <blrunner@apache.org>
        Date: 2015-05-12T14:18:50Z

        Add active lock file.

        commit 8a6ca1b7fb51caa13e418618a3896cc83318ca02
        Author: JaeHwa Jung <blrunner@apache.org>
        Date: 2015-05-12T14:19:10Z

        Merge branch 'master' of https://git-wip-us.apache.org/repos/asf/tajo

        commit 2c498a1dae4a53e365af56110bd9c43c94e37225
        Author: JaeHwa Jung <blrunner@apache.org>
        Date: 2015-05-13T06:22:59Z

        Refacor codes for TajoMaster HA

        commit 0c47c86117730966c6e3c3d479b715840c039cc7
        Author: JaeHwa Jung <blrunner@apache.org>
        Date: 2015-05-13T06:23:22Z

        Merge branch 'master' of https://git-wip-us.apache.org/repos/asf/tajo


        Show
        githubbot ASF GitHub Bot added a comment - GitHub user blrunner opened a pull request: https://github.com/apache/tajo/pull/566 TAJO-1586 : TajoMaster HA startup failure on Yarn. You can merge this pull request into a Git repository by running: $ git pull https://github.com/blrunner/tajo TAJO-1586 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/tajo/pull/566.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #566 commit 40fdee40d812369a59b9c6f48e9ebd4c2de2eccd Author: JaeHwa Jung <blrunner@apache.org> Date: 2015-05-11T08:14:15Z Rename active master file name commit 34ece4ce7f560257ffcb53b36d1273fbb9b88342 Author: JaeHwa Jung <blrunner@apache.org> Date: 2015-05-12T14:18:50Z Add active lock file. commit 8a6ca1b7fb51caa13e418618a3896cc83318ca02 Author: JaeHwa Jung <blrunner@apache.org> Date: 2015-05-12T14:19:10Z Merge branch 'master' of https://git-wip-us.apache.org/repos/asf/tajo commit 2c498a1dae4a53e365af56110bd9c43c94e37225 Author: JaeHwa Jung <blrunner@apache.org> Date: 2015-05-13T06:22:59Z Refacor codes for TajoMaster HA commit 0c47c86117730966c6e3c3d479b715840c039cc7 Author: JaeHwa Jung <blrunner@apache.org> Date: 2015-05-13T06:23:22Z Merge branch 'master' of https://git-wip-us.apache.org/repos/asf/tajo
        Hide
        githubbot ASF GitHub Bot added a comment -

        Github user blrunner closed the pull request at:

        https://github.com/apache/tajo/pull/560

        Show
        githubbot ASF GitHub Bot added a comment - Github user blrunner closed the pull request at: https://github.com/apache/tajo/pull/560
        Hide
        githubbot ASF GitHub Bot added a comment -

        Github user jinossy commented on a diff in the pull request:

        https://github.com/apache/tajo/pull/560#discussion_r29731769

        — Diff: tajo-core/src/main/java/org/apache/tajo/ha/HdfsServiceTracker.java —
        @@ -184,15 +181,31 @@ private void createMasterFile(boolean isActive) throws IOException

        { createMasterFile(false); }

        + // Set active status and delete existing file.
        if (isActive) {
        isActiveStatus = true;
        + Path backupFile = new Path(backupPath, fileName);
        + if (fs.exists(backupFile))

        { + fs.delete(backupFile, true); + }

        } else {
        isActiveStatus = false;
        + Path activeFile = new Path(activePath, fileName);
        — End diff –

        Can you refactor the duplicate codes?

        Show
        githubbot ASF GitHub Bot added a comment - Github user jinossy commented on a diff in the pull request: https://github.com/apache/tajo/pull/560#discussion_r29731769 — Diff: tajo-core/src/main/java/org/apache/tajo/ha/HdfsServiceTracker.java — @@ -184,15 +181,31 @@ private void createMasterFile(boolean isActive) throws IOException { createMasterFile(false); } + // Set active status and delete existing file. if (isActive) { isActiveStatus = true; + Path backupFile = new Path(backupPath, fileName); + if (fs.exists(backupFile)) { + fs.delete(backupFile, true); + } } else { isActiveStatus = false; + Path activeFile = new Path(activePath, fileName); — End diff – Can you refactor the duplicate codes?
        Hide
        githubbot ASF GitHub Bot added a comment -

        Github user jinossy commented on a diff in the pull request:

        https://github.com/apache/tajo/pull/560#discussion_r29731683

        — Diff: tajo-common/src/main/java/org/apache/tajo/conf/TajoConf.java —
        @@ -142,6 +142,8 @@ public static int setDateOrder(int dateOrder) {
        // High availability configurations
        TAJO_MASTER_HA_ENABLE("tajo.master.ha.enable", false, Validators.bool()),
        TAJO_MASTER_HA_MONITOR_INTERVAL("tajo.master.ha.monitor.interval", 5 * 1000), // 5 sec
        + TAJO_MASTER_HA_CLIENT_RETRY_MAX_NUM("tajo.master.ha.client.read.retry.max-num", 20),
        + TAJO_MASTER_HA_CLIENT_RETRY_WAIT_TIME("tajo.master.ha.client.read.wait-time", 5 * 1000), // 5 sec
        — End diff –

        IMO, 5 sec is too long. What do you think to change (500 ms * 120 retry) ?

        Show
        githubbot ASF GitHub Bot added a comment - Github user jinossy commented on a diff in the pull request: https://github.com/apache/tajo/pull/560#discussion_r29731683 — Diff: tajo-common/src/main/java/org/apache/tajo/conf/TajoConf.java — @@ -142,6 +142,8 @@ public static int setDateOrder(int dateOrder) { // High availability configurations TAJO_MASTER_HA_ENABLE("tajo.master.ha.enable", false, Validators.bool()), TAJO_MASTER_HA_MONITOR_INTERVAL("tajo.master.ha.monitor.interval", 5 * 1000), // 5 sec + TAJO_MASTER_HA_CLIENT_RETRY_MAX_NUM("tajo.master.ha.client.read.retry.max-num", 20), + TAJO_MASTER_HA_CLIENT_RETRY_WAIT_TIME("tajo.master.ha.client.read.wait-time", 5 * 1000), // 5 sec — End diff – IMO, 5 sec is too long. What do you think to change (500 ms * 120 retry) ?
        Hide
        tajoqa Tajo QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12729448/TAJO-1586.patch
        against master revision release-0.9.0-rc0-280-g4755410.

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. The applied patch does not increase the total number of javadoc warnings.

        +1 checkstyle. The patch generated 0 code style errors.

        +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in tajo-common tajo-core.

        Test results: https://builds.apache.org/job/PreCommit-TAJO-Build/769//testReport/
        Console output: https://builds.apache.org/job/PreCommit-TAJO-Build/769//console

        This message is automatically generated.

        Show
        tajoqa Tajo QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12729448/TAJO-1586.patch against master revision release-0.9.0-rc0-280-g4755410. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 javadoc. The applied patch does not increase the total number of javadoc warnings. +1 checkstyle. The patch generated 0 code style errors. +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed unit tests in tajo-common tajo-core. Test results: https://builds.apache.org/job/PreCommit-TAJO-Build/769//testReport/ Console output: https://builds.apache.org/job/PreCommit-TAJO-Build/769//console This message is automatically generated.
        Hide
        githubbot ASF GitHub Bot added a comment -

        Github user blrunner commented on the pull request:

        https://github.com/apache/tajo/pull/560#issuecomment-97720036

        Hi @jinossy

        I updated the patch. Could you check it again?

        Show
        githubbot ASF GitHub Bot added a comment - Github user blrunner commented on the pull request: https://github.com/apache/tajo/pull/560#issuecomment-97720036 Hi @jinossy I updated the patch. Could you check it again?
        Hide
        githubbot ASF GitHub Bot added a comment -

        Github user blrunner commented on a diff in the pull request:

        https://github.com/apache/tajo/pull/560#discussion_r29412956

        — Diff: tajo-core/src/main/java/org/apache/tajo/ha/HdfsServiceTracker.java —
        @@ -70,12 +70,45 @@

        public HdfsServiceTracker(TajoConf conf) throws IOException

        { this.conf = conf; + }

        +
        + @Override
        + public void register() throws IOException {
        initSystemDirectory();

        InetSocketAddress socketAddress = conf.getSocketAddrVar(ConfVars.TAJO_MASTER_UMBILICAL_RPC_ADDRESS);

        • this.masterName = socketAddress.getAddress().getHostAddress() + ":" + socketAddress.getPort();
          + this.masterName = socketAddress.getAddress().getHostName()+ ":" + socketAddress.getPort();
            • End diff –

        Thanks for your review.
        I also think that ip address would be more safety compare than host name.

        Show
        githubbot ASF GitHub Bot added a comment - Github user blrunner commented on a diff in the pull request: https://github.com/apache/tajo/pull/560#discussion_r29412956 — Diff: tajo-core/src/main/java/org/apache/tajo/ha/HdfsServiceTracker.java — @@ -70,12 +70,45 @@ public HdfsServiceTracker(TajoConf conf) throws IOException { this.conf = conf; + } + + @Override + public void register() throws IOException { initSystemDirectory(); InetSocketAddress socketAddress = conf.getSocketAddrVar(ConfVars.TAJO_MASTER_UMBILICAL_RPC_ADDRESS); this.masterName = socketAddress.getAddress().getHostAddress() + ":" + socketAddress.getPort(); + this.masterName = socketAddress.getAddress().getHostName()+ ":" + socketAddress.getPort(); End diff – Thanks for your review. I also think that ip address would be more safety compare than host name.
        Hide
        githubbot ASF GitHub Bot added a comment -

        Github user jinossy commented on a diff in the pull request:

        https://github.com/apache/tajo/pull/560#discussion_r29403727

        — Diff: tajo-core/src/main/java/org/apache/tajo/ha/HdfsServiceTracker.java —
        @@ -70,12 +70,45 @@

        public HdfsServiceTracker(TajoConf conf) throws IOException

        { this.conf = conf; + }

        +
        + @Override
        + public void register() throws IOException {
        initSystemDirectory();

        InetSocketAddress socketAddress = conf.getSocketAddrVar(ConfVars.TAJO_MASTER_UMBILICAL_RPC_ADDRESS);

        • this.masterName = socketAddress.getAddress().getHostAddress() + ":" + socketAddress.getPort();
          + this.masterName = socketAddress.getAddress().getHostName()+ ":" + socketAddress.getPort();
            • End diff –

        If a hostname is not FQDN, it may be throw the unknown host exception.
        In my opinion, ip address is more safety. How do you think?

        Show
        githubbot ASF GitHub Bot added a comment - Github user jinossy commented on a diff in the pull request: https://github.com/apache/tajo/pull/560#discussion_r29403727 — Diff: tajo-core/src/main/java/org/apache/tajo/ha/HdfsServiceTracker.java — @@ -70,12 +70,45 @@ public HdfsServiceTracker(TajoConf conf) throws IOException { this.conf = conf; + } + + @Override + public void register() throws IOException { initSystemDirectory(); InetSocketAddress socketAddress = conf.getSocketAddrVar(ConfVars.TAJO_MASTER_UMBILICAL_RPC_ADDRESS); this.masterName = socketAddress.getAddress().getHostAddress() + ":" + socketAddress.getPort(); + this.masterName = socketAddress.getAddress().getHostName()+ ":" + socketAddress.getPort(); End diff – If a hostname is not FQDN, it may be throw the unknown host exception. In my opinion, ip address is more safety. How do you think?
        Hide
        githubbot ASF GitHub Bot added a comment -

        GitHub user blrunner opened a pull request:

        https://github.com/apache/tajo/pull/560

        TAJO-1586: TajoMaster HA startup failure on Yarn.

        I tested this successfully on production cluster and I fixed various bugs related TajoMaster HA. For the reference, I replaced master IP address to master host name on Tajo web interface.

        You can merge this pull request into a Git repository by running:

        $ git pull https://github.com/blrunner/tajo TAJO-1586

        Alternatively you can review and apply these changes as the patch at:

        https://github.com/apache/tajo/pull/560.patch

        To close this pull request, make a commit to your master/trunk branch
        with (at least) the following in the commit message:

        This closes #560


        commit 58c144205637f7acc9145f8bcbdee0c5ca932cad
        Author: JaeHwa Jung <blrunner@apache.org>
        Date: 2015-04-28T16:02:54Z

        TAJO-1586: TajoMaster HA startup failure on Yarn.

        commit 9c84e00480e03202b164925f3c363b305509a277
        Author: JaeHwa Jung <blrunner@apache.org>
        Date: 2015-04-30T03:39:43Z

        Fix various bugs


        Show
        githubbot ASF GitHub Bot added a comment - GitHub user blrunner opened a pull request: https://github.com/apache/tajo/pull/560 TAJO-1586 : TajoMaster HA startup failure on Yarn. I tested this successfully on production cluster and I fixed various bugs related TajoMaster HA. For the reference, I replaced master IP address to master host name on Tajo web interface. You can merge this pull request into a Git repository by running: $ git pull https://github.com/blrunner/tajo TAJO-1586 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/tajo/pull/560.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #560 commit 58c144205637f7acc9145f8bcbdee0c5ca932cad Author: JaeHwa Jung <blrunner@apache.org> Date: 2015-04-28T16:02:54Z TAJO-1586 : TajoMaster HA startup failure on Yarn. commit 9c84e00480e03202b164925f3c363b305509a277 Author: JaeHwa Jung <blrunner@apache.org> Date: 2015-04-30T03:39:43Z Fix various bugs

          People

          • Assignee:
            blrunner Jaehwa Jung
            Reporter:
            blrunner Jaehwa Jung
          • Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development