ATLAS-3168

PatchFx: Support for HA Mode

Details

    Description

      PatchFx in HA mode causes exceptions.

      Steps to Duplicate

      Deploy the latest version of Atlas on a cluster with HA enabled.

      The following error appears during startup:

      2019-04-23 03:54:22,280 ERROR - [main-EventThread:] ~ Got exception while activating (ActiveInstanceElectorService:160)
      java.lang.NullPointerException
              at org.apache.atlas.repository.audit.HBaseBasedAuditRepository.createTableIfNotExists(HBaseBasedAuditRepository.java:521)
              at org.apache.atlas.repository.audit.HBaseBasedAuditRepository.instanceIsActive(HBaseBasedAuditRepository.java:627)
              at org.apache.atlas.web.service.ActiveInstanceElectorService.isLeader(ActiveInstanceElectorService.java:154)
              at org.apache.curator.framework.recipes.leader.LeaderLatch$9.apply(LeaderLatch.java:665)
              at org.apache.curator.framework.recipes.leader.LeaderLatch$9.apply(LeaderLatch.java:661)
              at org.apache.curator.framework.listen.ListenerContainer$1.run(ListenerContainer.java:93)
              at org.apache.curator.shaded.com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:435)
              at org.apache.curator.framework.listen.ListenerContainer.forEach(ListenerContainer.java:85)
              at org.apache.curator.framework.recipes.leader.LeaderLatch.setLeadership(LeaderLatch.java:660)
              at org.apache.curator.framework.recipes.leader.LeaderLatch.checkLeadership(LeaderLatch.java:539)
              at org.apache.curator.framework.recipes.leader.LeaderLatch.access$700(LeaderLatch.java:65)
              at org.apache.curator.framework.recipes.leader.LeaderLatch$7.processResult(LeaderLatch.java:590)
              at org.apache.curator.framework.imps.CuratorFrameworkImpl.sendToBackgroundCallback(CuratorFrameworkImpl.java:865)
              at org.apache.curator.framework.imps.CuratorFrameworkImpl.processBackgroundOperation(CuratorFrameworkImpl.java:635)
              at org.apache.curator.framework.imps.WatcherRemovalFacade.processBackgroundOperation(WatcherRemovalFacade.java:152)
              at org.apache.curator.framework.imps.GetChildrenBuilderImpl$2.processResult(GetChildrenBuilderImpl.java:187)
              at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:602)
              at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:510)
      2019-04-23 03:54:22,280 WARN  - [main-EventThread:] ~ Server instance with server id id2 is removed as leader (ActiveInstanceElectorService:197)
      

      Root Cause

      Pattern followed within Atlas:

      • Service.start is called on each service when Services is initialized.
      • For every service:
        • Atlas is not in HA mode: Start and perform startup-specific actions.
        • Atlas is in HA mode: Start and wait for instanceIsActive to be called.

      AtlasPatchService did not follow this pattern:

      • It did not implement ActiveStateChangeHandler.
      • It was not registered in ActiveStateChangeHandler.HandlerOrder.

      As a result, AtlasPatchService.start patched the database immediately, even in HA mode, before AtlasTypeDefStoreInitializer had been initialized. This caused exceptions. ActiveInstanceElectorService then received a callback from ZooKeeper and called instanceIsActive on HBaseBasedAuditRepository, which had not been started. This produced the NullPointerException shown in the stack trace above.
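
The failure mode described above can be illustrated with a minimal sketch. It assumes the NullPointerException comes from a field that only start() initializes. SketchAuditRepository and StartupOrderDemo are hypothetical stand-ins written for this illustration; they are not the real HBaseBasedAuditRepository or any Atlas class.

```java
// Simplified stand-in, NOT the real HBaseBasedAuditRepository.
class SketchAuditRepository {
    // Stands in for a connection object; stays null until start() runs.
    private StringBuilder connection;

    void start() {
        connection = new StringBuilder("connected");
    }

    // Uses state that only start() initializes, so calling this first
    // throws a NullPointerException.
    void instanceIsActive() {
        connection.append(":table-checked");
    }

    String state() {
        return String.valueOf(connection);
    }
}

class StartupOrderDemo {
    // Returns true if activating before start() fails with a NullPointerException.
    static boolean activationFailsBeforeStart() {
        SketchAuditRepository repo = new SketchAuditRepository();
        try {
            repo.instanceIsActive();
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("fails before start: " + activationFailsBeforeStart());

        SketchAuditRepository repo = new SketchAuditRepository();
        repo.start();
        repo.instanceIsActive();
        System.out.println("after start: " + repo.state());
    }
}
```

The sketch only shows why the order of start and instanceIsActive matters; it does not model ZooKeeper or leader election.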

      Solution

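      Modify AtlasPatchService to follow the pattern used for other services: implement ActiveStateChangeHandler and register it in ActiveStateChangeHandler.HandlerOrder, so that in HA mode patching runs only when instanceIsActive is called.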
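
A rough sketch of that pattern follows. All types here (SketchService, SketchActiveStateChangeHandler, HaAwareService, and the two concrete classes) are simplified, hypothetical stand-ins and not the actual Atlas interfaces. The numeric handler order, and the placement of the patch service after the typedef store initializer, are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Simplified stand-ins for illustration; not the real Atlas interfaces.
interface SketchService {
    void start();
}

interface SketchActiveStateChangeHandler {
    void instanceIsActive();
    int handlerOrder(); // lower values run first (assumed ordering scheme)
}

// Shared pattern: do the work in start() when not in HA mode,
// otherwise defer it until instanceIsActive() is called.
abstract class HaAwareService implements SketchService, SketchActiveStateChangeHandler {
    private final boolean haEnabled;

    HaAwareService(boolean haEnabled) {
        this.haEnabled = haEnabled;
    }

    @Override
    public void start() {
        if (!haEnabled) {
            activate();
        }
    }

    @Override
    public void instanceIsActive() {
        activate();
    }

    abstract void activate();
}

class SketchTypeDefStoreInitializer extends HaAwareService {
    private final List<String> log;

    SketchTypeDefStoreInitializer(boolean haEnabled, List<String> log) {
        super(haEnabled);
        this.log = log;
    }

    @Override void activate() { log.add("typedef store initialized"); }
    @Override public int handlerOrder() { return 1; }
}

class SketchPatchService extends HaAwareService {
    private final List<String> log;

    SketchPatchService(boolean haEnabled, List<String> log) {
        super(haEnabled);
        this.log = log;
    }

    @Override void activate() { log.add("patches applied"); }
    @Override public int handlerOrder() { return 2; } // after the typedef store
}

class HaPatchDemo {
    static List<String> run(boolean haEnabled) {
        List<String> log = new ArrayList<>();
        HaAwareService typeDefs = new SketchTypeDefStoreInitializer(haEnabled, log);
        HaAwareService patch = new SketchPatchService(haEnabled, log);

        // Services are started first; in HA mode start() does no work.
        typeDefs.start();
        patch.start();

        if (haEnabled) {
            log.add("elected active");
            // Registered out of order on purpose; sorting restores the intended order.
            List<SketchActiveStateChangeHandler> handlers = new ArrayList<>();
            handlers.add(patch);
            handlers.add(typeDefs);
            handlers.sort(Comparator.comparingInt(SketchActiveStateChangeHandler::handlerOrder));
            for (SketchActiveStateChangeHandler handler : handlers) {
                handler.instanceIsActive();
            }
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println("HA:     " + run(true));
        System.out.println("non-HA: " + run(false));
    }
}
```

In both modes the sketch applies patches only after the typedef store has been initialized; in HA mode nothing happens until the instance becomes active.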

            People

              amestry Ashutosh Mestry