Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Version/s: 2.0.0, trunk
Description
PatchFx in HA mode causes exceptions.
Steps to Duplicate
Deploy the latest version of Atlas on a cluster with HA enabled.
The following error appears during startup:
2019-04-23 03:54:22,280 ERROR - [main-EventThread:] ~ Got exception while activating (ActiveInstanceElectorService:160)
java.lang.NullPointerException
at org.apache.atlas.repository.audit.HBaseBasedAuditRepository.createTableIfNotExists(HBaseBasedAuditRepository.java:521)
at org.apache.atlas.repository.audit.HBaseBasedAuditRepository.instanceIsActive(HBaseBasedAuditRepository.java:627)
at org.apache.atlas.web.service.ActiveInstanceElectorService.isLeader(ActiveInstanceElectorService.java:154)
at org.apache.curator.framework.recipes.leader.LeaderLatch$9.apply(LeaderLatch.java:665)
at org.apache.curator.framework.recipes.leader.LeaderLatch$9.apply(LeaderLatch.java:661)
at org.apache.curator.framework.listen.ListenerContainer$1.run(ListenerContainer.java:93)
at org.apache.curator.shaded.com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:435)
at org.apache.curator.framework.listen.ListenerContainer.forEach(ListenerContainer.java:85)
at org.apache.curator.framework.recipes.leader.LeaderLatch.setLeadership(LeaderLatch.java:660)
at org.apache.curator.framework.recipes.leader.LeaderLatch.checkLeadership(LeaderLatch.java:539)
at org.apache.curator.framework.recipes.leader.LeaderLatch.access$700(LeaderLatch.java:65)
at org.apache.curator.framework.recipes.leader.LeaderLatch$7.processResult(LeaderLatch.java:590)
at org.apache.curator.framework.imps.CuratorFrameworkImpl.sendToBackgroundCallback(CuratorFrameworkImpl.java:865)
at org.apache.curator.framework.imps.CuratorFrameworkImpl.processBackgroundOperation(CuratorFrameworkImpl.java:635)
at org.apache.curator.framework.imps.WatcherRemovalFacade.processBackgroundOperation(WatcherRemovalFacade.java:152)
at org.apache.curator.framework.imps.GetChildrenBuilderImpl$2.processResult(GetChildrenBuilderImpl.java:187)
at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:602)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:510)
2019-04-23 03:54:22,280 WARN - [main-EventThread:] ~ Server instance with server id id2 is removed as leader (ActiveInstanceElectorService:197)
Root Cause
Pattern followed within Atlas:
- Service.start is called when Services is initialized.
- For every service:
  - Atlas is not in HA mode: Start and perform startup-specific actions.
  - Atlas is in HA mode: Start and wait for instanceIsActive to be called.
How AtlasPatchService deviated from this pattern:
- AtlasPatchService did not implement ActiveStateChangeHandler.
- AtlasPatchService was not registered with ActiveStateChangeHandler.HandlerOrder.
This caused AtlasPatchService.start to patch the database immediately, before AtlasTypeDefStoreInitializer had been initialized, which caused exceptions. ActiveInstanceElectorService then received a callback from ZooKeeper (ZK) asking it to call instanceIsActive on the HBase audit repository (HBaseBasedAuditRepository in the stack trace), which had not been started. This produced the NullPointerException shown in the stack trace above.
Solution
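Modify AtlasPatchService to follow the pattern used by other services: implement ActiveStateChangeHandler and register it with ActiveStateChangeHandler.HandlerOrder, so that in HA mode patching is deferred until instanceIsActive is called.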
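The pattern can be illustrated with a minimal sketch. It is not the actual Atlas code: the Service and ActiveStateChangeHandler interfaces below are simplified stand-ins, and PatchService, haEnabled, applyPatches and the "patched" guard are hypothetical names chosen for illustration. Handler ordering (HandlerOrder) is omitted.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-ins for illustration only; these are NOT the real Atlas types.
class PatchServiceSketch {

    /** Stand-in for a service that is started when Services is initialized. */
    interface Service {
        void start();
    }

    /** Stand-in for a handler that is notified when the HA state changes. */
    interface ActiveStateChangeHandler {
        void instanceIsActive();
        void instanceIsPassive();
    }

    /** Hypothetical patch service that follows the pattern described above. */
    static class PatchService implements Service, ActiveStateChangeHandler {
        private final boolean haEnabled;              // assumption: HA flag passed in by the caller
        private final List<String> log = new ArrayList<>();
        private boolean patched;

        PatchService(boolean haEnabled) {
            this.haEnabled = haEnabled;
        }

        @Override
        public void start() {
            if (haEnabled) {
                // HA mode: do nothing yet, wait for instanceIsActive to be called.
                log.add("start: HA mode, waiting for instanceIsActive");
                return;
            }
            // Non-HA mode: perform the startup-specific action right away.
            applyPatches();
        }

        @Override
        public void instanceIsActive() {
            // Called once this instance becomes the active one.
            applyPatches();
        }

        @Override
        public void instanceIsPassive() {
            log.add("passive: nothing to do");
        }

        private void applyPatches() {
            if (patched) {
                return; // assumption: patching should run at most once per instance
            }
            patched = true;
            log.add("patches applied");
        }

        boolean isPatched() {
            return patched;
        }

        List<String> getLog() {
            return log;
        }
    }

    public static void main(String[] args) {
        PatchService standalone = new PatchService(false);
        standalone.start();
        System.out.println("non-HA, patched after start: " + standalone.isPatched());

        PatchService ha = new PatchService(true);
        ha.start();
        System.out.println("HA, patched after start: " + ha.isPatched());
        ha.instanceIsActive();
        System.out.println("HA, patched after instanceIsActive: " + ha.isPatched());
    }
}
```

In this sketch the HA instance performs no patching in start and only patches once instanceIsActive is invoked, which is the behavior the original AtlasPatchService lacked.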