Uploaded image for project: 'Atlas'
  1. Atlas
  2. ATLAS-2751

Atlas is not consuming messages from ATLAS_HOOK topic after recovering from zookeeper connection timeout.

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • 0.8.1, 1.0.0-alpha, 0.8.2, 1.0.0
    • 0.8.3, 1.1.0
    • atlas-core
    • None

    Description

      Atlas is not consuming messages from ATLAS_HOOK topic after recovering from zookeeper connection timeout.

      • Below snapshot showing, atlas going into passive state with notification consumer thread shutdown
        2018-05-21 18:45:42,156 INFO - [org.apache.ranger.audit.queue.AuditBatchQueue1:] ~ Audit Status Log: name=atlas.async.multi_dest.batch, finalDestination=atlas.async.multi_dest.batch.solr, interval=01:00.002 minutes, events=4, succcessCount=2, totalEvents=2960, totalSuccessCount=112 (BaseAuditHandler:312)
        2018-05-21 20:33:32,604 INFO - [main-SendThread(ctr-e138-1518143905142-323545-01-000005.hwx.site:2181):] ~ Client session timed out, have not heard from server in 13339ms for sessionid 0x1637d8f847f002c, closing socket connection and attempting reconnect (ClientCnxn:1140)
        2018-05-21 20:33:32,705 INFO - [main-EventThread:] ~ State change: SUSPENDED (ConnectionStateManager:228)
        2018-05-21 20:33:32,708 WARN - [Curator-ConnectionStateManager-0:] ~ Server instance with server id id1 is removed as leader (ActiveInstanceElectorService:191)
        2018-05-21 20:33:32,708 WARN - [Curator-ConnectionStateManager-0:] ~ Instance becoming passive from ACTIVE (ServiceState:92)
        2018-05-21 20:33:32,709 INFO - [Curator-ConnectionStateManager-0:] ~ Reacting to passive state: shutting down Kafka consumers. (NotificationHookConsumer:227)
        2018-05-21 20:33:32,709 INFO - [Curator-ConnectionStateManager-0:] ~ ==> stopConsumerThreads() (NotificationHookConsumer:193)
        2018-05-21 20:33:32,709 INFO - [Curator-ConnectionStateManager-0:] ~ ==> HookConsumer shutdown() (NotificationHookConsumer:610)
        2018-05-21 20:33:32,710 INFO - [Curator-ConnectionStateManager-0:] ~ [atlas-hook-consumer-thread]: Shutting down (NotificationHookConsumer$HookConsumer:72)
        2018-05-21 20:33:32,710 INFO - [NotificationHookConsumer thread-0:] ~ closing NotificationConsumer (NotificationHookConsumer:343)
        2018-05-21 20:33:34,550 INFO - [main-SendThread(ctr-e138-1518143905142-323545-01-000005.hwx.site:2181):] ~ Client will use GSSAPI as SASL mechanism. (ZooKeeperSaslClient:285)
        2018-05-21 20:33:34,551 INFO - [main-SendThread(ctr-e138-1518143905142-323545-01-000005.hwx.site:2181):] ~ Opening socket connection to server ctr-e138-1518143905142-323545-01-000005.hwx.site/172.27.15.200:2181. Will attempt to SASL-authenticate using Login Context section 'Client' (ClientCnxn:1019)
        2018-05-21 20:33:47,715 ERROR - [Curator-Framework-0:] ~ Connection timed out for connection string (ctr-e138-1518143905142-323545-01-000005.hwx.site:2181) and timeout (15000) / elapsed (15006) (ConnectionState:200)
        org.apache.curator.CuratorConnectionLossException: KeeperErrorCode = ConnectionLoss
         at org.apache.curator.ConnectionState.checkTimeouts(ConnectionState.java:197)
         at org.apache.curator.ConnectionState.getZooKeeper(ConnectionState.java:88)
         at org.apache.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:116)
         at org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:835)
         at org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:809)
         at org.apache.curator.framework.imps.CuratorFrameworkImpl.access$300(CuratorFrameworkImpl.java:64)
         at org.apache.curator.framework.imps.CuratorFrameworkImpl$4.call(CuratorFrameworkImpl.java:267)
         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
         at java.lang.Thread.run(Thread.java:748)
        
      • Below snapshot showing, atlas recovering from zookeeper timeout and assuming to active instance with consumer thread initialized to read kafka events.
        2018-05-21 20:34:13,358 INFO - [Curator-Framework-0-EventThread:] ~ State change: RECONNECTED (ConnectionStateManager:228)
        2018-05-21 20:34:13,366 WARN - [Curator-Framework-0-EventThread:] ~ Server instance with server id id1 is elected as leader (ActiveInstanceElectorService:145)
        2018-05-21 20:34:13,366 WARN - [Curator-Framework-0-EventThread:] ~ Instance becoming active from PASSIVE (ServiceState:76)
        2018-05-21 20:34:13,366 INFO - [Curator-Framework-0-EventThread:] ~ Reacting to active: Creating HBase table for Audit if required. (HBaseBasedAuditRepository:571)
        2018-05-21 20:34:13,366 INFO - [Curator-Framework-0-EventThread:] ~ Checking if table ATLAS_ENTITY_AUDIT_EVENTS exists (HBaseBasedAuditRepository:513)
        2018-05-21 20:34:13,379 INFO - [Curator-Framework-0-EventThread:] ~ Table ATLAS_ENTITY_AUDIT_EVENTS exists (HBaseBasedAuditRepository:525)
        2018-05-21 20:34:13,379 INFO - [Curator-Framework-0-EventThread:] ~ Reacting to active: initializing index (GraphBackedSearchIndexer:132)
        2018-05-21 20:34:13,380 INFO - [Curator-Framework-0-EventThread:] ~ Creating indexes for graph. (GraphBackedSearchIndexer:243)
        2018-05-21 20:34:13,429 INFO - [Curator-Framework-0-EventThread:] ~ Index creation for global keys complete. (GraphBackedSearchIndexer:293)
        2018-05-21 20:34:13,430 INFO - [Curator-Framework-0-EventThread:] ~ ==> AtlasTypeDefStoreInitializer.instanceIsActive() (AtlasTypeDefStoreInitializer:331)
        2018-05-21 20:34:13,430 INFO - [Curator-Framework-0-EventThread:] ~ ==> AtlasTypeDefGraphStoreV1.init() (AtlasTypeDefGraphStoreV1:106)
        2018-05-21 20:34:13,430 INFO - [Curator-Framework-0-EventThread:] ~ ==> AtlasTypeDefGraphStore.init() (AtlasTypeDefGraphStore:81)
        2018-05-21 20:34:58,686 WARN - [Curator-Framework-0-EventThread:] ~ No RelationshipDef defined between table_type_mALvg and db_type_mALvg on attribute: table_type_mALvg.db (AtlasEntityType:145)
        2018-05-21 20:34:58,686 WARN - [Curator-Framework-0-EventThread:] ~ No RelationshipDef defined between table_type_mALvg and column_type_mALvg on attribute: table_type_mALvg.column (AtlasEntityType:145)
        2018-05-21 20:34:58,686 WARN - [Curator-Framework-0-EventThread:] ~ No RelationshipDef defined between composite_entity_dRRgt and simple_entity_dRRgt on attribute: composite_entity_dRRgt.simple_entity (AtlasEntityType:145)
        2018-05-21 20:34:58,687 WARN - [Curator-Framework-0-EventThread:] ~ No RelationshipDef defined between composite_entity_zbqFl and simple_entity_zbqFl on attribute: composite_entity_zbqFl.simple_entity (AtlasEntityType:145)
        2018-05-21 20:34:58,688 WARN - [Curator-Framework-0-EventThread:] ~ No RelationshipDef defined between __AtlasUserProfile and __AtlasUserSavedSearch on attribute: __AtlasUserProfile.savedSearches (AtlasEntityType:145)
        2018-05-21 20:34:58,688 WARN - [Curator-Framework-0-EventThread:] ~ No RelationshipDef defined between composite_entity_PEhru and simple_entity_PEhru on attribute: composite_entity_PEhru.simple_entity (AtlasEntityType:145)
        2018-05-21 20:34:58,688 WARN - [Curator-Framework-0-EventThread:] ~ No RelationshipDef defined between table_type_UUDrx and db_type_UUDrx on attribute: table_type_UUDrx.db (AtlasEntityType:145)
        2018-05-21 20:34:58,688 WARN - [Curator-Framework-0-EventThread:] ~ No RelationshipDef defined between table_type_UUDrx and column_type_UUDrx on attribute: table_type_UUDrx.column (AtlasEntityType:145)
        2018-05-21 20:34:58,689 WARN - [Curator-Framework-0-EventThread:] ~ No RelationshipDef defined between composite_entity_qhCgR and simple_entity_qhCgR on attribute: composite_entity_qhCgR.simple_entity (AtlasEntityType:145)
        2018-05-21 20:34:58,689 INFO - [Curator-Framework-0-EventThread:] ~ <== AtlasTypeDefGraphStore.init() (AtlasTypeDefGraphStore:105)
        2018-05-21 20:34:58,689 INFO - [Curator-Framework-0-EventThread:] ~ <== AtlasTypeDefGraphStoreV1.init() (AtlasTypeDefGraphStoreV1:110)
        2018-05-21 20:34:58,692 INFO - [Curator-Framework-0-EventThread:] ~ ==> AtlasTypeDefStoreInitializer.loadBootstrapTypeDefs() (AtlasTypeDefStoreInitializer:116)
        2018-05-21 20:34:58,692 INFO - [Curator-Framework-0-EventThread:] ~ ==> AtlasTypeDefStoreInitializer(/usr/hdp/3.0.0.0-1365/atlas/models/0000-Area0) (AtlasTypeDefStoreInitializer:153)
        2018-05-21 20:34:58,694 INFO - [Curator-Framework-0-EventThread:] ~ No new type in file /usr/hdp/3.0.0.0-1365/atlas/models/0000-Area0/0010-base_model.json (AtlasTypeDefStoreInitializer:185)
        2018-05-21 20:34:58,697 INFO - [Curator-Framework-0-EventThread:] ~ No new type in file /usr/hdp/3.0.0.0-1365/atlas/models/0000-Area0/0011-glossary_model.json (AtlasTypeDefStoreInitializer:185)
        2018-05-21 20:34:58,697 INFO - [Curator-Framework-0-EventThread:] ~ Type patches directory /usr/hdp/3.0.0.0-1365/atlas/models/0000-Area0/patches does not exist or not readable or has no patches (AtlasTypeDefStoreInitializer:392)
        2018-05-21 20:34:58,697 INFO - [Curator-Framework-0-EventThread:] ~ <== AtlasTypeDefStoreInitializer(/usr/hdp/3.0.0.0-1365/atlas/models/0000-Area0) (AtlasTypeDefStoreInitializer:196)
        2018-05-21 20:34:58,697 INFO - [Curator-Framework-0-EventThread:] ~ ==> AtlasTypeDefStoreInitializer(/usr/hdp/3.0.0.0-1365/atlas/models/1000-Hadoop) (AtlasTypeDefStoreInitializer:153)
        2018-05-21 20:34:58,698 INFO - [Curator-Framework-0-EventThread:] ~ No new type in file /usr/hdp/3.0.0.0-1365/atlas/models/1000-Hadoop/1020-fs_model.json (AtlasTypeDefStoreInitializer:185)
        2018-05-21 20:34:58,700 INFO - [Curator-Framework-0-EventThread:] ~ No new type in file /usr/hdp/3.0.0.0-1365/atlas/models/1000-Hadoop/1030-hive_model.json (AtlasTypeDefStoreInitializer:185)
        2018-05-21 20:34:58,700 INFO - [Curator-Framework-0-EventThread:] ~ No new type in file /usr/hdp/3.0.0.0-1365/atlas/models/1000-Hadoop/1040-sqoop_model.json (AtlasTypeDefStoreInitializer:185)
        2018-05-21 20:34:58,701 INFO - [Curator-Framework-0-EventThread:] ~ No new type in file /usr/hdp/3.0.0.0-1365/atlas/models/1000-Hadoop/1050-falcon_model.json (AtlasTypeDefStoreInitializer:185)
        2018-05-21 20:34:58,702 INFO - [Curator-Framework-0-EventThread:] ~ No new type in file /usr/hdp/3.0.0.0-1365/atlas/models/1000-Hadoop/1060-hbase_model.json (AtlasTypeDefStoreInitializer:185)
        2018-05-21 20:34:58,703 INFO - [Curator-Framework-0-EventThread:] ~ No new type in file /usr/hdp/3.0.0.0-1365/atlas/models/1000-Hadoop/1070-kafka_model.json (AtlasTypeDefStoreInitializer:185)
        2018-05-21 20:34:58,704 INFO - [Curator-Framework-0-EventThread:] ~ No new type in file /usr/hdp/3.0.0.0-1365/atlas/models/1000-Hadoop/1080-storm_model.json (AtlasTypeDefStoreInitializer:185)
        .
        .
        .
        .
        .
        2018-05-21 20:34:58,708 INFO - [Curator-Framework-0-EventThread:] ~ patch skipped: typeName=hbase_table; applyToVersion=1.2; updateToVersion=1.3 (AtlasTypeDefStoreInitializer:618)
        2018-05-21 20:34:58,708 INFO - [Curator-Framework-0-EventThread:] ~ patch skipped: typeName=hbase_column_family; applyToVersion=1.1; updateToVersion=1.2 (AtlasTypeDefStoreInitializer:618)
        2018-05-21 20:34:58,708 INFO - [Curator-Framework-0-EventThread:] ~ Applying patches in file /usr/hdp/3.0.0.0-1365/atlas/models/1000-Hadoop/patches/006-kafka_topic_add_attribute.json (AtlasTypeDefStoreInitializer:416)
        2018-05-21 20:34:58,708 INFO - [Curator-Framework-0-EventThread:] ~ patch skipped: typeName=kafka_topic; applyToVersion=1.0; updateToVersion=1.1 (AtlasTypeDefStoreInitializer:618)
        2018-05-21 20:34:58,708 INFO - [Curator-Framework-0-EventThread:] ~ <== AtlasTypeDefStoreInitializer(/usr/hdp/3.0.0.0-1365/atlas/models/1000-Hadoop) (AtlasTypeDefStoreInitializer:196)
        2018-05-21 20:34:58,708 INFO - [Curator-Framework-0-EventThread:] ~ ==> AtlasTypeDefStoreInitializer(/usr/hdp/3.0.0.0-1365/atlas/models) (AtlasTypeDefStoreInitializer:153)
        2018-05-21 20:34:58,708 INFO - [Curator-Framework-0-EventThread:] ~ Type patches directory /usr/hdp/3.0.0.0-1365/atlas/models/patches does not exist or not readable or has no patches (AtlasTypeDefStoreInitializer:392)
        2018-05-21 20:34:58,708 INFO - [Curator-Framework-0-EventThread:] ~ <== AtlasTypeDefStoreInitializer(/usr/hdp/3.0.0.0-1365/atlas/models) (AtlasTypeDefStoreInitializer:196)
        2018-05-21 20:34:58,709 INFO - [Curator-Framework-0-EventThread:] ~ <== AtlasTypeDefStoreInitializer.loadBootstrapTypeDefs() (AtlasTypeDefStoreInitializer:145)
        2018-05-21 20:34:58,709 INFO - [Curator-Framework-0-EventThread:] ~ <== AtlasTypeDefStoreInitializer.instanceIsActive() (AtlasTypeDefStoreInitializer:341)
        2018-05-21 20:34:58,709 INFO - [Curator-Framework-0-EventThread:] ~ Reacting to active state: initializing Kafka consumers (NotificationHookConsumer:214)
        2018-05-21 20:34:58,709 INFO - [Curator-Framework-0-EventThread:] ~ ==> KafkaNotification.createConsumers(notificationType=HOOK, numConsumers=1, autoCommitEnabled=false) (KafkaNotification:149)
        2018-05-21 20:34:58,709 INFO - [Curator-Framework-0-EventThread:] ~ <== KafkaNotification.createConsumers(notificationType=HOOK, numConsumers=1, autoCommitEnabled=false) (KafkaNotification:156)
        2018-05-21 20:34:58,710 INFO - [NotificationHookConsumer thread-0:] ~ [atlas-hook-consumer-thread]: Starting (NotificationHookConsumer$HookConsumer:72)
        2018-05-21 20:34:58,710 INFO - [NotificationHookConsumer thread-0:] ~ ==> HookConsumer doWork() (NotificationHookConsumer:313)
        2018-05-21 20:34:58,710 INFO - [NotificationHookConsumer thread-0:] ~ Atlas Server is not ready. Waiting for 1000 milliseconds to retry... (NotificationHookConsumer:588)
        2018-05-21 20:34:58,713 WARN - [Curator-Framework-0-EventThread:] ~ Instance is active from BECOMING_ACTIVE (ServiceState:87)
        2018-05-21 20:34:59,710 INFO - [NotificationHookConsumer thread-0:] ~ Atlas Server is ready, can start reading Kafka events. (NotificationHookConsumer:603)
        
      • Below snapshot confirms that, hive hook messages ingested into ATLAS_HOOk topic but not consumed by Atlas
        /usr/hdp/current/kafka-broker/bin/kafka-console-consumer.sh --zookeeper ctr-e138-1518143905142-323545-01-000005.hwx.site:2181,ctr-e138-1518143905142-323545-01-000006.hwx.site:2181,ctr-e138-1518143905142-323545-01-000007.hwx.site:2181 --topic ATLAS_HOOK --security-protocol PLAINTEXTSASL
        Using the ConsoleConsumer with old consumer is deprecated and will be removed in a future major release. Consider using the new consumer by passing [bootstrap-server] instead of [zookeeper].
        
        
        {"version":\{"version":"1.0.0","versionParts":[1]},"msgCompressionKind":"NONE","msgSplitIdx":1,"msgSplitCount":1,"msgSourceIP":"172.27.27.0","msgCreatedBy":"hive","msgCreationTime":1527012364380,"message":\{"type":"ENTITY_CREATE_V2","user":"hrt_qa","entities":{"referredEntities":{},"entities":[\{"typeName":"hive_db","attributes":{"owner":"hrt_qa","ownerType":"USER","qualifiedName":"db10@cl1","clusterName":"cl1","name":"db10","description":null,"location":"hdfs://ns1/apps/hive/warehouse/db10.db","parameters":{}},"guid":"-4487807398104856","version":0}]}}}
        ^CProcessed a total of 1 messages
        

      Live cluster: https://172.27.17.64:8443/#/main/services/ATLAS/summary
      Active atlas: https://ctr-e138-1518143905142-323545-01-000003.hwx.site:21443/
      NOTE: Currently, above atlas service is in buggy state, restarting atlas will bring atlas to normal mode. Please do not restart atlas service.

      Attachments

        Activity

          People

            sarath Sarath Subramanian
            sarath Sarath Subramanian
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: