Details
Type: Sub-task
Status: Resolved
Priority: Major
Resolution: Not A Problem
Description
I can post the logs externally if needed. For now, the app IDs on the test cluster are application_1429683757595_0784 and application_1429683757595_0783; I also have copies of the logs.
The AM discovered the node (the log lines for the other nodes are similar):
2015-05-07 12:13:28,074 INFO [ServiceThread:org.apache.tez.dag.app.rm.TaskSchedulerEventHandler] impl.LlapYarnRegistryImpl: Adding new worker 342f4992-2608-43ab-a119-b50882e35f75 which mapped to DynamicServiceInstance [alive=true, host=cn059-10.l42scl.hortonworks.com:15001 with resources=<memory:20480, vCores:6>]
....
2015-05-07 12:13:28,082 INFO [Dispatcher thread: Central] node.AMNodeTracker: Num cluster nodes = 19
The trouble is that this node never actually existed; the cluster had only 15 nodes.
As the job progressed, the AM repeatedly tried to schedule tasks on this node and failed. No other LLAP cluster was running at the same time.
In fact, since I always start a 15-node cluster, I am not sure where a count of 19 nodes could conceivably have come from.
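To cross-check a report like this, one could scan the AM log for registry additions and compare the distinct worker hosts against the hosts that were actually started. The sketch below is a hypothetical diagnostic, not part of Tez or LLAP: the regular expressions assume the exact log format quoted in this issue, and `summarize_registry` and `expected_hosts` are illustrative names.

```python
import re

# Hypothetical diagnostic: assumes the LlapYarnRegistryImpl log format quoted
# in this issue. Captures the worker ID and the host it was mapped to.
WORKER_RE = re.compile(
    r"Adding new worker (?P<worker>\S+) which mapped to "
    r"DynamicServiceInstance \[alive=\w+, host=(?P<host>[^\s:]+):(?P<port>\d+)"
)
# Captures the node count reported by the AMNodeTracker line.
COUNT_RE = re.compile(r"Num cluster nodes = (?P<count>\d+)")


def summarize_registry(log_lines, expected_hosts):
    """Return (host -> worker IDs, last reported node count, unexpected hosts).

    expected_hosts is a hypothetical set of hostnames that were actually started.
    """
    seen = {}
    last_count = None
    for line in log_lines:
        m = WORKER_RE.search(line)
        if m:
            # Group worker IDs by host; several IDs on one host may hint at stale entries.
            seen.setdefault(m.group("host"), set()).add(m.group("worker"))
        c = COUNT_RE.search(line)
        if c:
            last_count = int(c.group("count"))
    unexpected = sorted(set(seen) - set(expected_hosts))
    return seen, last_count, unexpected
```

If the last reported count exceeds the number of hosts started, or a host maps to more than one worker ID, that could suggest the registry is returning stale registrations rather than live daemons.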
Issue Links
- is related to: YARN-3371 TTL for YARN Registry SRV records (Open)
- is superseded by: HIVE-12935 LLAP: Replace Yarn registry with Zookeeper registry (Closed)