Details
- Bug
- Status: Resolved
- Critical
- Resolution: Fixed
- 2.0.0
- None
Description
When restarting an Ambari Agent, child processes seem to hold onto the ping port server socket that the parent agent process listens on:
hdp2-02-02: ERROR: ambari-agent start failed. For more details, see /var/log/ambari-agent/ambari-agent.out:
hdp2-02-02: ====================
hdp2-02-02: UID PID PPID C STIME TTY TIME CMD
hdp2-02-02: root 23667 23663 0 09:40 ? 00:00:00 /usr/bin/sudo su ambari-qa -l -s /bin/bash -c export PATH='/usr/sbin:/sbin:/usr/lib/ambari-server/*:/usr/sbin:/sbin:/usr/lib/ambari-server/*:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin:/var/lib/ambari-agent:/var/lib/ambari-agent:/bin/:/usr/bin/:/usr/lib/hive/bin/:/usr/sbin/' ; hive --hiveconf hive.metastore.uris=thrift://hdp2-02-02.kane.homelinux.net:9083 -e 'show databases;'
hdp2-02-02: Exception in thread Thread-1 (most likely raised during interpreter shutdown):
hdp2-02-02: Traceback (most recent call last):
hdp2-02-02:   File "/usr/lib64/python2.6/threading.py", line 532, in __bootstrap_inner
hdp2-02-02:   File "/usr/lib/python2.6/site-packages/ambari_agent/DataCleaner.py", line 119, in run
hdp2-02-02:   File "/usr/lib64/python2.6/logging/__init__.py", line 1056, in info
hdp2-02-02:   File "/usr/lib64/python2.6/logging/__init__.py", line 1164, in _log
hdp2-02-02:   File "/usr/lib64/python2.6/logging/__init__.py", line 1134, in findCaller
hdp2-02-02: <type 'exceptions.AttributeError'>: 'NoneType' object has no attribute 'path'
hdp2-02-02: ====================
hdp2-02-02: Agent out at: /var/log/ambari-agent/ambari-agent.out
hdp2-02-02: Agent log at: /var/log/ambari-agent/ambari-agent.log
hdp2-02-02: ambari-server: unrecognized service

Here is the tail of the ambari-agent log on that server:

INFO 2015-03-11 09:40:13,518 logger.py:65 - u"Execute['hive --hiveconf hive.metastore.uris=thrift://hdp2-02-02:9083 -e 'show databases;'']" {'path': ['/bin/', '/usr/bin/', '/usr/lib/hive/bin/', '/usr/sbin/'], 'user': 'ambari-qa', 'timeout': 240}
INFO 2015-03-11 09:40:13,756 scheduler.py:527 - Job "ec115aa5-8e09-454c-a4db-3f7d8ee47d84 (trigger: interval[0:01:00], next run at: 2015-03-11 09:41:12.764254)" executed successfully
INFO 2015-03-11 09:40:19,090 Heartbeat.py:75 - Building Heartbeat: {responseId = 3286, timestamp = 1426081219090, commandsInProgress = False, componentsMapped = True}
INFO 2015-03-11 09:40:19,102 Controller.py:247 - Heartbeat response received (id = 3287)
INFO 2015-03-11 09:40:19,102 Controller.py:291 - No commands sent from hdp2-02-01.kane.homelinux.net
INFO 2015-03-11 09:40:26,771 main.py:68 - loglevel=logging.INFO
INFO 2015-03-11 09:40:29,103 Heartbeat.py:75 - Building Heartbeat: {responseId = 3287, timestamp = 1426081229103, commandsInProgress = False, componentsMapped = True}
INFO 2015-03-11 09:40:29,104 security.py:135 - Encountered communication error. Details: BadStatusLine('',)
ERROR 2015-03-11 09:40:29,104 Controller.py:319 - Connection to hdp2-02-01.kane.homelinux.net was lost (details=Request to https://hdp2-02-01.kane.homelinux.net:8441/agent/v1/heartbeat/hdp2-02-02.kane.homelinux.net failed due to Error occured during connecting to the server: )
INFO 2015-03-11 09:40:33,312 main.py:68 - loglevel=logging.INFO
INFO 2015-03-11 09:40:33,313 DataCleaner.py:36 - Data cleanup thread started
INFO 2015-03-11 09:40:33,323 DataCleaner.py:117 - Data cleanup started
ERROR 2015-03-11 09:40:33,433 main.py:243 - Failed to start ping port listener of: Could not open port 8670 because port already used by another process:
UID PID PPID C STIME TTY TIME CMD
root 23667 23663 0 09:40 ? 00:00:00 /usr/bin/sudo su ambari-qa -l -s /bin/bash -c export PATH='/usr/sbin:/sbin:/usr/lib/ambari-server/*:/usr/sbin:/sbin:/usr/lib/ambari-server/*:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin:/var/lib/ambari-agent:/var/lib/ambari-agent:/bin/:/usr/bin/:/usr/lib/hive/bin/:/usr/sbin/' ; hive --hiveconf hive.metastore.uris=thrift://hdp2-02-02:9083 -e 'show databases;'
INFO 2015-03-11 09:40:33,433 PingPortListener.py:62 - Ping port listener killed
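
The symptom above (port 8670 still held by a `sudo su ambari-qa ... hive` child after the agent restarts) is what one would expect if the listening descriptor is inherited by commands the agent spawns. The following is only a minimal sketch of one common way to avoid that on POSIX systems. It is not taken from the Ambari code base and may not match the fix that was actually applied. The function names `open_listener` and `run_command` are hypothetical, and the demo binds port 0 rather than the real ping port. Note that on Python 2.6, `subprocess.Popen` defaults to `close_fds=False`, so children inherit open descriptors unless the caller asks otherwise.

```python
import fcntl
import socket
import subprocess
import sys


def open_listener(port, host="127.0.0.1"):
    """Open a listening TCP socket that is not passed on to exec'd children.

    Hypothetical helper for illustration; not the Ambari implementation.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Mark the descriptor close-on-exec so a command spawned later
    # (for example a long-running hive check) cannot keep the port bound.
    flags = fcntl.fcntl(sock.fileno(), fcntl.F_GETFD)
    fcntl.fcntl(sock.fileno(), fcntl.F_SETFD, flags | fcntl.FD_CLOEXEC)
    sock.bind((host, port))
    sock.listen(1)
    return sock


def run_command(argv):
    """Run a child with all descriptors except stdin/stdout/stderr closed."""
    proc = subprocess.Popen(argv, close_fds=True,
                            stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    out, _ = proc.communicate()
    return proc.returncode, out


if __name__ == "__main__":
    listener = open_listener(0)  # port 0: let the OS pick a free port
    code, out = run_command([sys.executable, "-c", "print('child ran')"])
    print(code)
    listener.close()
```

Either measure alone should keep the socket out of the child: `FD_CLOEXEC` protects the listener no matter how the child is launched, while `close_fds=True` protects every descriptor for that one `Popen` call. Using both is a belt-and-braces choice.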