  1. Ambari
  2. AMBARI-10083

Ambari Agent Alerts Prevent Binding to the Ping Port Listener on Startup

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 2.0.0
    • Fix Version/s: 2.0.0
    • Component/s: ambari-agent
    • Labels: None

    Description

      When restarting an Ambari Agent, child processes seem to hold onto the ping port server socket that the parent agent process listens on:

      hdp2-02-02: ERROR: ambari-agent start failed. For more details, see
      /var/log/ambari-agent/ambari-agent.out:
      hdp2-02-02: ====================
      hdp2-02-02: UID        PID  PPID  C STIME TTY          TIME CMD
      hdp2-02-02: root     23667 23663  0 09:40 ?        00:00:00 /usr/bin/sudo
      su ambari-qa -l -s /bin/bash -c export
      PATH='/usr/sbin:/sbin:/usr/lib/ambari-server/*:/usr/sbin:/sbin:/usr/lib/amb
      ari-server/*:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/
      root/bin:/var/lib/ambari-agent:/var/lib/ambari-agent:/bin/:/usr/bin/:/usr/l
      ib/hive/bin/:/usr/sbin/' ; hive --hiveconf
      hive.metastore.uris=thrift://hdp2-02-02.kane.homelinux.net:9083 -e 'show
      databases;'
      hdp2-02-02: Exception in thread Thread-1 (most likely raised during
      interpreter shutdown):
      hdp2-02-02: Traceback (most recent call last):
      hdp2-02-02:   File "/usr/lib64/python2.6/threading.py", line 532, in
      __bootstrap_inner
      hdp2-02-02:   File
      "/usr/lib/python2.6/site-packages/ambari_agent/DataCleaner.py", line 119,
      in run
      hdp2-02-02:   File "/usr/lib64/python2.6/logging/__init__.py", line 1056,
      in info
      hdp2-02-02:   File "/usr/lib64/python2.6/logging/__init__.py", line 1164,
      in _log
      hdp2-02-02:   File "/usr/lib64/python2.6/logging/__init__.py", line 1134,
      in findCaller
      hdp2-02-02: <type 'exceptions.AttributeError'>: 'NoneType' object has no
      attribute 'path'
      hdp2-02-02: ====================
      hdp2-02-02: Agent out at: /var/log/ambari-agent/ambari-agent.out
      hdp2-02-02: Agent log at: /var/log/ambari-agent/ambari-agent.log
      hdp2-02-02: ambari-server: unrecognized service
      
      
      Here is the tail of the ambari-agent log on that server:
      
      INFO 2015-03-11 09:40:13,518 logger.py:65 - u"Execute['hive --hiveconf
      hive.metastore.uris=thrift://hdp2-02-02:9083 -e 'show
      databases;'']" {'path': ['/bin/
      ', '/usr/bin/', '/usr/lib/hive/bin/', '/usr/sbin/'], 'user': 'ambari-qa',
      'timeout': 240}
      INFO 2015-03-11 09:40:13,756 scheduler.py:527 - Job
      "ec115aa5-8e09-454c-a4db-3f7d8ee47d84 (trigger: interval[0:01:00], next
      run at: 2015-03-11 09:41:12.764254)" executed succe
      ssfully
      INFO 2015-03-11 09:40:19,090 Heartbeat.py:75 - Building Heartbeat:
      {responseId = 3286, timestamp = 1426081219090, commandsInProgress = False,
      componentsMapped = True}
      INFO 2015-03-11 09:40:19,102 Controller.py:247 - Heartbeat response
      received (id = 3287)
      INFO 2015-03-11 09:40:19,102 Controller.py:291 - No commands sent from
      hdp2-02-01.kane.homelinux.net
      INFO 2015-03-11 09:40:26,771 main.py:68 - loglevel=logging.INFO
      INFO 2015-03-11 09:40:29,103 Heartbeat.py:75 - Building Heartbeat:
      {responseId = 3287, timestamp = 1426081229103, commandsInProgress = False,
      componentsMapped = True}
      INFO 2015-03-11 09:40:29,104 security.py:135 - Encountered communication
      error. Details: BadStatusLine('',)
      ERROR 2015-03-11 09:40:29,104 Controller.py:319 - Connection to
      hdp2-02-01.kane.homelinux.net was lost (details=Request to
      https://hdp2-02-01.kane.homelinux.net:8441/agent/v1/
      heartbeat/hdp2-02-02.kane.homelinux.net failed due to Error occured during
      connecting to the server: )
      INFO 2015-03-11 09:40:33,312 main.py:68 - loglevel=logging.INFO
      INFO 2015-03-11 09:40:33,313 DataCleaner.py:36 - Data cleanup thread
      started
      INFO 2015-03-11 09:40:33,323 DataCleaner.py:117 - Data cleanup started
      ERROR 2015-03-11 09:40:33,433 main.py:243 - Failed to start ping port
      listener of: Could not open port 8670 because port already used by another
      process:
      UID        PID  PPID  C STIME TTY          TIME CMD
      root     23667 23663  0 09:40 ?        00:00:00 /usr/bin/sudo su ambari-qa
      -l -s /bin/bash -c export
      PATH='/usr/sbin:/sbin:/usr/lib/ambari-server/*:/usr/sbin:/sbin:/usr/lib/a
      mbari-server/*:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin
      :/root/bin:/var/lib/ambari-agent:/var/lib/ambari-agent:/bin/:/usr/bin/:/usr
      /lib/hive/bin/:/usr/sbin/
      ' ; hive --hiveconf
      hive.metastore.uris=thrift://hdp2-02-02:9083 -e 'show
      databases;'
      
      INFO 2015-03-11 09:40:33,433 PingPortListener.py:62 - Ping port listener
      killed
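
      The symptom above (a child command listed as the owner of port 8670) is consistent with file descriptor inheritance on POSIX systems: a process spawned while the listener socket is open can keep that socket alive after the parent exits. The following is a minimal sketch of that general mechanism, not the contents of the attached patch. It assumes Python 3 on a POSIX host, uses an ephemeral port instead of 8670, and the helper name `child_sees_listener` is hypothetical.

```python
import os
import socket
import subprocess
import sys

# Code run by the child: report whether the descriptor number passed on the
# command line refers to an open socket in the child process.
CHILD_CODE = """
import os, stat, sys
try:
    mode = os.fstat(int(sys.argv[1])).st_mode
    print("inherited" if stat.S_ISSOCK(mode) else "closed")
except OSError:
    print("closed")
"""


def child_sees_listener(close_fds):
    """Spawn a child and return True if it inherited the listening socket."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # Ephemeral port for the demo; the port in this ticket is 8670.
        listener.bind(("127.0.0.1", 0))
        listener.listen(1)
        # Python 3.4+ creates descriptors as non-inheritable (PEP 446).
        # Re-enable inheritance to mimic the older Python 2 default.
        listener.set_inheritable(True)
        out = subprocess.check_output(
            [sys.executable, "-c", CHILD_CODE, str(listener.fileno())],
            close_fds=close_fds,
        )
        return out.decode().strip() == "inherited"
    finally:
        listener.close()


if __name__ == "__main__":
    print("close_fds=False:", child_sees_listener(False))
    print("close_fds=True: ", child_sees_listener(True))
```

      With `close_fds=False` the child holds a copy of the listening socket, so the port stays bound for as long as the child runs. Passing `close_fds=True` (or leaving the socket non-inheritable) keeps the descriptor out of the child.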
      

      Attachments

        1. AMBARI-10083.patch
          18 kB
          Jonathan Hurley

            People

              Assignee: Jonathan Hurley (jonathanhurley)
              Reporter: Jonathan Hurley (jonathanhurley)