Uploaded image for project: 'Traffic Server'
  1. Traffic Server
  2. TS-2141

Inconsistent euid cause bad mgmtapi/eventapi sockets

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Closed
    • Major
    • Resolution: Fixed
    • None
    • 4.1.2
    • Management API, Manager
    • None

    Description

      In the main function of traffic_manager, webIntr_main thread is created after euid has been changed to "proxy.config.admin.user_id".

      And then, webIntr_main thread will create mgmtapisocket/eventapisocket socket fd, and listen on them.

      But unfortunately, after created webIntr_main thread, the main thread will call listenForProxy()->bindProxyPort() immediately, which might change/restore euid concurrently.

      For example:
      1) bindProxyPort() change the euid to 0(root uid).
      2) webIntr_main creates mgmtapisocket with 0 euid.
      3) bindProxyPort() restore the euid to 501(admin.user_id)
      4) webIntr_main calls chmod(0777) on mgmtapisocket, it'll get "Operation not permitted" error and other unexpected errors.

      As the api sockets can't create correctly, traffic_cop will failed to establish heartbeat with traffic_manager, and than traffic_cop will kill traffic_manager automatically.

      But more worse, after killed and restarted manager, traffic_cop forgets to reconnect to manager, so it will use bad main_socket_fd to detect heartbeat of manager. Of course, it'll faill to detect and than kill/restart manager again and again ...:

      [E. Mgmt] log ==> [TrafficManager] using root directory '/usr'
      [Aug 15 18:09:20.730] {0x7ffb873aa7e0} STATUS: opened /var/log/trafficserver/manager.log
      [Aug 15 18:09:20.730] {0x7ffb873aa7e0} NOTE: updated diags config
      [Aug 15 18:09:20.732] Manager {0x7ffb873aa7e0} NOTE: [ClusterCom::ClusterCom] Node running on OS: 'Linux' Release: '2.6.32-220.17.1.tb619.el6.x86_6'
      [Aug 15 18:09:20.733] Manager {0x7ffb873aa7e0} NOTE: [LocalManager::listenForProxy] Listening on port: 8080
      [Aug 15 18:09:20.733] Manager {0x7ffb873aa7e0} NOTE: [LocalManager::listenForProxy] Listening on port: 80
      [Aug 15 18:09:20.733] Manager {0x7ffb85d30700} NOTE: [newUNIXsocket] Unable to chmod unix-domain socket: Operation not permitted
      [Aug 15 18:09:20.733] Manager {0x7ffb873aa7e0} NOTE: [LocalManager::listenForProxy] Listening on port: 81
      [Aug 15 18:09:20.733] Manager {0x7ffb85d30700} NOTE: [WebIntrMain] Unable to set up so for handling managment API event calls. Event Socket path: /var/run/tr>
      [Aug 15 18:09:20.733] Manager {0x7ffb873aa7e0} NOTE: [TrafficManager] Setup complete
      [Aug 15 18:09:21.751] Manager {0x7ffb873aa7e0} NOTE: [LocalManager::startProxy] Launching ts process
      [TrafficServer] using root directory '/usr'
      [Aug 15 18:09:21.760] Manager {0x7ffb873aa7e0} NOTE: [LocalManager::pollMgmtProcessServer] New process connecting fd '13'
      [Aug 15 18:09:21.760] Manager {0x7ffb873aa7e0} NOTE: [Alarms::signalAlarm] Server Process born
      [Aug 15 18:09:22.775] {0x2b4218293740} STATUS: opened /var/log/trafficserver/diags.log
      [Aug 15 18:09:23.171] {0x2b4218293740} NOTE: updated diags config
      [Aug 15 18:09:23.175] Server {0x2b4218293740} NOTE: cache clustering disabled
      [Aug 15 18:09:23.176] Server {0x2b4218293740} WARNING: no ssd disks specified in proxy.config.cache.ssd.storage: 
      [Aug 15 18:09:23.202] Server {0x2b4218293740} NOTE: cache clustering disabled
      [Aug 15 18:09:23.202] Server {0x2b4218293740} WARNING: unable to open cache disk(s): SSD Cache Disabled
      [Aug 15 18:09:23.214] Server {0x2b4218293740} NOTE: logging initialized[15], logging_mode = 3
      [Aug 15 18:09:23.215] Server {0x2b4218293740} WARNING: Access logging to local log directory suspended - partition space is low.
      [Aug 15 18:09:23.248] Server {0x2b4218293740} NOTE: traffic server running
      [Aug 15 18:09:39.200] Server {0x2b421a620700} NOTE: cache enabled
      FATAL ==> [ProcessManager::pollLMConnection] Lost Manager EOF![E. Mgmt] : Interrupted system call
      [E. Mgmt] log ==> [TrafficManager] using root directory '/usr'
      [Aug 15 18:09:52.744] {0x7f918588f7e0} STATUS: opened /var/log/trafficserver/manager.log
      [Aug 15 18:09:52.744] {0x7f918588f7e0} NOTE: updated diags config
      [Aug 15 18:09:52.746] Manager {0x7f918588f7e0} NOTE: [ClusterCom::ClusterCom] Node running on OS: 'Linux' Release: '2.6.32-220.17.1.tb619.el6.x86_6'
      [Aug 15 18:09:52.746] Manager {0x7f918588f7e0} NOTE: [LocalManager::listenForProxy] Listening on port: 8080
      [Aug 15 18:09:52.747] Manager {0x7f918588f7e0} NOTE: [LocalManager::listenForProxy] Listening on port: 80
      [Aug 15 18:09:52.747] Manager {0x7f918588f7e0} NOTE: [LocalManager::listenForProxy] Listening on port: 81
      [Aug 15 18:09:52.747] Manager {0x7f918588f7e0} NOTE: [TrafficManager] Setup complete
      [Aug 15 18:09:53.765] Manager {0x7f918588f7e0} NOTE: [LocalManager::startProxy] Launching ts process
      [TrafficServer] using root directory '/usr'
      [Aug 15 18:09:53.774] Manager {0x7f918588f7e0} NOTE: [LocalManager::pollMgmtProcessServer] New process connecting fd '12'
      [Aug 15 18:09:53.774] Manager {0x7f918588f7e0} NOTE: [Alarms::signalAlarm] Server Process born
      [Aug 15 18:09:54.788] {0x2acfdbb59740} STATUS: opened /var/log/trafficserver/diags.log
      [Aug 15 18:09:54.789] {0x2acfdbb59740} NOTE: updated diags config
      [Aug 15 18:09:54.792] Server {0x2acfdbb59740} NOTE: cache clustering disabled
      [Aug 15 18:09:54.794] Server {0x2acfdbb59740} WARNING: no ssd disks specified in proxy.config.cache.ssd.storage: 
      [Aug 15 18:09:54.820] Server {0x2acfdbb59740} NOTE: cache clustering disabled
      [Aug 15 18:09:54.820] Server {0x2acfdbb59740} WARNING: unable to open cache disk(s): SSD Cache Disabled
      [Aug 15 18:09:54.833] Server {0x2acfdbb59740} NOTE: logging initialized[15], logging_mode = 3
      [Aug 15 18:09:54.833] Server {0x2acfdbb59740} WARNING: Access logging to local log directory suspended - partition space is low.
      [Aug 15 18:09:54.868] Server {0x2acfdbb59740} NOTE: traffic server running
      [Aug 15 18:10:10.729] Server {0x2acfe1f1e700} NOTE: cache enabled
      FATAL ==> [ProcessManager::pollLMConnection] Lost Manager EOF![E. Mgmt] : Interrupted system call
      ....
      

      Attachments

        Issue Links

          Activity

            People

              yunkai Yunkai Zhang
              yunkai Yunkai Zhang
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: