Uploaded image for project: 'Traffic Server'
  1. Traffic Server
  2. TS-4161

ProcessManager prone to stack-overflow

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 6.2.0
    • Component/s: Manager
    • Labels:

      Description

      ProcessManager::pollLMConnection() can get "stuck" in a loop while handling big number of messages in a row from the same socket.

      Since alloca() is used to allocate buffers on the stack for each message read from the socket, and those buffers are not released until the function returns, getting "stuck" in the loop can lead to stack-overflow, fwiw same could happen if the message length is big enough (accidentally or on purpose).

      It can be reproduced easily by setting up:
      proxy.config.lm.pserver_timeout_secs: 0
      proxy.config.lm.pserver_timeout_msecs: 0
      in records.config and running ./bin/traffic_manager.

      ATS crashes with a segfault in a weird place (while trying to allocate with malloc()). If you inspect the core you would see that it got "stuck" in the loop before it crashed over-flowing the stack (kept allocating buffers on the stack with alloca() until it crashed).

      It is worth considering replacing the alloca() with VLA (which "releases" memory when out of scope on each iteration of the loop) or using ats_malloc() which is supposedly less time-efficient but would be better to handle bigger messages without worrying about stack-overflow.

      IMO adding a message size limit check is a good practice especially with the current implementation.

      If the code gets "stuck" in the while loop while reading big number of messages in a row from the same socket then the port configured by proxy.config.process_manager.mgmt_port becomes unavailable (connection refused). Adding a limit of messages that can be processed in a row should be a good idea.

      I stumbled up on this while running TSQA regression tests where TSQA kept complaining that the management port is not available and the ATS kept crashing.

        Attachments

          Activity

            People

            • Assignee:
              gancho Gancho Tenev
              Reporter:
              gancho Gancho Tenev
            • Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: