  1. Traffic Server
  2. TS-1487

The ordering of plugin_init and init_HttpProxyServer causes a crashed TS to core endlessly


Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 3.2.0
    • Fix Version/s: 3.3.5
    • Component/s: Core
    • Environment: Linux RHEL6.2

    Description

      We've had a serious issue whereby TS, after a crash, re-spawns and cores continuously when it tries to restart under load. I traced the issue to the SNMP Research library (a third-party lib). It uses select() throughout, and after the crash the file descriptor numbers spike under load because all the sockets get opened at once. This causes a buffer overflow in the select handling, because the fd passed to FD_SET is much bigger than the FD_SETSIZE of 1024 (which was a bitch to track down, as the stack was corrupted and gdb was therefore useless).

      Tracing why this happened on 3.2.0 and not 3.0.2, I found that the position of plugin_init in the startup sequence has changed. On 3.0.2 the sequence was in effect (1) plugin_init and then (2) init_HttpProxyServer, whereas this has mysteriously been reversed on 3.2.0. To get our system to work in this crash case, I've patched ATS to flip them around as in 3.0.2. I'll attach the patch we propose to use to get around this.
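
      To illustrate the FD_SETSIZE overflow described above, here is a minimal sketch of the kind of range check that select()-based code needs. It is not taken from the attached patch or from the SNMP library; checked_fd_set is a hypothetical helper name used only for this example.

```c
#include <stdbool.h>
#include <sys/select.h>

/* Hypothetical helper (not from ATS or the SNMP library): add fd to the
 * set only when it fits inside the fixed-size fd_set bitmap. */
static bool checked_fd_set(int fd, fd_set *set)
{
    if (fd < 0 || fd >= FD_SETSIZE) {
        /* FD_SET on this fd would write outside *set. With an fd_set on
         * the stack, that write is what corrupts the stack. */
        return false;
    }
    FD_SET(fd, set);
    return true;
}
```

      A caller would FD_ZERO the set, call checked_fd_set(fd, &set) for each descriptor, and treat a false return as "this fd cannot be used with select()". Code that has to handle descriptors above the limit would need poll() or a similar interface instead.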

      Is this actually a bug waiting to happen in other systems, or was there a reason to change this sequence?
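
      As a rough sketch of why the init order matters here: POSIX open() returns the lowest-numbered descriptor that is not currently open, so whichever component opens its descriptors first gets the small numbers. The functions below are hypothetical stand-ins, not ATS code: plugin_open_fd plays the role of a plugin opening a descriptor during its init, and server_open_fds plays the role of the proxy opening many sockets under load.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical stand-in for a plugin opening a descriptor during its init. */
static int plugin_open_fd(void)
{
    return open("/dev/null", O_RDONLY);
}

/* Hypothetical stand-in for the proxy opening sockets under load:
 * opens up to n descriptors into out[] and returns how many succeeded. */
static int server_open_fds(int *out, int n)
{
    int count = 0;
    while (count < n) {
        int fd = open("/dev/null", O_RDONLY);
        if (fd < 0) {
            break; /* descriptor limit reached or another error */
        }
        out[count++] = fd;
    }
    return count;
}

/* Close every descriptor previously stored by server_open_fds. */
static void close_fds(const int *fds, int n)
{
    for (int i = 0; i < n; i++) {
        close(fds[i]);
    }
}
```

      Calling plugin_open_fd() before server_open_fds() yields a small descriptor number. Calling it afterwards yields a number above every descriptor the "server" still holds, which under enough load is past FD_SETSIZE.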

      Attachments

        1. INTD-529-RespawnCrash.patch
          3 kB
          Aidan McGurn
        2. INTD-529-RespawnCrash.patch
          2 kB
          Aidan McGurn
        3. ts-1487.diff
          30 kB
          Alan M. Carroll

        Issue Links

          Activity

            People

              Assignee: amc Alan M. Carroll
              Reporter: amcgurn Aidan McGurn
              Votes: 0
              Watchers: 9

              Dates

                Created:
                Updated:
                Resolved: