Aurora / AURORA-1907

Thermos unresponsive on hosts with many active tasks

Details

    • Type: Story
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.18.0
    • Component/s: Observer
    • Labels: None

    Description

      We have noticed that on hosts with many active tasks (~100) and many terminated tasks (~1500), the Thermos UI is unusable. Thermos spins at 300% CPU but does not answer any HTTP requests.

      Dumping /threads indicates that we might be blocked by the roughly one hundred TaskResourceMonitor threads, each trying to read values from /proc:

      # Thread (daemon): TaskResourceMonitor (TaskResourceMonitor[mytask-id] [TID=45241], 140682825963264)
        File: "/usr/lib/python2.7/threading.py", line 525, in __bootstrap
          self.__bootstrap_inner()
        File: "/usr/lib/python2.7/threading.py", line 552, in __bootstrap_inner
          self.run()
        File: "/.pex/install/twitter.common.decorators-0.3.7-py2-none-any.whl.b23f2874a4392741fca582d9e0528c08e0335c68/twitter.common.decorators-0.3.7-py2-none-any.whl/twitter/common/decorators/threads.py", line 115, in identified
          return instancemethod(self, *args, **kwargs)
        File: "/.pex/install/twitter.common.exceptions-0.3.7-py2-none-any.whl.f6376bcca9bfda5eba4396de2676af5dfe36237d/twitter.common.exceptions-0.3.7-py2-none-any.whl/twitter/common/exceptions/__init__.py", line 126, in _excepting_run
          self.__real_run(*args, **kw)
        File: "apache/thermos/monitoring/resource.py", line 204, in run
          collector.sample()
        File: "apache/thermos/monitoring/process_collector_psutil.py", line 70, in sample
          for child in parent.children(recursive=True)
        File: "/.pex/install/psutil-4.3.0-cp27-cp27mu-linux_x86_64.whl.f4f23a781c020a8b8cb5cba2da0161d0db6452b1/psutil-4.3.0-cp27-cp27mu-linux_x86_64.whl/psutil/__init__.py", line 326, in wrapper
          return fun(self, *args, **kwargs)
        File: "/.pex/install/psutil-4.3.0-cp27-cp27mu-linux_x86_64.whl.f4f23a781c020a8b8cb5cba2da0161d0db6452b1/psutil-4.3.0-cp27-cp27mu-linux_x86_64.whl/psutil/__init__.py", line 861, in children
          table[p.ppid()].append(p)
        File: "/.pex/install/psutil-4.3.0-cp27-cp27mu-linux_x86_64.whl.f4f23a781c020a8b8cb5cba2da0161d0db6452b1/psutil-4.3.0-cp27-cp27mu-linux_x86_64.whl/psutil/__init__.py", line 545, in ppid
          return self._proc.ppid()
        File: "/.pex/install/psutil-4.3.0-cp27-cp27mu-linux_x86_64.whl.f4f23a781c020a8b8cb5cba2da0161d0db6452b1/psutil-4.3.0-cp27-cp27mu-linux_x86_64.whl/psutil/_pslinux.py", line 962, in wrapper
          return fun(self, *args, **kwargs)
        File: "/.pex/install/psutil-4.3.0-cp27-cp27mu-linux_x86_64.whl.f4f23a781c020a8b8cb5cba2da0161d0db6452b1/psutil-4.3.0-cp27-cp27mu-linux_x86_64.whl/psutil/_pslinux.py", line 1459, in ppid
          return int(self._parse_stat_file()[2])
        File: "/.pex/install/psutil-4.3.0-cp27-cp27mu-linux_x86_64.whl.f4f23a781c020a8b8cb5cba2da0161d0db6452b1/psutil-4.3.0-cp27-cp27mu-linux_x86_64.whl/psutil/_pslinux.py", line 1001, in _parse_stat_file
          return [name] + fields_after_name
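
      The trace shows each TaskResourceMonitor thread calling parent.children(recursive=True), which in psutil builds a parent table by reading the ppid of every process on the host. With about 100 monitor threads, every sampling round repeats that full /proc scan about 100 times. A minimal sketch of one way to avoid the repetition is to scan once per round and share the resulting index across all tasks. This is an illustration only and not necessarily the change that resolved this issue; build_children_index, descendants, and psutil_pid_ppid_pairs are hypothetical names that do not exist in Thermos or psutil.

```python
from collections import defaultdict


def build_children_index(pid_ppid_pairs):
    """Map each parent pid to the list of its direct child pids."""
    index = defaultdict(list)
    for pid, ppid in pid_ppid_pairs:
        index[ppid].append(pid)
    return index


def descendants(index, root_pid):
    """Return all pids below root_pid, walking the shared index iteratively."""
    found = []
    seen = {root_pid}
    stack = [root_pid]
    while stack:
        for child in index.get(stack.pop(), ()):
            if child not in seen:  # guards against cycles in the pid table
                seen.add(child)
                found.append(child)
                stack.append(child)
    return found


def psutil_pid_ppid_pairs():
    """Yield (pid, ppid) for every live process: one pass over /proc per call.

    Hypothetical helper; psutil is imported lazily so the functions above
    stay usable without it.
    """
    import psutil
    for proc in psutil.process_iter():
        try:
            yield proc.pid, proc.ppid()
        except psutil.NoSuchProcess:
            continue  # process exited between listing and reading


# Synthetic process table (made-up pids): two tasks rooted at 100 and 200.
pairs = [(1, 0), (100, 1), (101, 100), (102, 100), (103, 101), (200, 1)]
index = build_children_index(pairs)      # built once per sampling round
print(sorted(descendants(index, 100)))   # [101, 102, 103]
print(descendants(index, 200))           # []
```

      On a real host the index would be built from psutil_pid_ppid_pairs() once per round, and each task would then look up its own subtree without touching /proc again.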
      

          People

            Assignee: Stephan Erb (StephanErb)
            Reporter: Stephan Erb (StephanErb)
            Votes: 0
            Watchers: 1
