Aurora / AURORA-728

Executor does not handle announcer errors properly

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Fix Version/s: 0.6.0
    • Sprint: Aurora Q3 Sprint 3, Aurora Q4 Sprint 1

    Description

      Failures in the announcer cause Mesos and Aurora to get out of sync about the state of a task.

      Consider the following stacktrace:

      Traceback (most recent call last):
        File "/root/.pex/install/twitter.common.exceptions-0.3.0-py2-none-any.whl.aa74e2e8535b1ea39bf9512cf70dba3e5aea7b1b/twitter.common.exceptions-0.3.0-py2-none-any.whl/twitter/common/exceptions/__init__.py", line 126, in _excepting_run
          self.__real_run(*args, **kw)
        File "/root/.pex/install/twitter.common.concurrent-0.3.0-py2-none-any.whl.3c9a3bf0ac76acff13a6803a37138bc9f18e54c7/twitter.common.concurrent-0.3.0-py2-none-any.whl/twitter/common/concurrent/deferred.py", line 43, in run
          self._closure()
        File "/opt/thermos/bin/thermos_executor.pex/apache/aurora/executor/aurora_executor.py", line 258, in <lambda>
        File "/opt/thermos/bin/thermos_executor.pex/apache/aurora/executor/aurora_executor.py", line 121, in _run
        File "/opt/thermos/bin/thermos_executor.pex/apache/aurora/executor/aurora_executor.py", line 161, in _start_status_manager
        File "/opt/thermos/bin/thermos_executor.pex/apache/aurora/executor/common/announcer.py", line 74, in from_assigned_task
        File "/opt/thermos/bin/thermos_executor.pex/apache/aurora/executor/common/announcer.py", line 100, in make_serverset
        File "/root/.pex/install/kazoo-1.3.1-py2-none-any.whl.261c1cd5b2337063b238f0c52eeed45a1df90891/kazoo-1.3.1-py2-none-any.whl/kazoo/client.py", line 475, in start
          raise self.handler.timeout_exception("Connection time-out")
      kazoo.handlers.threading.TimeoutError: Connection time-out
      

      Current behaviour: The executor dies. Mesos still considers the task RUNNING, whereas Aurora will eventually consider it LOST.

      Expected behaviour: The executor catches the exception and dispatches TASK_LOST or TASK_FAILED, so that Mesos and Aurora agree on the task state.
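
      The expected behaviour could be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual fix: `AnnouncerTimeout`, `RecordingDriver`, `start_status_manager` and `failing_announcer` are hypothetical stand-ins and do not reflect the real Aurora executor, Mesos driver, or kazoo APIs. The point is that an announcer failure is caught and turned into a terminal status update instead of silently killing the thread.

```python
class AnnouncerTimeout(Exception):
    """Hypothetical stand-in for kazoo.handlers.threading.TimeoutError."""


class RecordingDriver(object):
    """Hypothetical stand-in for the executor driver; records status updates."""

    def __init__(self):
        self.updates = []

    def send_status_update(self, task_id, state):
        self.updates.append((task_id, state))


def start_status_manager(driver, task_id, make_announcer):
    """Set up the announcer; report a terminal state if that fails.

    Returns True when the announcer was created, False otherwise.
    """
    try:
        make_announcer()
    except Exception:
        # Assumption: TASK_FAILED is used here; the issue allows TASK_LOST too.
        driver.send_status_update(task_id, 'TASK_FAILED')
        return False
    return True


def failing_announcer():
    # Simulates the ZooKeeper connection time-out from the traceback above.
    raise AnnouncerTimeout("Connection time-out")


driver = RecordingDriver()
started = start_status_manager(driver, 'task-1', failing_announcer)
```

      With the simulated time-out, `started` is False and the driver has recorded a single TASK_FAILED update for the task, so the scheduler side would not be left believing the task is still RUNNING.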

          People

            zmanji Zameer Manji
            StephanErb Stephan Erb