Aurora / AURORA-1614

Failed sandbox initialization can cause tasks to go LOST

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.13.0
    • Component/s: Executor
    • Labels: None
    • Sprint: Twitter Aurora Q1'16 Sprint 18
    • Story Points: 3

    Description

      When we initialize the sandbox, we catch only sandbox-specific error types. If any other, unexpected error is raised, the executor hangs until the timeout is exceeded, at which point the task goes LOST.

      We should instead broadly catch exceptions raised during sandbox initialization and quickly fail tasks.
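
      A minimal sketch of the proposed behavior, in Python. The names SandboxError and initialize_sandbox and the returned status strings are hypothetical, not the executor's actual API. The point is only that a catch-all branch turns an unexpected error into a prompt failure instead of a hang.

```python
class SandboxError(Exception):
    """Hypothetical stand-in for the sandbox-specific error type."""


def initialize_sandbox(create):
    """Run `create` and report a status instead of letting errors escape.

    `create` is any zero-argument callable that sets up the sandbox.
    Returns a (status, reason) tuple; both values are illustrative only.
    """
    try:
        create()
    except SandboxError as e:
        # Expected failure mode: the kind of error already being caught.
        return ("FAILED", "Failed to initialize sandbox: %s" % e)
    except Exception as e:
        # Unexpected failure mode: without this branch the error escapes,
        # the executor waits for the timeout, and the task goes LOST.
        return ("FAILED", "Unknown exception initializing sandbox: %s" % e)
    return ("RUNNING", None)
```

      With this shape, a callable that raises, say, a ValueError yields a FAILED status immediately rather than leaving the caller waiting.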

      Additionally, the DockerDirectorySandbox was not properly catching errors raised when creating/symlinking, which led to the above problem in the event of a misconfiguration. In practice this issue shouldn't have occurred in normal usage, but it slowed development until I tracked down what was causing the tasks to hang.
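
      One way to close that gap, sketched in Python under stated assumptions: SandboxCreationError and create_sandbox are hypothetical names, and the real DockerDirectorySandbox may structure this differently. Filesystem errors from directory creation and symlinking are re-raised as a single sandbox-specific type, so the caller sees a predictable failure.

```python
import os


class SandboxCreationError(Exception):
    """Hypothetical sandbox-specific error type."""


def create_sandbox(root, link_path):
    """Create directory `root` and make `link_path` a symlink to it.

    Any OSError from the filesystem calls (for example a path that
    already exists, or a permissions problem) is re-raised as the
    sandbox-specific error.
    """
    try:
        os.makedirs(root)
        os.symlink(root, link_path)
    except OSError as e:
        raise SandboxCreationError("Failed to create sandbox: %s" % e)
```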

          People

            joshua.cohen Joshua Cohen