AURORA-1459

DelayExecutor is flaky within scheduling loop

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.10.0
    • Component/s: Scheduler
    • Labels: None
    • Sprint: Twitter Aurora Q3'15 Sprint 11
    • Story Points: 5

    Description

      TaskGroups now uses the DelayExecutor that was introduced to gate async operations. The problem is that the DelayExecutor queue is flushed only on DB transaction completion (1). This means scheduling cannot proceed unless there is some storage mutation activity: when there are no storage writes, queued work is never released and scheduling effectively halts.

      While this is unlikely to happen in production, it is consistently reproducible with the e2e tests in Vagrant on any subsequent run.

      (1) - https://github.com/apache/aurora/blob/06ddaadbcba4c66b8019815de6ca27d50a9df77d/src/main/java/org/apache/aurora/scheduler/storage/db/DbStorage.java#L175-L178
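
      The stall described above can be illustrated with a minimal sketch. It is not Aurora's code: GatedExecutorSketch and StorageSketch are hypothetical stand-ins, assumed here only to show an executor whose queue is drained solely when a storage write completes. Work submitted while no write occurs stays queued indefinitely.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

/** Hypothetical stand-in for a gated executor; not Aurora's actual DelayExecutor. */
final class GatedExecutorSketch {
  private final Queue<Runnable> pending = new ArrayDeque<>();

  /** Queues the work instead of running it immediately. */
  void execute(Runnable work) {
    pending.add(work);
  }

  /** Runs everything queued so far and returns how many tasks ran. */
  int flush() {
    int ran = 0;
    Runnable next;
    while ((next = pending.poll()) != null) {
      next.run();
      ran++;
    }
    return ran;
  }

  int pendingCount() {
    return pending.size();
  }
}

/** Hypothetical stand-in for storage that flushes the gate only after a write. */
final class StorageSketch {
  private final GatedExecutorSketch gate;

  StorageSketch(GatedExecutorSketch gate) {
    this.gate = gate;
  }

  void write(Runnable mutation) {
    mutation.run();
    gate.flush(); // Assumed to be the only place the queue is drained.
  }
}

final class StallDemo {
  public static void main(String[] args) {
    GatedExecutorSketch gate = new GatedExecutorSketch();
    StorageSketch storage = new StorageSketch(gate);
    List<String> ran = new ArrayList<>();

    // A scheduling attempt is submitted, but nothing writes to storage yet.
    gate.execute(() -> ran.add("schedule"));
    System.out.println("before write: pending=" + gate.pendingCount() + " ran=" + ran);

    // Only a storage write releases the queued scheduling work.
    storage.write(() -> ran.add("write"));
    System.out.println("after write: pending=" + gate.pendingCount() + " ran=" + ran);
  }
}
```

      In this sketch the scheduling task runs only because an unrelated write happens to follow it. Without that write, the task would remain pending, which mirrors the halt reported in this issue.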

          People

            wfarner Bill Farner
            maximk Maxim Khutornenko