Aurora / AURORA-302

TaskGroups may abandon tasks


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.5.0
    • Component/s: Scheduler
    • Labels: None

    Description

      I've yet to figure out exactly how this happens, but I've witnessed it twice in a row in Vagrant (though I was unable to reproduce it while trying to debug it) and once in production.

      TaskGroups appears to have a bug that causes it to keep a group in the groups data structure with no corresponding async task in the executor. TaskGroups is designed so that each task group is almost always represented in both; the only exception is the brief window in which the executor entry is absent while a task is being scheduled.
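      A minimal, single-threaded sketch of the invariant described above may help. It assumes a simplified model in which the executor is reduced to a set of groups that have a pending evaluation; `TaskGroupsModel` and all of its methods are hypothetical names, not Aurora's code, and real concurrency and penalties are left out. It shows how a group whose evaluation is lost can linger in the groups structure and, once its tasks go away, linger empty.

```java
import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

// Hypothetical, single-threaded model of the TaskGroups invariant; names are
// illustrative and this is not Aurora's actual implementation.
class TaskGroupsModel {
  // Group name -> task IDs waiting to be scheduled (the "groups" structure).
  final Map<String, Queue<String>> groups = new LinkedHashMap<>();
  // Stand-in for the executor: groups that have a pending async evaluation.
  final Set<String> pendingEvaluations = new HashSet<>();

  void addTask(String group, String taskId) {
    Queue<String> tasks = groups.get(group);
    if (tasks == null) {
      tasks = new ArrayDeque<>();
      groups.put(group, tasks);
      pendingEvaluations.add(group); // a new group is always queued for evaluation
    }
    tasks.add(taskId);
  }

  // A task leaves the group without being scheduled (e.g. it was killed).
  // A group left empty is only cleaned up by its next evaluation.
  void removeTask(String group, String taskId) {
    Queue<String> tasks = groups.get(group);
    if (tasks != null) {
      tasks.remove(taskId);
    }
  }

  // Runs the pending evaluation for `group`: schedules at most one task, then
  // re-queues the group if tasks remain, or invalidates it if it is empty.
  String evaluate(String group) {
    pendingEvaluations.remove(group); // brief window with no executor entry
    Queue<String> tasks = groups.get(group);
    if (tasks == null) {
      return null;
    }
    String scheduled = tasks.poll();
    if (tasks.isEmpty()) {
      groups.remove(group); // empty groups are invalidated
    } else {
      pendingEvaluations.add(group);
    }
    return scheduled;
  }

  // Groups that break the invariant: tracked, but with no pending evaluation.
  Set<String> abandonedGroups() {
    Set<String> abandoned = new HashSet<>(groups.keySet());
    abandoned.removeAll(pendingEvaluations);
    return abandoned;
  }
}
```

      In this model, removing a group's pending evaluation and then removing its last task leaves an empty group that abandonedGroups() reports; a later evaluate() call would invalidate it. The sketch illustrates the symptom only and does not identify the actual cause of the bug.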

      The one I observed in production looked like this (in /pendingtasks):

      {
        penaltyMs: 30000,
        name: "role/env/job",
        taskIds: []
      }

      When I saw it in Vagrant, it looked like this:

      {
        penaltyMs: 1,
        name: "role/env/job",
        taskIds: []
      }

      Additionally, schedule_queue_size was consistently zero in Vagrant while I observed this, further supporting the hypothesis that the group was not being evaluated.

      TaskGroups is intended to invalidate empty groups, so the mere presence of an empty group suggests that it has been dropped from evaluation.
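      The same reasoning suggests a way to spot the symptom from outside the scheduler. The sketch below is a hypothetical heuristic, assuming two /pendingtasks snapshots taken further apart than the largest penaltyMs: a group that is empty in both should have been evaluated and invalidated in between, so it is a candidate for having been dropped. PendingTasksCheck and GroupView are illustrative names, not Aurora types.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical monitoring heuristic; GroupView mirrors the fields shown in the
// /pendingtasks output above but is not an Aurora type.
class PendingTasksCheck {
  static final class GroupView {
    final String name;
    final long penaltyMs;
    final List<String> taskIds;

    GroupView(String name, long penaltyMs, List<String> taskIds) {
      this.name = name;
      this.penaltyMs = penaltyMs;
      this.taskIds = taskIds;
    }
  }

  // Names of groups that are empty in both snapshots. The snapshots are assumed
  // to be taken further apart than the largest penaltyMs, so a healthy group
  // would have been evaluated (and, if empty, invalidated) in between.
  static List<String> suspectedAbandoned(List<GroupView> earlier, List<GroupView> later) {
    Set<String> emptyEarlier = new HashSet<>();
    for (GroupView g : earlier) {
      if (g.taskIds.isEmpty()) {
        emptyEarlier.add(g.name);
      }
    }
    List<String> suspects = new ArrayList<>();
    for (GroupView g : later) {
      if (g.taskIds.isEmpty() && emptyEarlier.contains(g.name)) {
        suspects.add(g.name);
      }
    }
    return suspects;
  }
}
```

      Requiring two snapshots rather than one avoids flagging a group that is only briefly empty before its next evaluation; the right spacing between snapshots is an assumption to tune.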



    People

      Assignee: Bill Farner (wfarner)
      Reporter: Bill Farner (wfarner)
      Votes: 0
      Watchers: 1
