Mesos / MESOS-4975

mesos::internal::master::Slave::tasks can grow unboundedly

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.0.2, 1.1.0, 1.2.0
    • Component/s: master
    • Labels: None

    Description

      In a Mesos cluster we observed the following:

      $ jq '.orphan_tasks | length' state.json
      1369
      $ jq '.unregistered_frameworks | length' state.json
      20162
      

      Setting aside that unregistered_frameworks here is really "the list of frameworkIDs for each orphan task" (described in MESOS-4973), the discrepancy between the two values above is surprising: a per-orphan-task list should not have far more entries than there are orphan tasks.

      I think the problem is that we do this in the master:

      From source:

          foreachvalue (Slave* slave, slaves.registered) {
            foreachvalue (Task* task, slave->tasks[framework->id()]) {
              framework->addTask(task);
            }
            foreachvalue (const ExecutorInfo& executor,
                          slave->executors[framework->id()]) {
              framework->addExecutor(slave->id, executor);
            }
          }
      

      Here operator[] is used whenever a framework subscribes, regardless of whether the agent has any tasks for that framework. On a hashmap, operator[] inserts a default-constructed value when the key is absent.

      If the agent has no tasks for this framework, the resulting {frameworkID: empty hashmap} entry stays in the map indefinitely. If frameworks are ephemeral and new ones keep coming in, the map grows without bound.
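      The growth described above can be reproduced outside Mesos. The following is a minimal sketch, assuming std::unordered_map as a stand-in for the master's hashmap and plain strings for framework and task IDs; the names FrameworkTasks, visitTasksUnguarded and simulateSubscriptions are hypothetical and do not appear in the Mesos source.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical stand-ins for the per-agent bookkeeping:
// frameworkId -> (taskId -> task state).
using TaskMap = std::unordered_map<std::string, std::string>;
using FrameworkTasks = std::unordered_map<std::string, TaskMap>;

// Mimics the subscription path: iterates the agent's tasks for one
// framework through operator[]. Returns the number of tasks visited.
std::size_t visitTasksUnguarded(FrameworkTasks& tasks,
                                const std::string& frameworkId)
{
  std::size_t visited = 0;
  // operator[] inserts an empty TaskMap when frameworkId is absent.
  for (const auto& entry : tasks[frameworkId]) {
    (void) entry;
    ++visited;
  }
  // The key exists now, even if the framework never had a task here.
  assert(tasks.count(frameworkId) == 1);
  return visited;
}

// Simulates `count` distinct ephemeral frameworks subscribing and
// returns the number of entries left behind in the map.
std::size_t simulateSubscriptions(FrameworkTasks& tasks, std::size_t count)
{
  for (std::size_t i = 0; i < count; ++i) {
    visitTasksUnguarded(tasks, "framework-" + std::to_string(i));
  }
  return tasks.size();
}
```

      Starting from an empty FrameworkTasks, simulateSubscriptions(tasks, 1000) leaves 1000 empty entries behind, one per framework that subscribed, although no task was ever added.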

      We should check tasks.contains(frameworkId) before using operator[], so that a lookup for a framework with no tasks on the agent does not insert an entry.
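      A guarded lookup could look like the sketch below. It again assumes std::unordered_map in place of the master's hashmap, so it uses find() rather than the contains() call proposed above; countTasksGuarded and simulateGuardedSubscriptions are hypothetical names, not the actual fix that was committed.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical stand-ins for the per-agent bookkeeping:
// frameworkId -> (taskId -> task state).
using TaskMap = std::unordered_map<std::string, std::string>;
using FrameworkTasks = std::unordered_map<std::string, TaskMap>;

// Looks up the framework's tasks without ever inserting. Taking the map
// by const reference makes an accidental operator[] a compile error.
std::size_t countTasksGuarded(const FrameworkTasks& tasks,
                              const std::string& frameworkId)
{
  auto it = tasks.find(frameworkId);
  if (it == tasks.end()) {
    return 0; // No tasks for this framework on this agent.
  }
  return it->second.size();
}

// Simulates `count` distinct ephemeral frameworks subscribing and
// returns the map size afterwards, which must be unchanged.
std::size_t simulateGuardedSubscriptions(const FrameworkTasks& tasks,
                                         std::size_t count)
{
  const std::size_t before = tasks.size();
  for (std::size_t i = 0; i < count; ++i) {
    countTasksGuarded(tasks, "framework-" + std::to_string(i));
  }
  assert(tasks.size() == before);
  return tasks.size();
}
```

      With this shape, subscribing any number of frameworks that have no tasks on the agent leaves the map at its original size.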

            People

              Assignee: Yan Xu (xujyan)
              Reporter: Yan Xu (xujyan)