  1. Mesos
  2. MESOS-7800

Tasks with many labels can cause disproportionally huge allocations

Details

    • Type: Bug
    • Status: Accepted
    • Priority: Major
    • Resolution: Unresolved
    • Components: agent, master

    Description

      mesos.proto provides the Labels message so that users can attach free-form data to a number of messages. For some of these messages, e.g., TaskInfo and ExecutorInfo, we explicitly document:

      Therefore, labels should be used to tag tasks with light-weight meta-data.

      However, we never enforce this requirement.

      This becomes problematic, for example, in the agent, where a TaskInfo is likely to be copied often, e.g., due to multiple levels of dispatches. I have measured that a single Label can trigger 50-100 concurrent copies in flight on the agent's container launch path. Our general assumption here seems to be that a TaskInfo, while not necessarily small, is still not huge.

      If users embed a lot of data into, e.g., TaskInfo labels, this can lead to a temporary explosion of the agent process' memory footprint, which in turn can lead to the agent being killed by the OS.
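      To make the scale concrete, the following is a rough back-of-the-envelope sketch, not a measurement. The helper name transientBytes and the 10 MiB label size are assumptions chosen for illustration; only the 50-100 copies figure comes from the measurement above.

```cpp
#include <cstdint>

// Hypothetical helper (not part of Mesos): estimates how many bytes are
// transiently held if `copiesInFlight` copies of a label payload of
// `labelBytes` bytes exist at the same time.
constexpr std::uint64_t transientBytes(
    std::uint64_t labelBytes,
    std::uint64_t copiesInFlight)
{
  return labelBytes * copiesInFlight;
}

// Assumed example payload: 10 MiB of label data on a single task.
constexpr std::uint64_t kExampleLabelBytes = 10ull * 1024 * 1024;

// With 50 to 100 concurrent copies this is roughly 500 MiB to 1000 MiB
// of additional, short-lived memory for one task launch.
constexpr std::uint64_t kLowEstimate = transientBytes(kExampleLabelBytes, 50);
constexpr std::uint64_t kHighEstimate = transientBytes(kExampleLabelBytes, 100);
```

      Under these assumptions a single task launch could briefly add on the order of a gigabyte to the agent's footprint, which is the kind of spike that can get the process killed.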

      Due to the potential negative effects of huge labels, we should evaluate how we can limit the amount of data we accept from users. This could mean limiting the size of the TaskInfo or Labels we accept, measured, e.g., by the message's ByteSizeLong. A value somehow related to ARG_MAX seems intuitive, but I am not sure if we can go as low as the POSIX-mandated minimum of 4096.
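      A minimal sketch of what such a check could look like, under stated assumptions: the Label struct here is a plain stand-in for the protobuf message, payloadBytes counts only key and value bytes (a real check would more likely call ByteSizeLong() on the protobuf message, which also includes encoding overhead), and the names payloadBytes, withinLimit, and kMaxLabelBytes are hypothetical. The 4096 default is only the POSIX minimum mentioned above, not a proposed value.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Stand-in for the protobuf Label message; illustration only.
struct Label
{
  std::string key;
  std::string value;
};

// Hypothetical limit, set to the POSIX-mandated minimum for ARG_MAX.
constexpr std::size_t kMaxLabelBytes = 4096;

// Sums the raw key and value bytes of all labels. This approximates
// the serialized size but ignores protobuf framing overhead.
std::size_t payloadBytes(const std::vector<Label>& labels)
{
  std::size_t total = 0;
  for (const Label& label : labels) {
    total += label.key.size() + label.value.size();
  }
  return total;
}

// Returns true if the labels fit within `limit` bytes; a validator
// could reject the task otherwise.
bool withinLimit(
    const std::vector<Label>& labels,
    std::size_t limit = kMaxLabelBytes)
{
  return payloadBytes(labels) <= limit;
}
```

      Where such a check would run (master validation, agent, or both) and what limit is acceptable are exactly the open questions of this ticket; the sketch only shows that the check itself is cheap compared to the copies it would prevent.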

      Attachments

        1. stat_individual_labels.dat
          3 kB
          Benjamin Bannier
        2. stat_all_task_labels.dat
          0.8 kB
          Benjamin Bannier

            People

              Assignee: Unassigned
              Reporter: Benjamin Bannier (bbannier)