  1. Mesos
  2. MESOS-7800

Tasks with many labels can cause disproportionally huge allocations

    Details

    • Type: Bug
    • Status: Accepted
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: agent, master

      Description

      mesos.proto provides the Labels message so that users can attach free-form data to a number of messages. For some of them, e.g., TaskInfo and ExecutorInfo, we explicitly document:

      Therefore, labels should be used to tag tasks with light-weight meta-data.

      However, we never enforce this requirement.

      This becomes problematic, e.g., in the agent, where a TaskInfo will likely be copied often, for instance due to multiple levels of dispatches. I have measured that a single Label can trigger 50-100 concurrent copies in flight on the agent's container launch path; our general assumption here seems to be that while a TaskInfo is not necessarily small, it is still not huge.

      If users embed a lot of data into, e.g., TaskInfo labels, this can lead to a temporary explosion of the agent process' memory footprint, which in turn can lead to the agent being killed by the OS.
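
To make the scale of the problem concrete, the following is a rough back-of-the-envelope sketch, not a measurement. The helper `estimatePeakBytes` and the `concurrentLaunches` parameter are hypothetical; the label sizes are illustrative, and only the copy count of up to 100 is taken from the observation above.

```cpp
#include <cstdint>

// Hypothetical helper (not part of Mesos): a rough upper bound on the
// transient memory held by in-flight copies of label data, assuming every
// copy is a full deep copy that is alive at the same time.
constexpr std::uint64_t estimatePeakBytes(
    std::uint64_t labelBytesPerTask,
    std::uint64_t copiesInFlight,
    std::uint64_t concurrentLaunches)
{
  return labelBytesPerTask * copiesInFlight * concurrentLaunches;
}

// Illustrative: 16 KiB of labels, 100 copies, a single launch.
static_assert(estimatePeakBytes(16 * 1024, 100, 1) == 1638400, "");

// Illustrative: 1 MiB of labels, 100 copies, 10 launches at once.
static_assert(
    estimatePeakBytes(1024 * 1024, 100, 10) == 1048576000ULL, "");
```

Under these assumptions, modest label sizes stay in the low megabytes, while megabyte-sized labels combined with several simultaneous launches approach a gigabyte of transient allocations.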

      Due to the potential negative effects of huge labels, we should evaluate how we can limit the amount of data we accept from users. This could mean limiting the size of the TaskInfo or Labels we accept, measured, e.g., by the message's ByteSizeLong. A value somehow related to ARG_MAX would seem intuitive, but I am not sure if we can go as low as the POSIX-mandated minimum requirement of 4096.
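
A minimal sketch of what such a size check could look like is shown below. It is an illustration under stated assumptions, not a proposed patch: the `Label` struct is a hypothetical stand-in for the protobuf message, `labelsPayloadBytes`, `labelsWithinLimit`, and `MAX_LABELS_BYTES` are invented names, and the payload sum only approximates what `ByteSizeLong()` would report on the real message, since it ignores protobuf framing overhead.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the protobuf `Label` message (a key/value pair).
// Real code would operate on the generated protobuf type instead.
struct Label
{
  std::string key;
  std::string value;
};

// Sums the raw key and value bytes of all labels. This ignores protobuf
// field tags and length prefixes, so it is a lower bound on the serialized
// size that `ByteSizeLong()` would report.
std::size_t labelsPayloadBytes(const std::vector<Label>& labels)
{
  std::size_t total = 0;
  for (const Label& label : labels) {
    total += label.key.size() + label.value.size();
  }
  return total;
}

// Hypothetical default limit, chosen to mirror the POSIX minimum for
// ARG_MAX (4096) mentioned in the description. Not a value Mesos uses.
constexpr std::size_t MAX_LABELS_BYTES = 4096;

// Returns true if the label payload fits within `maxBytes`.
bool labelsWithinLimit(
    const std::vector<Label>& labels,
    std::size_t maxBytes = MAX_LABELS_BYTES)
{
  return labelsPayloadBytes(labels) <= maxBytes;
}
```

With the default limit, a handful of short tags passes, whereas a single label carrying a 16 kB encoded payload is rejected unless the caller raises `maxBytes`. Where such a check would run (master validation, agent, or both) is left open here.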

      1. stat_all_task_labels.dat
        0.8 kB
        Benjamin Bannier
      2. stat_individual_labels.dat
        3 kB
        Benjamin Bannier

          Activity

          bbannier Benjamin Bannier added a comment -

          I went through a couple of sample workloads and am attaching files containing pairs of key length and value length. One file contains an entry for each Label used (stat_individual_labels.dat); the other (stat_all_task_labels.dat) contains entries accumulating the sizes of all keys or values by task.

          Looking at the values, there seem to be two groups of workloads here: one where all Label contents fit well within 0.5 kB uncompressed, and another that seems to need around 16 kB. While the first group clearly passes only lightweight data as documented, the second group passes encoded data payloads.


            People

            • Assignee: Unassigned
            • Reporter: bbannier Benjamin Bannier
