MESOS-3307

Configurable size of completed task / framework history

    Details

    • Sprint:
      Mesosphere Sprint 26
    • Story Points:
      3

      Description

      We are trying to make Mesos work with multiple frameworks and mesos-dns at the same time. The goal is to have a set of frameworks per team / project on a single Mesos cluster.

      At this point our mesos state.json is at 4 MB and it takes a while to assemble. 5 mesos-dns instances hit state.json every 5 seconds, effectively pushing mesos-master CPU usage through the roof. It's at 100%+ all the time.

      Here's the problem:

      mesos λ curl -s http://mesos-master:5050/master/state.json | jq '.frameworks[].completed_tasks[].framework_id' | sort | uniq -c | sort -n
         1 "20150606-001827-252388362-5050-5982-0003"
        16 "20150606-001827-252388362-5050-5982-0005"
        18 "20150606-001827-252388362-5050-5982-0029"
        73 "20150606-001827-252388362-5050-5982-0007"
       141 "20150606-001827-252388362-5050-5982-0009"
       154 "20150820-154817-302720010-5050-15320-0000"
       289 "20150606-001827-252388362-5050-5982-0004"
       510 "20150606-001827-252388362-5050-5982-0012"
       666 "20150606-001827-252388362-5050-5982-0028"
       923 "20150116-002612-269165578-5050-32204-0003"
      1000 "20150606-001827-252388362-5050-5982-0001"
      1000 "20150606-001827-252388362-5050-5982-0006"
      1000 "20150606-001827-252388362-5050-5982-0010"
      1000 "20150606-001827-252388362-5050-5982-0011"
      1000 "20150606-001827-252388362-5050-5982-0027"
      
      mesos λ fgrep 1000 -r src/master
      src/master/constants.cpp:const size_t MAX_REMOVED_SLAVES = 100000;
      src/master/constants.cpp:const uint32_t MAX_COMPLETED_TASKS_PER_FRAMEWORK = 1000;
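
The same count can be reproduced offline from a saved copy of state.json. This is a minimal sketch, assuming the JSON has the `frameworks[].completed_tasks[].framework_id` shape used by the jq filter above; `HISTORY_CAP` and the function names are illustrative, not part of Mesos.

```python
from collections import Counter

# Assumption: mirrors MAX_COMPLETED_TASKS_PER_FRAMEWORK from the fgrep output.
HISTORY_CAP = 1000


def completed_per_framework(state):
    """Count completed tasks per framework_id, like `jq ... | sort | uniq -c`."""
    counts = Counter()
    for framework in state.get("frameworks", []):
        for task in framework.get("completed_tasks", []):
            counts[task["framework_id"]] += 1
    return counts


def frameworks_at_cap(counts, cap=HISTORY_CAP):
    """Return the framework IDs whose completed-task history is full."""
    return sorted(fid for fid, n in counts.items() if n >= cap)
```

In the listing above, five frameworks sit exactly at 1000, which is what `frameworks_at_cap` would report for that state.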
      

      Active tasks are just 6% of state.json response:

      mesos λ cat ~/temp/mesos-state.json | jq -c . | wc
             1   14796 4138942
      mesos λ cat ~/temp/mesos-state.json | jq '.frameworks[].tasks' | jq -c . | wc
            16      37  252774
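
The 6% figure follows from the two byte counts reported by `wc`. The sketch below redoes that arithmetic and shows one way to measure the same share from a parsed state document. The helper names are illustrative, and the sizes are only roughly comparable to `jq -c` output, since Python's `json.dumps` escapes non-ASCII characters by default.

```python
import json


def compact_size(value):
    """Size in bytes of a compact JSON encoding of `value`."""
    return len(json.dumps(value, separators=(",", ":")).encode("utf-8"))


def active_task_share(state):
    """Fraction of the serialized state taken up by active tasks."""
    total = compact_size(state)
    active = sum(
        compact_size(framework.get("tasks", []))
        for framework in state.get("frameworks", [])
    )
    return active / total


# Byte counts taken from the wc output above.
reported_share = 252774 / 4138942
print(f"{reported_share:.1%}")  # prints 6.1%
```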
      

      I see four options that can improve the situation:

      1. Add a query string param to exclude completed tasks from state.json and use it in mesos-dns and similar tools. There is no need for mesos-dns to know about completed tasks; it's just extra load on master and mesos-dns.

      2. Make history size configurable.

      3. Make JSON serialization faster. With 10000s of tasks even without history it would take a lot of time to serialize tasks for mesos-dns. Doing it every 60 seconds instead of every 5 seconds isn't really an option.

      4. Create an event bus for the Mesos master. Marathon has one and it'd be nice to have it in Mesos too. This way mesos-dns could avoid polling master state and switch to listening for events.

      All can be done independently.
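
To make option 1 concrete, here is a rough sketch of the response a tool like mesos-dns would want: the same state document with the history removed. It works on an already parsed document and is not how the master builds its response. `completed_tasks` comes from the jq filter above; the top-level `completed_frameworks` key and the function name are assumptions for illustration.

```python
def without_history(state):
    """Return a shallow copy of the state with completed history dropped."""
    # Assumption: completed frameworks live under a top-level key.
    slim = {k: v for k, v in state.items() if k != "completed_frameworks"}
    slim["frameworks"] = [
        {k: v for k, v in framework.items() if k != "completed_tasks"}
        for framework in state.get("frameworks", [])
    ]
    return slim
```

The input document is left untouched, so the full history stays available to clients that still want it.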

      Note to Mesosphere folks: please start distributing debug symbols with your distribution. I have been asking for this for a while and it would be really helpful: https://github.com/mesosphere/marathon/issues/1497#issuecomment-104182501

      Perf report for leading master:

      I'm on 0.23.0.

        Activity

        githubbot ASF GitHub Bot added a comment -

        Github user jfarrell closed the pull request at:

        https://github.com/apache/mesos/pull/82

        githubbot ASF GitHub Bot added a comment -

        Github user jfarrell commented on the issue:

        https://github.com/apache/mesos/pull/82

        Closing per request at https://s.apache.org/V8r3

        alexr Alexander Rukletsov added a comment -

        We are not currently working on event streaming, hence the JSON endpoint is the best you can get for now. I think adding filters to the endpoint is a good idea.

        tpolekhin Tymofii added a comment -

        Can we get confirmation from the people who are actually working on this?
        "However, in the long term things like mesos-dns should use the "Mesos Master Event Streaming" API that Alexander Rukletsov and others are working on, once it is completed. This will make bandaid solutions like this one unnecessary."

        klueska Kevin Klues added a comment -

        I'm all for query parameters to filter this stuff, but others seem to disagree. (See the thread above).

        tpolekhin Tymofii added a comment -

        Yes, it generates JSON much faster now, but we still have lots and lots of completed tasks and frameworks there, which we don't care about for service discovery, but want to keep for history.
        Wouldn't it be great to have some basic filtering for the /state endpoint to get only active tasks/frameworks, only tasks or a particular framework, only slave information, etc.?
        The recently introduced /state-summary endpoint doesn't fit service discovery requirements.

        bmahler Benjamin Mahler added a comment -

        I did some searching but couldn't find one. Note that streaming state information has become less urgent now that the JSON performance issues have been addressed: MESOS-2353

        tpolekhin Tymofii added a comment -

        Hello

        Is there an issue or epic for Mesos Event Streaming HTTP Endpoint?

        bmahler Benjamin Mahler added a comment -

        Sounds like there are some additional concerns around state.json that would be great to discuss in the mailing list if folks are keen. We're speeding up the json generation significantly in MESOS-2353 which is likely to be where the bigger benefit is seen (nice to not have to change flags as well). For now, users are free to configure the size of history if they please:

        commit b843afa130c321747a70da2fec9ea3cedaf34c1c
        Author: Kevin Klues <klueska@gmail.com>
        Date:   Thu Jan 14 22:55:54 2016 -0800
        
            Added flags to set size of completed task/framework history.
        
            The default size of the buffers used to hold the state of completed
            tasks/frameworks is very large. However, many users don't care much
            about this information when requesting a master's state. Moreover, if a
            large number of frameworks request this state simultaneously, the
            master can quickly become overwhelmed because the process of generating
            this state both blocks the master and takes up a lot of cycles. By
            allowing the user to configure the size of the buffers used to hold
            this state, we let the user decide how much state is needed.
        
            This commit is based on a pull request generated by Felix Bechstein at:
            https://github.com/apache/mesos/pull/82
        
            Review: https://reviews.apache.org/r/42053/
        
        commit f99ae0e7618b8a6508ff7d97c290572592508737
        Author: Kevin Klues <klueska@gmail.com>
        Date:   Thu Jan 14 22:57:11 2016 -0800
        
            Added unit test for framework/task history flags.
        
            This commit adds tests to verify that the max_frameworks and
            max_tasks_per_frameworks flags for master work properly. Specifically,
            we test to verify that the proper amount of history is maintained for
            both 0 values to these flags as well as positive values <= to the total
            number of frameworks and tasks per framework actually launched.
        
            Review: https://reviews.apache.org/r/42212/
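
The behaviour those commits describe (a bounded buffer of completed items, where a size of 0 keeps no history at all) can be modelled in a few lines. This is only an illustrative model, not the Mesos implementation; the class and method names are hypothetical.

```python
from collections import deque


class CompletedHistory:
    """Keep at most `max_size` of the most recently completed items."""

    def __init__(self, max_size):
        if max_size < 0:
            raise ValueError("max_size must be >= 0")
        self._items = deque(maxlen=max_size)

    def add(self, item):
        # A full deque drops its oldest entry; with maxlen=0 it keeps nothing.
        self._items.append(item)

    def snapshot(self):
        """Return the retained items, oldest first."""
        return list(self._items)
```

A smaller buffer reduces both the memory held by the master and the amount of history serialized into every state response.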
        
        klueska Kevin Klues added a comment -

        Here are the reviews out for this:
        https://reviews.apache.org/r/42053/
        https://reviews.apache.org/r/42212/

        bobrik Ivan Babrou added a comment -

        Having API params to fetch only interesting tasks would be very nice. Mesos DNS and similar tools don't care about the size of completed task history; they only care about live tasks. Many tools also only care about tasks with certain labels and/or ports allocated.

        Having a Mesos event bus similar to Marathon's event bus would eliminate the need to do active polling altogether, but that takes time (is there an issue for this, btw?).

        I'm okay with having flags for history size, though, since that's what I use now.

        klueska Kevin Klues added a comment -

        I have submitted a patch for review based on Felix's pull request (with some modifications):
        https://reviews.apache.org/r/42053/

        This patch adds configuration flags for setting the buffer size of the completed frameworks and tasks_per_framework variables for the state.json (and related) endpoints. This, combined with MESOS-2353 for significantly reducing the time it takes to generate state.json, should resolve the ticket addressed here. However, in the long term things like mesos-dns should use the "Mesos Master Event Streaming" API that Alexander Rukletsov and others are working on, once it is completed. This will make bandaid solutions like this one unnecessary.

        Also, keep in mind, the use of these newly introduced flags will only help if you are in charge of running your master configuration. If you are using something like the Mesosphere DCOS to automatically set up your master/agent configuration, then these flags will likely not be of much help because their default values will remain as they were before.

        klueska Kevin Klues added a comment -

        Yeah, Ben Mahler and I just chatted about it and decided the same. I am working on fixing up Felix's pull request to adhere to our code standards / pass through review board and will push it through later this afternoon.

        adam-mesos Adam B added a comment -

        Please note that each agent tracks its own list of completed tasks/frameworks, and does not poll the master's state.json.
        And I wouldn't worry about adding more flags to master, since those are usually managed by config files or scripts, and each flag should have a sane default so most users don't need to change them.

        Since the only consumers of this data are CLI clients and other scripts, the real question is whether to limit the length of the lists in-memory in the master, which could save some memory, but affects all clients equally; or do the filtering per-request (query-style), which consumes more memory, takes extra time to filter when generating (the already-slow) state.json, but allows each client to control how long of a history, if any, it wants from its request.
        I'm in favor of the simple cmd-line flag for now, and we can consider more intelligent query filtering as a separate issue, namely MESOS-2258.
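
The two approaches compose rather than conflict: a master-wide cap bounds what is kept in memory, and a per-request limit could only narrow that further. The sketch below illustrates this under stated assumptions; `master_cap` stands in for the command-line flag and `request_limit` for a hypothetical query parameter, neither of which is an actual Mesos name.

```python
def completed_tasks_view(history, master_cap, request_limit=None):
    """Return the completed tasks a single request would see.

    `history` is ordered oldest to newest.
    """
    if master_cap < 0 or (request_limit is not None and request_limit < 0):
        raise ValueError("limits must be >= 0")
    # The master-wide cap bounds what is retained for every client.
    retained = history[-master_cap:] if master_cap > 0 else []
    if request_limit is None:
        return retained
    # A per-request limit can only shrink the retained history further.
    return retained[-request_limit:] if request_limit > 0 else []
```

Note the explicit checks for 0: in Python, `history[-0:]` returns the whole list, which is the opposite of "keep nothing".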

        klueska Kevin Klues added a comment -

        That is a very easy fix (and a pull request to do that is already embedded in this JIRA thread). Although this makes it easy to change these values when master is first brought online, it has the obvious disadvantage of limiting these values for all agents – even those that expect the current values to be fairly high so they can poll state.json fairly infrequently and not miss anything. It also has the disadvantage of expanding the already huge number of flags available to master. Maybe neither of these things matters much though.

        Regarding the license, the only problematic one is the Lucent License:
        https://github.com/stedolan/jq/blob/master/COPYING

        However, I happen to know David Gay pretty well, and I'm sure he'd be happy to dual license the two small files referenced here (though I'd still need to ask him of course).

        adam-mesos Adam B added a comment -

        We've talked about adding query parameters to http endpoints before, but there's a bit of inconsistency in whether to use `url?foo=bar` parameters, or embed json in the request body. I think we have both.
        Including external code into Apache Mesos also has the additional constraint that its licensing must be Apache2-compatible.

        If that becomes too complicated, we could just make the currently hardcoded constants configurable via a command-line flag on the master, much like `slave_ping_timeout`. Seems like an easy MVP.

        klueska Kevin Klues added a comment -

        What about building something like 'jq' into the mesos master and allowing one to do the following via a query parameter:

        curl -g -s 'http://mesos-master:5050/master/state.json?jq=.frameworks[].completed_tasks[].framework_id'

        As far as I understand it, jq's query language is fairly expressive, and jq itself is written in portable C, so pulling it into mesos shouldn't be that hard.
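
For a sense of what evaluating such a query involves, here is a toy evaluator for the tiny subset of jq path syntax used in the example above (`.key` and `[]`). It is a sketch only, not jq and not a proposed Mesos API. Unlike jq, it accepts `[]` on arrays only, and all names are illustrative.

```python
import re

# One token is either ".key" or "[]".
_TOKEN = re.compile(r"\.([A-Za-z_][A-Za-z0-9_]*)|\[\]")


def parse_path(path):
    """Split a path such as '.a[].b' into ['a', '[]', 'b']."""
    tokens, pos = [], 0
    while pos < len(path):
        match = _TOKEN.match(path, pos)
        if match is None:
            raise ValueError(f"unsupported syntax at offset {pos}: {path[pos:]!r}")
        tokens.append(match.group(1) or "[]")
        pos = match.end()
    return tokens


def select(document, path):
    """Return every value that `path` selects from `document`."""
    values = [document]
    for token in parse_path(path):
        if token == "[]":
            if not all(isinstance(value, list) for value in values):
                raise TypeError("[] is only supported on arrays in this sketch")
            values = [item for value in values for item in value]
        else:
            values = [value[token] for value in values]
    return values
```

Calling `select(state, ".frameworks[].completed_tasks[].framework_id")` yields one framework ID per completed task, which is the list the shell pipeline in the description feeds into `sort | uniq -c`.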

        klueska Kevin Klues added a comment -

        /state-summary is not listed as an endpoint in /help. How is the list of endpoints under /help generated?

        flx Felix Bechstein added a comment -

        I'm watching this issue too. I still feel that we need a flag to limit the state size.
        We are currently running ~30 frameworks + a lot of Spark batch jobs registering new frameworks every now and then. This leads to ~50k completed tasks in the state, resulting in a 10 MB state.json. Nobody ever reads these very old completed tasks. So why hold them in memory or ship them every few seconds?

        Keep in mind that web browsers need to parse that JSON every few seconds to update the UI.

        neilc Neil Conway added a comment -

        Note that a fix for MESOS-2353 should be imminent – it should make generating state.json much faster. When that lands, would we still want to add an extra configuration parameter?

        githubbot ASF GitHub Bot added a comment -

        Github user kaysoky commented on the pull request:

        https://github.com/apache/mesos/pull/82#issuecomment-155853543

        Make sure you read our [guidelines for submitting patches](http://mesos.apache.org/documentation/latest/submitting-a-patch/). We only use Pull Requests for changes to the website. Code changes are reviewed on ReviewBoard.

        githubbot ASF GitHub Bot added a comment -

        GitHub user felixb opened a pull request:

        https://github.com/apache/mesos/pull/82

        MESOS-3307 Configurable size of completed task / framework history

        Running many frameworks makes the mesos master very slow.
        A huge state results in mesos-master using all of its CPU just to generate the state.json, blocking everything else.

        This change lets users limit the state size.

        refs https://issues.apache.org/jira/browse/MESOS-3307

        You can merge this pull request into a Git repository by running:

        $ git pull https://github.com/felixb/mesos mesos-3307-limit_task_history

        Alternatively you can review and apply these changes as the patch at:

        https://github.com/apache/mesos/pull/82.patch

        To close this pull request, make a commit to your master/trunk branch
        with (at least) the following in the commit message:

        This closes #82


        commit 1d85aee1b1448af30b850bfa76e9d6e1f0414ec1
        Author: Felix Bechstein <felix.bechstein@otto.de>
        Date: 2015-11-11T09:39:10Z

        MESOS-3307 Configurable size of completed task / framework history

        Running many frameworks makes the mesos master very slow.
        A huge state results in mesos-master using all of its CPU just to generate the state.json, blocking everything else.

        This change lets users limit the state size.


        bobrik Ivan Babrou added a comment -

        Another reason to configure history size is the Mesos RSS footprint. We're at 1.8 GB right now. With a cluster of similar size in terms of tasks and slaves, but with just 2 frameworks, I don't remember such memory usage.

        alex-mesos Alexander Rukletsov (Inactive) added a comment -

        Ivan Babrou, you should be able to get the list of endpoints by hitting the /help endpoint.

        I think history size is also an option; my feeling, however, is that we need a more general solution rather than a band-aid. I would also like jmlvanre to chime in.

        bobrik Ivan Babrou added a comment -

        Alexander Rukletsov, is there a list of mesos endpoints? I wasn't able to find one. Having docs for this would be great.

        Any feedback on configurable history size? This is the simplest solution so far.

        alex-mesos Alexander Rukletsov (Inactive) added a comment -

        > 1. Add query string param to exclude completed tasks from state.json and use it in mesos-dns and similar tools.
        We recently added the /state-summary endpoint.

        > 3. Make JSON serialization faster.
        Right, there is a ticket for that: MESOS-2353.

        > 4. Create event bus for mesos master.
        Again, very good suggestion. We plan to start working on this soon. Here is the first version of the design doc.


          People

          • Assignee:
            klueska Kevin Klues
            Reporter:
            bobrik Ivan Babrou
            Shepherd:
            Benjamin Mahler