Details

    • Type: Epic
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.24.0
    • Component/s: agent
    • Labels:
    • Epic Name:
      Fetcher Cache
    • Target Version/s:
    • Story Points:
      5

      Description

      The slave should be smarter about how it handles pulling down executors. In our environment, executors rarely change, but the slave always pulls them down from HDFS regardless. This puts undue stress on our HDFS clusters and is not resilient to reduced HDFS availability.

        Issue Links

          Issues in Epic

            Activity

            bernd-mesos Bernd Mathiske added a comment -

            By spinning off MESOS-2073 into its own epic, this one can now be regarded as completed.
            (That we are only using two-level hierarchies to break down tickets led to this repositioning.)

            bernd-mesos Bernd Mathiske added a comment - - edited

            It is now an epic, and there are subtickets that, as far as I can estimate, can each be implemented within a sprint or less, given the available prior work from the first attempt at MESOS-336.

            bernd-mesos Bernd Mathiske added a comment -

            Here is what happened. Other things had higher priority and so this patch never got full review attention. Meanwhile, pressure to create smaller patches also increased, which further reduced the priority of this one.

            MESOS-1316 and MESOS-1248 were held back, waiting for MESOS-336. All three stalled; a lot has happened in the meantime, and MESOS-336 has become yet harder to rebase. So a change in strategy was proposed by Ben: land MESOS-1316 first, then MESOS-1248, then MESOS-336. I have now created and proposed patches for the former two and am starting to rebase (really, re-implement) MESOS-336.

            In doing so, I want to create much smaller patches. Furthermore I have devised a concurrency-safe cache eviction scheme that respects space quota. This will add more patches. Hence, it is now time to convert MESOS-336 to an epic and create subordinate tickets.

            bernd-mesos Bernd Mathiske added a comment -

            I updated the above patch several times and it now seems ready for review.

            bernd-mesos Bernd Mathiske added a comment -

            Getting close to finishing. Have tests running for caching, no caching, GC, extraction, no extraction. Currently working on a test for slave recovery. After that, I will rebase and post another patch, which will then be review-ready.

            bernd-mesos Bernd Mathiske added a comment -

            Here is the first stage implementation, without thorough testing code:
            https://reviews.apache.org/r/21316/

            bernd-mesos Bernd Mathiske added a comment -

            Status update:

            • I have an implementation that works (AFAIK) and it looks good (IMHO).
            • I am currently rebasing it to master AFTER the external containerizer landed, and that takes longer than I expected.
            • Tests need to be written. I will post what I have before that, though, to look at.
            • The tests should coincidentally cover most or all of MESOS-1316.
            benjaminhindman Benjamin Hindman added a comment -

            SGTM, Bernd Mathiske.
            bernd-mesos Bernd Mathiske added a comment -

            I understand that there is elegance in leaving all aspects of fetching contained in the external fetcher program. Nice separation of concerns. But we can have that cake and eat it, too!

            A flag "if asking for caching, let the fetcher program handle it" can indicate that caching should be dealt with externally. I just would not like to write that external cache quite yet, because it does less with more code. We can leave this to third parties. As a first step, I am instead proposing to focus on a caching scheme dealt with by the MesosContainerizerProcess.

            bernd-mesos Bernd Mathiske added a comment -

            Good idea to encode the checksum in the dirs! I am keen on using this scheme whether we deal with concurrency in the fetcher program or in the slave program. But I have to admit that I am leaning towards the latter. Looking up a given cache file is easier to handle in one coherent process (the slave). Instead of searching on disk, a simple lookup in a hashmap yields the whole cache file path, including the dir and thus the checksum.
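
The hashmap lookup described here can be sketched as follows. This is an illustrative model, not Mesos code; all names (CacheIndex, cache_root, the example checksum) are invented for the example.

```python
import os


class CacheIndex:
    """Maps a URI to its full cache-file path, including the checksum dir."""

    def __init__(self, cache_root):
        self.cache_root = cache_root
        self.entries = {}  # uri -> (checksum, path)

    def put(self, uri, checksum):
        basename = os.path.basename(uri)
        path = os.path.join(self.cache_root, checksum, basename)
        self.entries[uri] = (checksum, path)
        return path

    def lookup(self, uri):
        # One hashmap lookup yields the whole cache-file path, including
        # the directory (and thus the checksum) -- no disk search needed.
        return self.entries.get(uri)


index = CacheIndex("/tmp/mesos-cache")
path = index.put("hdfs://nn/apps/executor.tar.gz", "9A32C8943E")
```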

            benjaminhindman Benjamin Hindman added a comment -

            Bernd Mathiske, the concurrency issues are indeed a bit tricky. The fetcher is currently invoked by the containerizer, which means that coordinating what the fetcher should actually be downloading might be a bigger refactor in the "phase 2" you had mentioned in your posts above. To keep things simple for this first version, what about creating a directory for each checksum and then storing the files that collide within that directory? For example, if we had the files foo.tar.gz, bar.tar.gz, and baz.tar.gz, and the checksums for foo and baz were the same, we'd have the directory structure:

            9A32C8943E/foo.tar.gz
            9A32C8943E/baz.tar.gz
            472F293ECA/bar.tar.gz

            This doesn't handle the fact that multiple fetchers may be racing to download the same resource. We could use POSIX file locks here, or punt this for phase 2 and just deal with the inefficiency. What do you think?

            dawa009 Davis Wamola added a comment - - edited

            Vinod Kone The issue I was talking about is actually MESOS-700

            bernd-mesos Bernd Mathiske added a comment -

            The last sentence was written a bit too quickly. There has to be some extra complexity somewhere to deal with concurrency issues. Here is what I have in mind regarding that.

            Problems: There can be multiple mesos-fetcher programs running concurrently. If they all download the same URI, they could trample over the same cache file. Plus it is inefficient to have multiple downloads of the same content running. They won't finish any sooner by THAT sort of work parallelization. Handling such problems in the fetcher program seems to be a bad idea.

            Solution: Prohibit concurrent downloads from the same URI BEFORE they end up in invocations of the fetcher program. Only one fetch attempt follows through, undisturbed and without having to share its bandwidth. After this single fetch has completed, all waiting parties get notified and proceed by reading from the file cache.

            We can use libprocess futures to organize this inside the slave/containerizer program. If I understand libprocess correctly, we do not even need thread synchronization, a thread-safe data structure or any of that. That's because all the relevant action happens somewhere inside the Slave::runTask method, which is serialized by nature of being installed as a message handler.

            Reading from the file cache can be implemented reusing the same fetcher program as above, simply by rewriting the URI to a local file URI.
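
The single-fetch idea can be sketched with futures. In the slave, libprocess-style serialization of the message handler would make the lock below unnecessary; it exists only so this standalone sketch is safe as an ordinary script. The completed future doubles as a "download once" memo here; names are illustrative.

```python
import threading
from concurrent.futures import Future

_pending = {}             # uri -> Future yielding the cache-file path
_lock = threading.Lock()  # not needed under a serialized message handler


def fetch_once(uri, download):
    """First caller downloads; concurrent and later callers reuse the result."""
    with _lock:
        fut = _pending.get(uri)
        leader = fut is None
        if leader:
            fut = Future()
            _pending[uri] = fut
    if leader:
        try:
            fut.set_result(download(uri))  # the single, undisturbed fetch
        except Exception as exc:
            fut.set_exception(exc)
    return fut.result()                    # waiters proceed once it completes
```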

            bernd-mesos Bernd Mathiske added a comment -

            I think I have all I need for a minimal first patch now. Thanks a lot to you all for filling me in on what your objectives are for this ticket. We are focusing on what seems to be the most common case: a highly cooperative framework with all URIs firmly in framework possession. Then it should be possible to implement a first tack with a few fetcher program upgrades and minor slave/containerizer changes (fetchDir flag, md5 URI field).

            bernd-mesos Bernd Mathiske added a comment -

            Good to know that the base name, not the whole URI, defines the identity of a resource. We can build something simple like that and it will still be useful. There will just have to be extra rules for frameworks/users that do not have to be followed when using the current download-always mode.

            Example rule: when using caching, resource base names must be unique within the same framework. We do not demand that they be unique across frameworks, though (I hope). So I suggest we make a separate cache per active framework. Otherwise there could be thrashing from multiple like-named resources from different frameworks.

            Thanks for also explaining how the framework gets hold of the checksum! By placing the resource at the URL itself! I was under the (false?) impression that this was not generally the case.

            benjaminhindman Benjamin Hindman added a comment -

            I think the miscommunication here might be because (Vinod and) I assume that the URI is identifying a specific resource that we're trying to download. Think of it like doing a wget on a URI: if the URI is 'http://ip1:port1/path/to/resource.txt' then we expect wget to put a file in our directory called 'resource.txt'. So if later we did a wget on 'ftp://ip2:port2/different/path/to/resource.txt' then wget would try to overwrite the local 'resource.txt' file in the directory. Thus, the resource is "named" 'resource.txt', so we can easily check "if the requested file exists" and then compare checksums. And yes, the checksum comes from the framework putting it there, as you suggested.

            Clearly there are some URIs that won't have a clear "name" that identifies a resource; for example, someone might have the HTTP URL 'http://ip:port/path/to/download?file=resource.txt'. In this case wget would put the result in a file called 'download?file=resource.txt', which isn't a great "name". My expectation right now was to ignore this case. That is, we'll always "name" a resource as wget would. In the end we'll be checking the checksum, and that's most important, even if it does mean we'll re-download things more often than we'd like.

            For the specific HTTP case we can add more smarts later, like using ETag as Tobi Knaup suggested, as well as looking at the 'Content-Disposition' of our response (to get better names).

            Note that the checksum still does not become irrelevant if we used the entire URI to name the cache file because the framework might explicitly be trying to say: I've updated the object at this URI so I want you to re-download the new version. This is the expected behavior if you think about something like an HTTP cache. HTTP can exploit the 'If-Modified-Since' headers and we could too, but since some protocols don't have anything analogous we proposed using checksums (i.e., does FTP have any notion of If-Modified-Since? Or HDFS?).
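
A rough approximation of "name the resource as wget would", under the assumption that the name is the basename of the URI path with any query string kept (which is exactly the poor-name case described above):

```python
from urllib.parse import urlparse
import posixpath


def wget_name(uri):
    """Approximate wget's default output name for a URI."""
    parsed = urlparse(uri)
    name = posixpath.basename(parsed.path)
    if parsed.query:
        # wget keeps the query string in the filename by default,
        # which yields the awkward names discussed above.
        name = name + "?" + parsed.query
    return name


assert wget_name("http://ip1:80/path/to/resource.txt") == "resource.txt"
```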

            bernd-mesos Bernd Mathiske added a comment -

            @Vinod: In general, we cannot encode a URI in a filename, because filenames have limited length (e.g. 255 chars) and URIs can be longer than that. There would be compression losses that could in principle lead to collisions.
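
One common way around the filename length limit is to derive a fixed-length key from the URI with a cryptographic hash rather than encoding the URI itself; collisions are then only as likely as a SHA-256 collision. A sketch (illustrative, not a proposal from the thread):

```python
import hashlib


def uri_key(uri):
    # 64 hex chars regardless of URI length; fits comfortably in a filename.
    return hashlib.sha256(uri.encode("utf-8")).hexdigest()


long_uri = "hdfs://namenode/" + "x" * 1000  # far beyond a 255-char filename
assert len(uri_key(long_uri)) == 64
```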

            @Ben: OK, so you would want to trust the probability of collision to be small enough. Fair enough. Maybe use SHA-256 then. But how do you know what the checksum of the second URL in your example actually is without downloading it first? Also, I don't think that loading the same resource from multiple different URIs is an important use case.

            Back to your previous comment above where you wrote " If the requested file exists". What identifies "the requested file"? Primarily, it's the URI, not its contents. Using a checksum flips this: the contents becomes the identity. So if the framework presents the checksum in CommandInfo as described in MESOS-700 instead of a URI, where does that checksum come from? The framework user would have to put it there. Or there would have to be a URI->checksum mapping that is derived from the first download. Here you have two choices. You can persist that mapping and have the fetcher write and read it. Or you can keep it in the slave. I am opting for the latter, because it does not require me to write all that I/O code for the mapping. But then it turns out that once you have any mapping, you might as well map URIs to cache files and the checksum becomes irrelevant...

            This would all be easier if the URIs came with checksums pre-attached. Do they?

            benjaminhindman Benjamin Hindman added a comment -

            I'm not sure I understand you correctly. If the fetcher is asked to download the URI 'protocol://path/to/foobar-0.4.5.tar.gz' with checksum 'ABC' and later asked to download the URI 'anotherprotocol://some/other/path/to/foobar-0.4.5.tar.gz' with checksum 'ABC', why couldn't (wouldn't) we treat these as the same file? This seems pretty standard, i.e., you might put a file up at location X and then mirror it at locations Y, Z, and W, and as long as the checksums match you're confident you got the right file.

            Alternatively, using the entire URI name doesn't seem to be out of the question either, we'd just have to write a mapping from URI to how we put it in the directory cache (i.e., maybe have a directory of protocols first like 'http' then match the path to the file). I'm not sure this is necessary though as I think that the above semantics are OKAY and won't be surprising to users.

            vinodkone Vinod Kone added a comment -

            Agree with (1) through (4). (5) would only work if a) all URIs had different base names or b) we could simply use whole URIs as cache file names. But we cannot assume either to be the case.

            What's wrong with (b)? We probably need to encode the URI to get the filename if we want to avoid colons and slashes. Is that what you meant?

            bernd-mesos Bernd Mathiske added a comment -

            "download once" just means "do not download again unless evicted"

            bernd-mesos Bernd Mathiske added a comment -

            Agree with (1) through (4). (5) would only work if a) all URIs had different base names or b) we could simply use whole URIs as cache file names. But we cannot assume either to be the case.

            benjaminhindman Benjamin Hindman added a comment -

            Glad to see you working on this Bernd Mathiske!

            First, once we have the checksum capability I'm not convinced we'd want to expose a "download once" capability. In fact, it might be something we wish we didn't expose in the interface (because, for example, what does it mean to download once if, when we have caching, we decide to evict it?).

            Second, I'm not opposed to adding more caching support/functionality in the slave in the future to better manage persistent state but I'm not sure there is much (any?) persistent state we really need for a simple alpha here (aside from the actual downloaded files themselves) unless I'm missing something.

            Here's what I was thinking:

            (1) Add a flag to the slave with a path to a directory which will represent the cache (this lets someone put it on a RAM FS if they please, but it could default to just 'work_dir/cache').
            (2) Add a flag to the slave for the total number of bytes (of URI downloads) to cache (again, defaulted to something reasonable).
            (3) Add the md5/hash in CommandInfo as suggested in MESOS-700.
            (4) Pass the directory ("cache") to the fetcher when it gets invoked.
            (5) Update the fetcher: If the requested file exists and matches the checksum then "touch" it and copy into the sandbox, extract, chmod, etc. If the file doesn't exist, or doesn't match the checksum, download it (overwriting when wrong checksum, we'll "fix" that later), copy into sandbox, extract, chmod, etc.
            (6) Update the fetcher to always run a "cache eviction" before it exits that simply deletes the oldest-modified files until we're below the cache limit.

            A lot of the generic caching code that gets written for this could eventually be moved into the slave (since it'll all be C++), but I don't see any reason not to start with it in the fetcher for now.

            How does this sound?

            vinodkone Vinod Kone added a comment -

            Are you planning on addressing the issue that the current implementation of the fetcher downloads and extracts the files in place and then copies them to their final location, which is I/O intensive, instead of streaming them to the final location?

            AFAICT, fetcher directly downloads remote URIs into the executor sandbox. The only local copy is when the URI is a local file. Am I misunderstanding you?

            dawa009 Davis Wamola added a comment -

            Are you planning on addressing the issue that the current implementation of the fetcher downloads and extracts the files in place and then copies them to their final location, which is I/O intensive, instead of streaming them to the final location?

            If you cache it and only soft-link as you initially suggested then it wouldn’t be a problem, but you said symlinks wouldn’t work for the general case.

            tknaup Tobi Knaup added a comment -

            You could leverage the ETag HTTP header. It's usually a hash of the content.

            bernd-mesos Bernd Mathiske added a comment -

            Agreed that we should split the task like that. And the first task can be split up even more. I would prefer to go for a solution without checksum first, get that running, then add checksums in a subsequent patch. There is plenty of refactoring to do to support ANY caching.

            I propose the first patch to have a switch that toggles between the current behavior (download always) and "download only once". (In both cases, package extraction and chmod will be supported.)

            The next patch could introduce checksums and one or two ways of using these.

            The main obstacle for the first cache behavior implementation is that the fetcher is an external program. I understand at least two good reasons for the latter and I do want to continue to support these:
            1. Allow users to replace the fetcher in their installation with a custom solution without recompiling Mesos.
            2. Disentangle Mesos from libraries that are only used for fetching.

            So, I propose to keep the purely download-oriented part of the fetcher in the separate program, pretty much as it is programmed now, but to move everything that is downstream from there (chmod, extract) into the slave OS process. Why? Because at runtime it will have to happen AFTER the bookkeeping that supports the download caching behavior, and THAT should be in the slave. Otherwise we need secondary storage to keep track of cache state, with the fetcher explicitly writing and reading it, plus extra failure modes to program against. Here the significant extra complexity of persisting state can be entirely avoided by a little bit of code refactoring.

            vinodkone Vinod Kone added a comment -

            But it is safe to re-download, right? Anyway, I agree that using time is not a good idea in general. There are other issues, like clock synchronization, etc.

            This is what I propose:

            --> First optimize for network downloads
            Introduce md5/hash in CommandInfo as suggested in MESOS-700.

            --> In a subsequent review lets see if we can cleanly optimize for local copies
            Maybe symlinks are the solution here. Or maybe some kind of read-write container images with low filesystem setup cost (à la Docker).

            dawa009 Davis Wamola added a comment -

            The 'Last-Modified' date in the HTTP headers is not a good indication of a change. Renaming a file will change the timestamp.

            vinodkone Vinod Kone added a comment -

            How about using the last-modified time of the URL to figure out if any changes were made?

            For HTTP could we leverage "If-Unmodified-Since" and/or "Last-Modified" ?
            See: http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html

            bernd-mesos Bernd Mathiske added a comment -

            All that said, the default behavior should be exactly what it is now and all other variants need to be explicitly asked for with extra flags/arguments. In total, we are looking at the following flag-switchable alternatives:

            • Download each and every time (the default now and then).
            • Download only once, then reuse the cache.
            • Framework asks for a specific checksum, download if not found in cache. Note that this variant does NOT probe the URI for a checksum as the next variant does.
            • Probe the checksum at the URI each and every time, download every time the checksum changed.

            Another switch indicates whether tasks get separate copies.

            This is getting complex. Feedback welcome.

            bernd-mesos Bernd Mathiske added a comment -

            I want what is in MESOS-700, too, as an option for the user. But that scheme requires the framework to have a checksum ready and where does this checksum come from? The framework programmer needs to have access to the URI-downloadable resources and generate the checksum ahead of time. In contrast, with the above scheme, this extra work is not necessary. Instead, it is assumed that each URI is (typically) only ever used once and that subsequent downloads would have exactly the same result. This should be a common scenario for most frameworks.

            Since neither variant is ideal for every scenario that may occur in the wild, I suggest having both, i.e. make the checksum optional as explained in the comments to MESOS-700, and ALSO provide caching if no checksum is given. This makes provisioning easier if no third-party download site is involved.

            An important use case for checksums might be upgrading executor software on the fly. You can force just that by introducing a fresh checksum. But employing this kind of trigger is only strictly necessary when the content of a third party URI changes while a framework is running. In all cases where the downloaded resources are under the control of the framework user, the framework has alternative control mechanisms to deal with upgrades, e.g. using different file names.

            adam-mesos Adam B added a comment -

            What state does the slave/fetcher even need to maintain? Can't we just check the hash of the cached files (perhaps precomputed and stored alongside them) against the hash of the file potentially to-be-downloaded? MESOS-700 has more suggestions on the hashing scheme.

            bernd-mesos Bernd Mathiske added a comment - - edited

            Symlinks only work well under the assumption that the fetched content is not mutable or at least mutations are idempotent (e.g. when involving a build process). So, if we are to solve the general case, we must not use symlinks. I'll copy and extract repeatedly in a first implementation. This also takes care of the multi-user problem, with separate chmods for each copy. And the main issue, repeated downloading, remains solved.

            However, after that I can add a framework-selectable switch in the URI protobuf that causes symlinks to avoid copying. I suppose it is OK then if hidden files are not covered by symlinks, since the fallback of copying can still be used for this presumably rare case.

            bernd-mesos Bernd Mathiske added a comment -

            I suggest the following approach. All URI content gets downloaded into a fetcher result cache directory (short: fetch dir) per slave instead of a work dir per executor. Extraction of archives (e.g. *.tgz files) also happens per slave, inside the fetch dir. The extracted resources are then soft-linked into each executor's work dir.

            How to handle different users and chmod-ing for them? There is a separate fetch subdir for each fetched URI/user combination. In case of an archive, we extract and chmod once per user. If it's not an archive, we make a copy and chmod per user. In any case, we only download once, regardless of user settings.

            The main problem I am facing now is persisting which URIs have been downloaded and which fetch subdir each resulted in. This info needs to be kept at least for the duration of the slave process. (No need to go beyond that: in case a slave fails, we can simply wipe the entire fetch cache on recovery.) It would be simpler, and would make for less fragile source code, if the fetcher were part of the slave program rather than a separate program. But I reckon we can still keep the required state in the slave's dynamic memory and use it to direct fetcher program invocations. Then we have to be careful to keep what the fetcher does and what the slave knows in sync, though.

            samtaha Sam Taha added a comment -

            Both these issues relate to how to efficiently manage resource caching between executors and executor runs on a slave/framework.


              People

              • Assignee:
                bernd-mesos Bernd Mathiske
                Reporter:
                wickman brian wickman
                Shepherd:
                Benjamin Hindman
              • Votes: 3
                Watchers: 15

                Dates

                • Created:
                  Updated:
                  Resolved:

                  Time Tracking

                  Original Estimate: 672h
                  Remaining Estimate: 672h
                  Time Spent: Not Specified
