Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Critical Critical
    • Resolution: Fixed
    • Affects Version/s: 1.0.0
    • Fix Version/s: 1.0.2
    • Component/s: distributed-cache
    • Labels:
      None
    • Target Version/s:

      Description

      Distributed caches are not being properly removed by the TaskTracker when they are expected to be expired.

      1. MAPREDUCE-3824-branch-1.0.patch
        15 kB
        Thomas Graves
      2. MAPREDUCE-3824-branch-1.0.patch
        14 kB
        Thomas Graves
      3. MAPREDUCE-3824-branch-1.0.txt
        4 kB
        Allen Wittenauer

        Activity

        Allen Wittenauer created issue -
        Hide
        Allen Wittenauer added a comment -

        This appears to be related to two problems:

        • race conditions around refcount
        • private caches are never updated to reflect their true size
        Show
        Allen Wittenauer added a comment - This appears to be related to two problems: race conditions around refcount private caches are never updated to reflect their true size
        Hide
        Allen Wittenauer added a comment -

        Here's a quick hack that fixes both issues on my systems.

        Show
        Allen Wittenauer added a comment - Here's a quick hack that fixes both issues on my systems.
        Allen Wittenauer made changes -
        Field Original Value New Value
        Attachment MAPREDUCE-3824-branch-1.0.txt [ 12513518 ]
        Hide
        Robert Joseph Evans added a comment -

        I like the concept of the patch. Volatile is definitely needed here, my bad on that one. I also like that you are doing a DU to update the size of the cached objects if they are 0. I do have some issues with the patch though.

        The first is that even though the DU size update is being done on a separate thread it is being done with the cachedArchives lock held. The amount of time it takes to do a DU could be significant. Nothing new can be added to the cache while the cachedArchives lock is held, so it could be blocking other new tasks from making progress. I would really prefer to see this done in two passes, similar to how we delete out entries. The first pass would go through all entries and identify those that need to be updated, the second pass would be to update those entries without the lock held. Then once we have all of the entries updated we can look at cleaning up the distributed cache.

        The second is that we are updating the size too late. We decide how much space needs to be deleted to get us back under the desired amount based totally on the size reported by BaseDirManager, which in turn gets its data from the CacheStatus object. The issue is that in the current patch we first calculate how much needs to be removed, then we update the size of the archives, then we delete them. This is not that critical, because it just means that in the next pass they would be deleted, so this is really very minor, but should be covered by doing the update in two passes.

        I am not sure exactly what are the situations that the size is not being set. I would like to know exactly which situations the current code is missing, because like I said previously the code that computes the used size goes completely off of what is reported to BaseDirManager, unfortunately there are some issues with BaseDirManger where if we are too aggressive with setting the size we might double count some archives, which eventually would make it so that the BaseDirManager thinks it is full all the time, which would be very bad.

        Show
        Robert Joseph Evans added a comment - I like the concept of the patch. Volatile is definitely needed here, my bad on that one. I also like that you are doing a DU to update the size of the cached objects if they are 0. I do have some issues with the patch though. The first is that even though the DU size update is being done on a separate thread it is being done with the cachedArchives lock held. The amount of time it takes to do a DU could be significant. Nothing new can be added to the cache while the cachedArchives lock is held, so it could be blocking other new tasks from making progress. I would really prefer to see this done in two passes, similar to how we delete out entries. The first pass would go through all entries and identify those that need to be updated, the second pass would be to update those entries without the lock held. Then once we have all of the entries updated we can look at cleaning up the distributed cache. The second is that we are updating the size too late. We decide how much space needs to be deleted to get us back under the desired amount based totally on the size reported by BaseDirManager, which in turn gets its data from the CacheStatus object. The issue is that in the current patch we first calculate how much needs to be removed, then we update the size of the archives, then we delete them. This is not that critical, because it just means that in the next pass they would be deleted, so this is really very minor, but should be covered by doing the update in two passes. I am not sure exactly what are the situations that the size is not being set. I would like to know exactly which situations the current code is missing, because like I said previously the code that computes the used size goes completely off of what is reported to BaseDirManager, unfortunately there are some issues with BaseDirManger where if we are too aggressive with setting the size we might double count some archives, which eventually would make it so that the BaseDirManager thinks it is full all the time, which would be very bad.
        Hide
        Allen Wittenauer added a comment -

        There is no doubt the patch is a hack, but it solved my immediate problems because as it stands, distributed caches are really broken at scale.

        Some background. I have a team of users that have several 36GB distributed caches. When these caches are in play, most of the system is basically locked while these caches get built. This patch was really geared towards making sure that these massive caches at least get deleted. Without these patches in place, the mapred tmp spaces fill and tasks fail, eventually leading to mapred framework collapse.

        There are a lot of other problems that show up with caches this large:

        • Hadoop doesn't have a size limit check on caches as part of the job submission process [So any hand waving about "don't use caches that big!" are null and void since there is no way to actually stop a user from doing that!]
        • the setup and cleanup tasks also trigger cache downloads.
        • tasktrackers appear to be frozen for all tasks during cache downloads, with the task stuck in the extremely unhelpful "unassigned" state.
        • the methodology of updating the private cache as a different step seems unnecessary given the permissions at the file system level.

        What really needs to happen is a massive overhaul of the entire distributed cache system. But that's a bigger project, preferably for someone who gets paid to do hadoop development full time. So, like all of the patches I've been submitting lately, I'm not expecting them to get committed. But this is enough of a patch for someone who needs a useable system until a working release ships.

        Show
        Allen Wittenauer added a comment - There is no doubt the patch is a hack, but it solved my immediate problems because as it stands, distributed caches are really broken at scale. Some background. I have a team of users that have several 36GB distributed caches. When these caches are in play, most of the system is basically locked while these caches get built. This patch was really geared towards making sure that these massive caches at least get deleted. Without these patches in place, the mapred tmp spaces fill and tasks fail, eventually leading to mapred framework collapse. There are a lot of other problems that show up with caches this large: Hadoop doesn't have a size limit check on caches as part of the job submission process [So any hand waving about "don't use caches that big!" are null and void since there is no way to actually stop a user from doing that!] the setup and cleanup tasks also trigger cache downloads. tasktrackers appear to be frozen for all tasks during cache downloads, with the task stuck in the extremely unhelpful "unassigned" state. the methodology of updating the private cache as a different step seems unnecessary given the permissions at the file system level. What really needs to happen is a massive overhaul of the entire distributed cache system. But that's a bigger project, preferably for someone who gets paid to do hadoop development full time. So, like all of the patches I've been submitting lately, I'm not expecting them to get committed. But this is enough of a patch for someone who needs a useable system until a working release ships.
        Hide
        Allen Wittenauer added a comment -

        PS, if the tasktracker can survive disk failure in 1.0, it must not work under these conditions. So many failures...

        Show
        Allen Wittenauer added a comment - PS, if the tasktracker can survive disk failure in 1.0, it must not work under these conditions. So many failures...
        Hide
        Allen Wittenauer added a comment -

        (well the tasks that the tasks launch, not the TT itself)

        Show
        Allen Wittenauer added a comment - (well the tasks that the tasks launch, not the TT itself)
        Hide
        Robert Joseph Evans added a comment -

        I agree with you the the 1.0 distributed cache code is a mess. I don't really understand all of what it is doing, especially with how it interacts with other parts of TT. I agree a rewrite would be good. And I think that is what YARN is for. I would be happy to commit your patch for you. At a minimum the volatile change needs to go in, and the rest I just want to feel comfortable that it will not cause more harm then good. Also if you want to place limits on the size of files placed in the distributed cache, please file a JIRA for that, it would be a great addition.

        Show
        Robert Joseph Evans added a comment - I agree with you the the 1.0 distributed cache code is a mess. I don't really understand all of what it is doing, especially with how it interacts with other parts of TT. I agree a rewrite would be good. And I think that is what YARN is for. I would be happy to commit your patch for you. At a minimum the volatile change needs to go in, and the rest I just want to feel comfortable that it will not cause more harm then good. Also if you want to place limits on the size of files placed in the distributed cache, please file a JIRA for that, it would be a great addition.
        Hide
        Allen Wittenauer added a comment -

        Part of the reason why distribute cache is a mystery as to how it actually works in practice is due to the lack of active debugging. Note that most of this patch is LOG.debug() statements.

        Show
        Allen Wittenauer added a comment - Part of the reason why distribute cache is a mystery as to how it actually works in practice is due to the lack of active debugging. Note that most of this patch is LOG.debug() statements.
        Hide
        Thomas Graves added a comment -

        After some debugging, it appears that the size isn't being calculated properly (set to 0) if the user specifies a directory to go into the distributed cache. It only appears to happen if its a private cached directory. I'm working on a patch.

        Allen, can you confirm that your users were specifying a directory to be cached and not a file?

        Show
        Thomas Graves added a comment - After some debugging, it appears that the size isn't being calculated properly (set to 0) if the user specifies a directory to go into the distributed cache. It only appears to happen if its a private cached directory. I'm working on a patch. Allen, can you confirm that your users were specifying a directory to be cached and not a file?
        Thomas Graves made changes -
        Assignee Thomas Graves [ tgraves ]
        Hide
        Allen Wittenauer added a comment -

        Yes, that corresponds to what I was seeing as well. Sorry, forgot to mention the directory thing. I've been running the patch for so long and so successfully I forgot about that detail.

        Show
        Allen Wittenauer added a comment - Yes, that corresponds to what I was seeing as well. Sorry, forgot to mention the directory thing. I've been running the patch for so long and so successfully I forgot about that detail.
        Hide
        Thomas Graves added a comment -

        patch changes:

        • refcount to be Atomic Integer
        • private caches properly update sizes
        • add test for sizes
        Show
        Thomas Graves added a comment - patch changes: refcount to be Atomic Integer private caches properly update sizes add test for sizes
        Thomas Graves made changes -
        Attachment MAPREDUCE-3824-branch-1.0.patch [ 12514727 ]
        Thomas Graves made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Target Version/s 1.0.1 [ 12319503 ]
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12514727/MAPREDUCE-3824-branch-1.0.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        -1 patch. The patch command could not apply the patch.

        Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1871//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12514727/MAPREDUCE-3824-branch-1.0.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1871//console This message is automatically generated.
        Hide
        Thomas Graves added a comment -

        I did a bunch of manual testing. I tested private and public version of files and directories and also private and public archives going into the distributed cache and verified that the sizes were set properly. If there are any other cases I missed please let me know.

        Show
        Thomas Graves added a comment - I did a bunch of manual testing. I tested private and public version of files and directories and also private and public archives going into the distributed cache and verified that the sizes were set properly. If there are any other cases I missed please let me know.
        Hide
        Allen Wittenauer added a comment -

        It would be good to add some debug statements. When stuff does break (related or not), ops teams end up being completely blind. (This is a problem with many many many other parts of Hadoop!)

        Show
        Allen Wittenauer added a comment - It would be good to add some debug statements. When stuff does break (related or not), ops teams end up being completely blind. (This is a problem with many many many other parts of Hadoop!)
        Hide
        Thomas Graves added a comment -

        added some debug statements.

        Show
        Thomas Graves added a comment - added some debug statements.
        Thomas Graves made changes -
        Attachment MAPREDUCE-3824-branch-1.0.patch [ 12514754 ]
        Hide
        Thomas Graves added a comment -

        test-patch results. The findbugs warnings aren't from this patch. They occurred without this patch also.

        [exec]
        [exec]
        [exec] -1 overall.
        [exec]
        [exec] +1 @author. The patch does not contain any @author tags.
        [exec]
        [exec] +1 tests included. The patch appears to include 3 new or modified tests.
        [exec]
        [exec] +1 javadoc. The javadoc tool did not generate any warning messages.
        [exec]
        [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
        [exec]
        [exec] -1 findbugs. The patch appears to introduce 10 new Findbugs (version 1.3.9) warnings.

        Show
        Thomas Graves added a comment - test-patch results. The findbugs warnings aren't from this patch. They occurred without this patch also. [exec] [exec] [exec] -1 overall. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] +1 tests included. The patch appears to include 3 new or modified tests. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] -1 findbugs. The patch appears to introduce 10 new Findbugs (version 1.3.9) warnings.
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12514754/MAPREDUCE-3824-branch-1.0.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        -1 patch. The patch command could not apply the patch.

        Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1878//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12514754/MAPREDUCE-3824-branch-1.0.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1878//console This message is automatically generated.
        Hide
        Robert Joseph Evans added a comment -

        I like the patch. +1

        I would prefer to see the debug log for "Error cleaning up cache" to be an actual error log. But that is minor as it is debug now, and the patch just changes the information logged.

        Show
        Robert Joseph Evans added a comment - I like the patch. +1 I would prefer to see the debug log for "Error cleaning up cache" to be an actual error log. But that is minor as it is debug now, and the patch just changes the information logged.
        Hide
        Matt Foley added a comment -

        Committed to branch-1.0 and branch-1.
        Thanks, Thomas, Allen, and reviewers!

        Show
        Matt Foley added a comment - Committed to branch-1.0 and branch-1. Thanks, Thomas, Allen, and reviewers!
        Matt Foley made changes -
        Status Patch Available [ 10002 ] Resolved [ 5 ]
        Fix Version/s 1.0.1 [ 12319503 ]
        Resolution Fixed [ 1 ]
        Hide
        Matt Foley added a comment -

        Was committed to 1.0.2, not 1.0.1.

        Show
        Matt Foley added a comment - Was committed to 1.0.2, not 1.0.1.
        Matt Foley made changes -
        Fix Version/s 1.0.2 [ 12320047 ]
        Fix Version/s 1.0.1 [ 12319503 ]
        Target Version/s 1.0.1 [ 12319503 ] 1.0.2 [ 12320047 ]
        Matt Foley made changes -
        Status Resolved [ 5 ] Closed [ 6 ]

          People

          • Assignee:
            Thomas Graves
            Reporter:
            Allen Wittenauer
          • Votes:
            0 Vote for this issue
            Watchers:
            9 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development