Hadoop Map/Reduce
MAPREDUCE-2541

Race condition in IndexCache (readIndexFileToCache, removeMap) corrupts the value of totalMemoryUsed, which may cause the TaskTracker to throw exceptions continuously

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 0.20.1, 0.21.0, 0.22.0, 0.23.0
    • Fix Version/s: 0.23.0
    • Component/s: tasktracker
    • Labels:
      None
    • Environment:

      all

      Description

      The race condition goes like this:
      Thread1: readIndexFileToCache() totalMemoryUsed.addAndGet(newInd.getSize())
      Thread2: removeMap() totalMemoryUsed.addAndGet(-info.getSize());
      If the client kills the job while the SpillRecord is still being read from the file system, info.getSize() is still 0, so removeMap() does not actually reduce totalMemoryUsed. After Thread1 finishes reading the SpillRecord, it adds the real index size to totalMemoryUsed, which leaves the value of totalMemoryUsed too large.
      Once totalMemoryUsed exceeds totalMemoryAllowed while indexCache has not actually cached anything, freeIndexInformation() throws exceptions continuously. This usually happens when a very large job with a very large number of reduces is killed by the user, probably because the user set a wrong reduce number by mistake.

      A quick fix for this issue is to make removeMap() do nothing and let freeIndexInformation() alone do the cleanup.
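
      The interleaving described above can be replayed deterministically in a single thread. The following is a minimal sketch, not the actual IndexCache code: the class IndexCacheRaceSketch, the Info type, and the beginRead()/finishRead() split of readIndexFileToCache() are hypothetical names introduced only for illustration.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Simplified stand-in for IndexCache; all names here are assumptions.
public class IndexCacheRaceSketch {
    static class Info {
        volatile int size = 0; // stays 0 while the SpillRecord is loading
    }

    final ConcurrentHashMap<String, Info> cache = new ConcurrentHashMap<>();
    final AtomicInteger totalMemoryUsed = new AtomicInteger();

    // First half of the read: publish a placeholder entry with size 0.
    Info beginRead(String mapId) {
        Info info = new Info();
        cache.put(mapId, info);
        return info;
    }

    // Second half of the read: record the real size and account for it.
    void finishRead(Info info, int realSize) {
        info.size = realSize;
        totalMemoryUsed.addAndGet(realSize);
    }

    // removeMap() as described in the report: subtract whatever size it sees.
    void removeMap(String mapId) {
        Info info = cache.remove(mapId);
        if (info != null) {
            totalMemoryUsed.addAndGet(-info.size);
        }
    }

    public static void main(String[] args) {
        IndexCacheRaceSketch c = new IndexCacheRaceSketch();
        Info loading = c.beginRead("map_0"); // Thread1 starts reading
        c.removeMap("map_0");                // Thread2: job killed, subtracts 0
        c.finishRead(loading, 1024);         // Thread1 finishes, adds 1024
        System.out.println("cached entries = " + c.cache.size());           // 0
        System.out.println("totalMemoryUsed = " + c.totalMemoryUsed.get()); // 1024
    }
}
```

      After this sequence the cache is empty but the counter still reports 1024 bytes, which is the drift the report describes.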

      1. MAPREDUCE-2541.v2.patch
        4 kB
        Binglin Chang
      2. MAPREDUCE-2541.patch
        0.7 kB
        Binglin Chang

        Activity

        Binglin Chang created issue -
        Binglin Chang made changes -
        Field Original Value New Value
        Original Estimate 2h [ 7200 ]
        Remaining Estimate 2h [ 7200 ]
        Binglin Chang added a comment -

        Patch for the 0.21 branch. It makes IndexCache.removeMap() do nothing.

        Binglin Chang made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Binglin Chang made changes -
        Attachment MAPREDUCE-2541.patch [ 12480750 ]
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12480750/MAPREDUCE-2541.patch
        against trunk revision 1128394.

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed these core unit tests:
        org.apache.hadoop.cli.TestMRCLI
        org.apache.hadoop.tools.TestHadoopArchives
        org.apache.hadoop.tools.TestHarFileSystem

        -1 contrib tests. The patch failed contrib unit tests.

        +1 system test framework. The patch passed system test framework compile.

        Test results: https://builds.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/319//testReport/
        Findbugs warnings: https://builds.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/319//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Console output: https://builds.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/319//console

        This message is automatically generated.

        Binglin Chang added a comment -

        It seems that something is wrong with the current trunk: recent PreCommit builds #303 through #320 all failed.
        https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/

        Binglin Chang added a comment -

        Changes:
        1. removeMap() removes the map from the cache only if the index information for that map has been loaded (size > 0). An entry that is still in the loading phase (size = 0) is not removed, which prevents corruption of totalMemoryUsed.
        2. Added checkTotalMemoryUsed() to IndexCache to check consistency; it is used only in the unit test.
        3. Added a unit test that constructs the race condition. The test fails against the current trunk code and passes with the patch on my computer.

        The failed test (TestMRCLI) reported by Hadoop QA was not caused by this patch.
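
        The first two changes can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the contents of MAPREDUCE-2541.v2.patch: the class IndexCacheFixSketch, the Info type, and beginRead()/finishRead() are hypothetical, and only removeMap() and checkTotalMemoryUsed() mirror names from the comment. The main method replays the interleaving in one thread; it does not claim to cover every possible concurrent schedule.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Simplified stand-in for the patched IndexCache; names are assumptions.
public class IndexCacheFixSketch {
    static class Info {
        volatile int size = 0; // stays 0 while the SpillRecord is loading
    }

    final ConcurrentHashMap<String, Info> cache = new ConcurrentHashMap<>();
    final AtomicInteger totalMemoryUsed = new AtomicInteger();

    Info beginRead(String mapId) {
        Info info = new Info();
        cache.put(mapId, info);
        return info;
    }

    void finishRead(Info info, int realSize) {
        info.size = realSize;
        totalMemoryUsed.addAndGet(realSize);
    }

    // Change 1: leave entries that are still loading (size == 0) in place.
    void removeMap(String mapId) {
        Info info = cache.get(mapId);
        if (info == null || info.size == 0) {
            return; // nothing cached, or the reader has not finished yet
        }
        if (cache.remove(mapId, info)) {
            totalMemoryUsed.addAndGet(-info.size);
        }
    }

    // Change 2: consistency check, comparing the counter with the real sum.
    boolean checkTotalMemoryUsed() {
        int sum = 0;
        for (Info info : cache.values()) {
            sum += info.size;
        }
        return sum == totalMemoryUsed.get();
    }

    public static void main(String[] args) {
        IndexCacheFixSketch c = new IndexCacheFixSketch();
        Info loading = c.beginRead("map_0");
        c.removeMap("map_0");        // no-op: entry is still loading
        c.finishRead(loading, 1024);
        System.out.println("consistent after read = " + c.checkTotalMemoryUsed());
        c.removeMap("map_0");        // entry is loaded now, so it is removed
        System.out.println("totalMemoryUsed = " + c.totalMemoryUsed.get()); // 0
    }
}
```

        In this sketch an entry that was skipped while loading stays in the cache until a later removeMap() call or eviction handles it, so the counter and the cached sizes stay in agreement.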

        Binglin Chang made changes -
        Attachment MAPREDUCE-2541.v2.patch [ 12481461 ]
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12481461/MAPREDUCE-2541.v2.patch
        against trunk revision 1131265.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed these core unit tests:
        org.apache.hadoop.cli.TestMRCLI

        +1 contrib tests. The patch passed contrib unit tests.

        +1 system test framework. The patch passed system test framework compile.

        Test results: https://builds.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/349//testReport/
        Findbugs warnings: https://builds.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/349//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Console output: https://builds.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/349//console

        This message is automatically generated.

        Arun C Murthy made changes -
        Assignee Binglin Chang [ decster ]
        Arun C Murthy added a comment -

        I just committed this. Thanks Binglin!

        Arun C Murthy made changes -
        Status Patch Available [ 10002 ] Resolved [ 5 ]
        Fix Version/s 0.23.0 [ 12315570 ]
        Resolution Fixed [ 1 ]
        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-trunk-Commit #766 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/766/)
        MAPREDUCE-2541. Fixed a race condition in IndexCache.removeMap. Contributed by Binglin Chang.

        acmurthy : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1157346
        Files :

        • /hadoop/common/trunk/mapreduce/src/test/mapred/org/apache/hadoop/mapred/TestIndexCache.java
        • /hadoop/common/trunk/mapreduce/CHANGES.txt
        • /hadoop/common/trunk/mapreduce/src/java/org/apache/hadoop/mapred/IndexCache.java
        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-trunk #754 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/754/)
        MAPREDUCE-2541. Fixed a race condition in IndexCache.removeMap. Contributed by Binglin Chang.

        acmurthy : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1157346
        Files :

        • /hadoop/common/trunk/mapreduce/src/test/mapred/org/apache/hadoop/mapred/TestIndexCache.java
        • /hadoop/common/trunk/mapreduce/CHANGES.txt
        • /hadoop/common/trunk/mapreduce/src/java/org/apache/hadoop/mapred/IndexCache.java
        Hudson added a comment -

        Integrated in Hadoop-Common-trunk-Commit #742 (See https://builds.apache.org/job/Hadoop-Common-trunk-Commit/742/)
        MAPREDUCE-2541. Fixed a race condition in IndexCache.removeMap. Contributed by Binglin Chang.

        acmurthy : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1157346
        Files :

        • /hadoop/common/trunk/mapreduce/src/test/mapred/org/apache/hadoop/mapred/TestIndexCache.java
        • /hadoop/common/trunk/mapreduce/CHANGES.txt
        • /hadoop/common/trunk/mapreduce/src/java/org/apache/hadoop/mapred/IndexCache.java
        Arun C Murthy made changes -
        Status Resolved [ 5 ] Closed [ 6 ]

          People

          • Assignee:
            Binglin Chang
            Reporter:
            Binglin Chang
          • Votes:
            0
            Watchers:
            5

            Dates

            • Created:
              Updated:
              Resolved:
