Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.23.0
    • Fix Version/s: 0.23.1
    • Component/s: mrv2
    • Labels:
      None

      Description

      If all current Fetchers complete while an in-memory merge is in progress, the shuffle can hang.
      Specifically, if the memory freed by an in-memory merge does not bring MergeManager.usedMemory below MergeManager.memoryLimit, and all current Fetchers complete before that merge finishes, no further in-memory merge is triggered and the shuffle hangs, because all new Fetchers are told to WAIT.
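
The sequence described above can be replayed with a toy model. This is a minimal sketch, not the real MergeManager: the class ShuffleStallSketch, its methods, and all numbers are hypothetical, and it assumes that a merge is only ever started when a fetcher completes and that a reservation is refused (WAIT) once usedMemory exceeds memoryLimit.

```java
/** Toy model of the pre-fix memory accounting; NOT the real MergeManager. */
final class ShuffleStallSketch {
    enum Reservation { GRANTED, WAIT }

    private final long memoryLimit;
    private final long mergeThreshold;
    private long usedMemory;        // reserved bytes (assumed to mirror usedMemory)
    private int activeFetchers;     // fetchers currently holding a reservation
    private boolean mergeInProgress;

    ShuffleStallSketch(long memoryLimit, long mergeThreshold) {
        this.memoryLimit = memoryLimit;
        this.mergeThreshold = mergeThreshold;
    }

    /** A new fetcher asks for memory; it must WAIT once the limit is exceeded. */
    Reservation reserve(long size) {
        if (usedMemory > memoryLimit) {
            return Reservation.WAIT;
        }
        usedMemory += size;
        activeFetchers++;
        return Reservation.GRANTED;
    }

    /** A fetcher finished copying; returns true if this call started a merge. */
    boolean fetchComplete() {
        activeFetchers--;
        if (!mergeInProgress && usedMemory >= mergeThreshold) {
            mergeInProgress = true;
            return true;
        }
        return false;
    }

    /** The running merge finished and released freedBytes of reserved memory. */
    void mergeComplete(long freedBytes) {
        usedMemory -= freedBytes;
        mergeInProgress = false;
    }

    /** No merge running, nobody left to trigger one, and new fetchers must WAIT. */
    boolean isStalled() {
        return !mergeInProgress && activeFetchers == 0 && usedMemory > memoryLimit;
    }

    public static void main(String[] args) {
        ShuffleStallSketch s = new ShuffleStallSketch(100, 60);
        s.reserve(10);
        s.reserve(30);
        s.reserve(30);                 // usedMemory = 70
        s.fetchComplete();             // 70 >= 60: a merge starts
        s.reserve(30);
        s.reserve(30);                 // usedMemory = 130
        for (int i = 0; i < 4; i++) {
            s.fetchComplete();         // merge still running: nothing is triggered
        }
        s.mergeComplete(10);           // usedMemory = 120, still above the limit
        System.out.println("next reservation: " + s.reserve(30));
        System.out.println("stalled = " + s.isStalled());
    }
}
```

In this model the last reservation is refused and nothing remains that could start another merge, which is the hang reported here.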

      1. MR-3721.txt
        4 kB
        Siddharth Seth

        Issue Links

          Activity

          Siddharth Seth created issue -
          Siddharth Seth added a comment -

          Initially reported by Karam Singh

          Arun C Murthy made changes -
          Field Original Value New Value
          Link This issue blocks MAPREDUCE-3719 [ MAPREDUCE-3719 ]
          Siddharth Seth added a comment -

          The patch adds another variable, commitMemory (the total size of completed fetches). A merge is triggered only if this size exceeds mergeThreshold. It also adds a check to ensure that mergeThreshold is greater than maxSingleShuffleLimit.
          Previously, usedMemory (reserved memory) was used for this computation, which meant that a single segment far smaller than mergeThreshold could trigger a merge to disk.

          I have run several gridmix runs with the patch applied, without a hang. I am not including a unit test: recreating the scenario would likely require far more changes to the shuffle code.
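
The change in the merge trigger can be illustrated with a small model. This is a sketch under assumptions rather than the committed code: the class CommitMemorySketch and its method names are hypothetical, the numbers are arbitrary, and what happens to commitMemory once a merge finishes is deliberately left out because the comment above does not describe it.

```java
/** Toy model of the patched merge trigger; NOT the real MergeManager. */
final class CommitMemorySketch {
    private final long mergeThreshold;
    private long usedMemory;       // bytes reserved by fetchers (old trigger input)
    private long commitMemory;     // bytes of completed fetches (new trigger input)
    private boolean mergeInProgress;

    CommitMemorySketch(long mergeThreshold, long maxSingleShuffleLimit) {
        // Sanity check described in the comment: the threshold must be larger
        // than the biggest segment that may be shuffled into memory.
        if (mergeThreshold <= maxSingleShuffleLimit) {
            throw new IllegalArgumentException(
                "mergeThreshold must be greater than maxSingleShuffleLimit");
        }
        this.mergeThreshold = mergeThreshold;
    }

    /** A fetcher reserves memory before it starts copying a segment. */
    void reserve(long size) {
        usedMemory += size;
    }

    /** Old rule, kept only for comparison: decide on reserved memory. */
    boolean oldRuleWouldMerge() {
        return !mergeInProgress && usedMemory >= mergeThreshold;
    }

    /** New rule: count only completed fetches; returns true if a merge starts. */
    boolean closeInMemoryFile(long size) {
        commitMemory += size;
        if (!mergeInProgress && commitMemory > mergeThreshold) {
            mergeInProgress = true;
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        CommitMemorySketch m = new CommitMemorySketch(66, 25);
        m.reserve(25);
        m.reserve(25);
        m.reserve(25);   // 75 bytes reserved, nothing fetched yet
        System.out.println("old rule merges now: " + m.oldRuleWouldMerge());
        System.out.println("after 1 segment: " + m.closeInMemoryFile(25));
        System.out.println("after 2 segments: " + m.closeInMemoryFile(25));
        System.out.println("after 3 segments: " + m.closeInMemoryFile(25));
    }
}
```

With three 25-byte reservations outstanding, the old rule would already merge after the first completed segment, while the commitMemory rule waits until the completed data itself exceeds the threshold.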

          Siddharth Seth made changes -
          Attachment MR-3721.txt [ 12511898 ]
          Siddharth Seth made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Target Version/s 0.23.1 [ 12318883 ]
          Arun C Murthy added a comment -

          +1 lgtm, good catch!

          Arun C Murthy added a comment -

          I just committed this. Thanks Sid!

          Arun C Murthy made changes -
          Status Patch Available [ 10002 ] Resolved [ 5 ]
          Fix Version/s 0.23.1 [ 12318883 ]
          Resolution Fixed [ 1 ]
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk-Commit #1660 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/1660/)
          MAPREDUCE-3721. Fixed a race in shuffle which caused reduces to hang. Contributed by Siddharth Seth.

          acmurthy : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1236041
          Files :

          • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/MergeManager.java
          Hudson added a comment -

          Integrated in Hadoop-Common-trunk-Commit #1587 (See https://builds.apache.org/job/Hadoop-Common-trunk-Commit/1587/)
          MAPREDUCE-3721. Fixed a race in shuffle which caused reduces to hang. Contributed by Siddharth Seth.

          acmurthy : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1236041
          Files :

          • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/MergeManager.java
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-0.23-Commit #412 (See https://builds.apache.org/job/Hadoop-Hdfs-0.23-Commit/412/)
          Merge -c 1236041 from trunk to branch-0.23 to fix MAPREDUCE-3721. Fixed a race in shuffle which caused reduces to hang.

          acmurthy : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1236042
          Files :

          • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
          • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/MergeManager.java
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk-Commit #1604 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/1604/)
          MAPREDUCE-3721. Fixed a race in shuffle which caused reduces to hang. Contributed by Siddharth Seth.

          acmurthy : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1236041
          Files :

          • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/MergeManager.java
          Hudson added a comment -

          Integrated in Hadoop-Common-0.23-Commit #421 (See https://builds.apache.org/job/Hadoop-Common-0.23-Commit/421/)
          Merge -c 1236041 from trunk to branch-0.23 to fix MAPREDUCE-3721. Fixed a race in shuffle which caused reduces to hang.

          acmurthy : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1236042
          Files :

          • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
          • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/MergeManager.java
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-0.23-Commit #437 (See https://builds.apache.org/job/Hadoop-Mapreduce-0.23-Commit/437/)
          Merge -c 1236041 from trunk to branch-0.23 to fix MAPREDUCE-3721. Fixed a race in shuffle which caused reduces to hang.

          acmurthy : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1236042
          Files :

          • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
          • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/MergeManager.java
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk #937 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/937/)
          MAPREDUCE-3721. Fixed a race in shuffle which caused reduces to hang. Contributed by Siddharth Seth.

          acmurthy : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1236041
          Files :

          • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/MergeManager.java
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-0.23-Build #150 (See https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/150/)
          Merge -c 1236041 from trunk to branch-0.23 to fix MAPREDUCE-3721. Fixed a race in shuffle which caused reduces to hang.

          acmurthy : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1236042
          Files :

          • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
          • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/MergeManager.java
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-0.23-Build #172 (See https://builds.apache.org/job/Hadoop-Mapreduce-0.23-Build/172/)
          Merge -c 1236041 from trunk to branch-0.23 to fix MAPREDUCE-3721. Fixed a race in shuffle which caused reduces to hang.

          acmurthy : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1236042
          Files :

          • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
          • /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/MergeManager.java
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk #970 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/970/)
          MAPREDUCE-3721. Fixed a race in shuffle which caused reduces to hang. Contributed by Siddharth Seth.

          acmurthy : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1236041
          Files :

          • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/MergeManager.java
          Arun C Murthy made changes -
          Status Resolved [ 5 ] Closed [ 6 ]
          Jason Lowe made changes -
          Link This issue is related to MAPREDUCE-4842 [ MAPREDUCE-4842 ]

            People

            • Assignee:
              Siddharth Seth
              Reporter:
              Siddharth Seth
            • Votes:
              0
              Watchers:
              2

              Dates

              • Created:
                Updated:
                Resolved:
