Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.6.0
    • Fix Version/s: 2.9.0, 3.0.0-alpha1
    • Component/s: None
    • Labels:
      None
    • Target Version/s:
    • Hadoop Flags:
      Reviewed

      Description

      ShuffleHandler currently seems to build a map of mapId to map output info (file.out and index file information) each time it receives a message.
      It should instead cache map output info across requests, so that a scan of all directories is not required for each reducer fetching from the same map.

      Also, the scan for each map output / index file is performed twice per mapId within a request: in populateHeaders, once in the call to getMapOutputInfo and then again directly in the method.

      For an invocation that ends up with more than 1000 (the default) mapIds in a single call, the entries that are not cached in the map will have an invalid path constructed for them. This is highly unlikely to be the case though, until there's proper caching.

      MapOutputInfo info = mapOutputInfoMap.get(mapId);
      if (info == null) {
        info = getMapOutputInfo(outputBasePathStr, mapId, reduceId, user);
      }
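The caching the description asks for could look roughly like the sketch below. It is not the committed patch: the class, field, and path names are hypothetical, and a plain JDK `ConcurrentHashMap` stands in for whatever cache the handler really uses. The point it illustrates is that the lookup structure outlives a single request, so repeated fetches for the same map attempt trigger only one directory scan.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Sketch: resolve a map attempt's output paths once and reuse them across requests. */
class MapInfoCacheSketch {
  /** Hypothetical stand-in for the file.out / index paths of one map attempt. */
  static final class PathInfo {
    final String dataPath;
    final String indexPath;

    PathInfo(String dataPath, String indexPath) {
      this.dataPath = dataPath;
      this.indexPath = indexPath;
    }
  }

  // Counts how often the (simulated) directory scan runs.
  final AtomicInteger scans = new AtomicInteger();

  // Lives as long as the handler, not just for a single request.
  private final Map<String, PathInfo> cache = new ConcurrentHashMap<>();

  /** Simulates the expensive scan of all local directories (paths are made up). */
  private PathInfo scanLocalDirs(String key) {
    scans.incrementAndGet();
    return new PathInfo("/local/" + key + "/file.out",
        "/local/" + key + "/file.out.index");
  }

  /** Returns cached info, scanning only on the first lookup of a key. */
  PathInfo get(String user, String jobId, String mapId) {
    // Simplified key; a real key type would keep the three fields separate.
    String key = user + "/" + jobId + "/" + mapId;
    return cache.computeIfAbsent(key, this::scanLocalDirs);
  }

  public static void main(String[] args) {
    MapInfoCacheSketch handler = new MapInfoCacheSketch();
    // Three reducers fetch from the same map attempt.
    for (int reducer = 0; reducer < 3; reducer++) {
      handler.get("alice", "job_1", "attempt_1_m_000000_0");
    }
    System.out.println("scans = " + handler.scans.get());
  }
}
```

An unbounded map like this would grow without limit in a long-running process, which is why the discussion further down turns to how the cache size is bounded.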

        Activity

        djp Junping Du added a comment -

        A similar issue was fixed for Hive in HIVE-9912. Attaching a patch that follows most of the original patch but removes the dirWatcher.

        hadoopqa Hadoop QA added a comment -
        -1 overall



        | Vote | Subsystem | Runtime | Comment |
        | 0 | reexec | 0m 12s | Docker mode activated. |
        | +1 | @author | 0m 0s | The patch does not contain any @author tags. |
        | +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
        | +1 | mvninstall | 6m 44s | trunk passed |
        | +1 | compile | 0m 15s | trunk passed |
        | +1 | checkstyle | 0m 12s | trunk passed |
        | +1 | mvnsite | 0m 18s | trunk passed |
        | +1 | mvneclipse | 0m 13s | trunk passed |
        | +1 | findbugs | 0m 23s | trunk passed |
        | +1 | javadoc | 0m 12s | trunk passed |
        | +1 | mvninstall | 0m 12s | the patch passed |
        | +1 | compile | 0m 12s | the patch passed |
        | +1 | javac | 0m 12s | the patch passed |
        | -1 | checkstyle | 0m 10s | hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle: The patch generated 8 new + 90 unchanged - 1 fixed = 98 total (was 91) |
        | +1 | mvnsite | 0m 15s | the patch passed |
        | +1 | mvneclipse | 0m 12s | the patch passed |
        | -1 | whitespace | 0m 0s | The patch has 21 line(s) that end in whitespace. Use git apply --whitespace=fix. |
        | +1 | findbugs | 0m 26s | the patch passed |
        | +1 | javadoc | 0m 10s | the patch passed |
        | +1 | unit | 0m 16s | hadoop-mapreduce-client-shuffle in the patch passed. |
        | +1 | asflicense | 0m 16s | The patch does not generate ASF License warnings. |
        | | | 11m 17s | |



        | Subsystem | Report/Notes |
        | Docker | Image:yetus/hadoop:2c91fd8 |
        | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12810092/MAPREDUCE-6197.patch |
        | JIRA Issue | MAPREDUCE-6197 |
        | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle |
        | uname | Linux a46ee62cc804 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
        | Build tool | maven |
        | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
        | git revision | trunk / 709a814 |
        | Default Java | 1.8.0_91 |
        | findbugs | v3.0.0 |
        | checkstyle | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6552/artifact/patchprocess/diff-checkstyle-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-shuffle.txt |
        | whitespace | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6552/artifact/patchprocess/whitespace-eol.txt |
        | Test Results | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6552/testReport/ |
        | modules | C: hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle U: hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle |
        | Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6552/console |
        | Powered by | Apache Yetus 0.3.0 http://yetus.apache.org |

        This message was automatically generated.

        jianhe Jian He added a comment -

        LGTM.
        One question: how and why did you choose this policy for determining the weight?

        maximumWeight(MAX_WEIGHT).weigher(
            new Weigher<AttemptPathIdentifier, AttemptPathInfo>() {
              @Override
              public int weigh(AttemptPathIdentifier key,
                  AttemptPathInfo value) {
                return key.jobId.length() + key.user.length() +
                    key.attemptId.length() +
                    value.indexPath.toString().length() +
                    value.dataPath.toString().length();
              }
            }
        )
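To make the quoted weigher concrete, here is a small worked example of the same arithmetic in plain Java. The method mirrors the sum in the snippet above, but the class name and all sample values are made up for illustration. Note that `String.length()` counts UTF-16 code units, so the weight approximates an entry's memory footprint rather than measuring it.

```java
/** Sketch: the weight of one cache entry as the summed length of its strings. */
class EntryWeightSketch {
  /** Same sum as the quoted weigher: three key strings plus both path strings. */
  static int weigh(String jobId, String user, String attemptId,
      String indexPath, String dataPath) {
    return jobId.length() + user.length() + attemptId.length()
        + indexPath.length() + dataPath.length();
  }

  public static void main(String[] args) {
    // Hypothetical values: 5 + 5 + 9 + 17 + 11 characters.
    int weight = weigh("job_1", "alice", "attempt_1",
        "/d/file.out.index", "/d/file.out");
    System.out.println("entry weight = " + weight);
  }
}
```

Longer local directory paths or longer job and attempt IDs produce a heavier entry, so fewer such entries fit under the same weight limit.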
        djp Junping Du added a comment -

        Thanks, Jian He, for the review and comments.

        one question is how/why do you choose such policy for determining the weight?

        That's a good question. To control the size of a LoadingCache, we can use either maximumSize or maximumWeight. The reason for choosing maximumWeight over maximumSize is that each cache entry here has a variable size that depends on key size + value size, so with a fixed maximumSize we still could not be sure how much memory the cache might end up using. The other reason is to stay consistent with what we have in HIVE-9912. If we find any issue with the current settings/policy in large production deployments in the future, we can change both sides together.
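The difference between bounding by count and bounding by weight can be sketched without Guava. The class below is a simplified, hypothetical stand-in written against the JDK only. It is not how Guava's `LoadingCache` is implemented and not the code in the patch. It evicts least recently used entries until the summed weight fits a budget, so a few large entries and many small ones are held to the same approximate memory bound.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.ToIntBiFunction;

/** Sketch: an LRU cache bounded by total entry weight rather than entry count. */
class WeightBoundedCache<K, V> {
  private final long maxWeight;
  private final ToIntBiFunction<K, V> weigher;
  // accessOrder = true keeps the least recently used entry first.
  private final LinkedHashMap<K, V> entries =
      new LinkedHashMap<>(16, 0.75f, true);
  private long totalWeight;

  WeightBoundedCache(long maxWeight, ToIntBiFunction<K, V> weigher) {
    this.maxWeight = maxWeight;
    this.weigher = weigher;
  }

  synchronized V get(K key) {
    return entries.get(key);
  }

  /** Stores a non-null value, then evicts until the weight budget is met. */
  synchronized void put(K key, V value) {
    V old = entries.put(key, value);
    if (old != null) {
      totalWeight -= weigher.applyAsInt(key, old);
    }
    totalWeight += weigher.applyAsInt(key, value);
    // Walk from the least recently used entry and drop entries while over budget.
    Iterator<Map.Entry<K, V>> it = entries.entrySet().iterator();
    while (totalWeight > maxWeight && it.hasNext()) {
      Map.Entry<K, V> eldest = it.next();
      totalWeight -= weigher.applyAsInt(eldest.getKey(), eldest.getValue());
      it.remove();
    }
  }

  synchronized int size() {
    return entries.size();
  }

  synchronized long totalWeight() {
    return totalWeight;
  }

  public static void main(String[] args) {
    // Budget of 20 "characters"; weight is key length plus value length.
    WeightBoundedCache<String, String> cache =
        new WeightBoundedCache<>(20, (k, v) -> k.length() + v.length());
    cache.put("a", "short");
    cache.put("b", "a-much-longer-path");
    System.out.println(cache.size() + " entries, weight " + cache.totalWeight());
  }
}
```

With a count limit of two, both entries in `main` would stay regardless of how long their paths are. With the weight limit, the second, heavier entry pushes the first one out.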

        jianhe Jian He added a comment -

        Committed to trunk and branch-2. Thanks, Junping!

        djp Junping Du added a comment -

        Thanks, Jian He, for the review and commit!

        hudson Hudson added a comment -

        SUCCESS: Integrated in Hadoop-trunk-Commit #9997 (See https://builds.apache.org/job/Hadoop-trunk-Commit/9997/)
        MAPREDUCE-6197. Cache MapOutputLocations in ShuffleHandler. Contributed (jianhe: rev d8107fcd1c93c202925f2946d0cd4072fe0aef1e)

        • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/main/java/org/apache/hadoop/mapred/ShuffleHandler.java
        • hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/test/java/org/apache/hadoop/mapred/TestShuffleHandler.java

          People

          • Assignee:
            djp Junping Du
            Reporter:
            sseth Siddharth Seth
          • Votes:
            0
            Watchers:
            11
