Hadoop Map/Reduce
MAPREDUCE-3289

Make use of fadvise in the NM's shuffle handler

      Description

      Using the new NativeIO fadvise functions, we can make the NodeManager prefetch map output before it is sent over the socket, and drop it from the OS page cache once it has been sent (since it is very rare for an output to have to be re-sent). This improves IO efficiency and reduces cache pollution.
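A minimal sketch of the pattern described above, assuming a hypothetical `CacheAdvisor` interface standing in for the NativeIO fadvise wrappers and a hypothetical `RangeSender` for whatever writes the bytes to the reducer's connection. It advises WILLNEED on the map-output range before sending and DONTNEED once the send has returned; none of these names are the actual ShuffleHandler API.

```java
/** Hypothetical stand-in for a native fadvise wrapper; advice names mirror POSIX_FADV_*. */
interface CacheAdvisor {
    enum Advice { WILLNEED, DONTNEED }

    /** Best-effort hint for the byte range [offset, offset + length) of an open file. */
    void advise(long offset, long length, Advice advice);
}

/** Hypothetical: copies the byte range [offset, offset + length) to the reducer's connection. */
interface RangeSender {
    void send(long offset, long length);
}

final class FadvisedSend {
    private FadvisedSend() {}

    /** Prefetch the partition, send it, then evict it from the page cache. */
    static void sendMapOutput(CacheAdvisor advisor, RangeSender sender,
                              long offset, long length, boolean manageOsCache) {
        if (manageOsCache) {
            // Ask the kernel to start reading the partition before the socket needs it.
            advisor.advise(offset, length, CacheAdvisor.Advice.WILLNEED);
        }
        sender.send(offset, length);
        if (manageOsCache) {
            // Outputs are rarely re-sent, so drop the pages to limit cache pollution.
            advisor.advise(offset, length, CacheAdvisor.Advice.DONTNEED);
        }
    }
}
```

The `manageOsCache` flag is an assumption modelling an on/off switch such as `mapreduce.shuffle.manage.os.cache`; with it off, the send path is unchanged.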

      1. mr-3289.txt
        5 kB
        Todd Lipcon
      2. 3289-1.txt
        3 kB
        Todd Lipcon
      3. 3289-2.txt
        4 kB
        Todd Lipcon
      4. MAPREDUCE-3289.branch-1.patch
        4 kB
        Brandon Li
      5. MR3289_trunk.txt
        9 kB
        Siddharth Seth
      6. MR3289_trunk_2.txt
        13 kB
        Siddharth Seth
      7. MR3289_trunk_3.txt
        13 kB
        Siddharth Seth
      8. MAPREDUCE-3289.branch-1.patch
        4 kB
        Brandon Li

          Activity

          Arun C Murthy added a comment -

          I just merged it to branch-1.1 too.

          Arun C Murthy added a comment -

          Matt - if you don't mind, I'd like to merge this into branch-1.1 since it's been well baked-in. Thoughts?

          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk #1156 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1156/)
          MAPREDUCE-3289. Make use of fadvise in the NM's shuffle handler. (Contributed by Todd Lipcon and Siddharth Seth) (Revision 1368718)

          Result = FAILURE
          sseth : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1368718
          Files :

          • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/main/java/org/apache/hadoop/mapred/FadvisedChunkedFile.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/main/java/org/apache/hadoop/mapred/FadvisedFileRegion.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/main/java/org/apache/hadoop/mapred/ShuffleHandler.java
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk #1124 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1124/)
          MAPREDUCE-3289. Make use of fadvise in the NM's shuffle handler. (Contributed by Todd Lipcon and Siddharth Seth) (Revision 1368718)

          Result = SUCCESS
          sseth : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1368718
          Files :

          • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/main/java/org/apache/hadoop/mapred/FadvisedChunkedFile.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/main/java/org/apache/hadoop/mapred/FadvisedFileRegion.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/main/java/org/apache/hadoop/mapred/ShuffleHandler.java
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk-Commit #2617 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/2617/)
          MAPREDUCE-3289. Make use of fadvise in the NM's shuffle handler. (Contributed by Todd Lipcon and Siddharth Seth) (Revision 1368718)

          Result = SUCCESS
          sseth : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1368718
          Files :

          • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/main/java/org/apache/hadoop/mapred/FadvisedChunkedFile.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/main/java/org/apache/hadoop/mapred/FadvisedFileRegion.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/main/java/org/apache/hadoop/mapred/ShuffleHandler.java
          Hudson added a comment -

          Integrated in Hadoop-Common-trunk-Commit #2552 (See https://builds.apache.org/job/Hadoop-Common-trunk-Commit/2552/)
          MAPREDUCE-3289. Make use of fadvise in the NM's shuffle handler. (Contributed by Todd Lipcon and Siddharth Seth) (Revision 1368718)

          Result = SUCCESS
          sseth : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1368718
          Files :

          • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/main/java/org/apache/hadoop/mapred/FadvisedChunkedFile.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/main/java/org/apache/hadoop/mapred/FadvisedFileRegion.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/main/java/org/apache/hadoop/mapred/ShuffleHandler.java
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk-Commit #2570 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/2570/)
          MAPREDUCE-3289. Make use of fadvise in the NM's shuffle handler. (Contributed by Todd Lipcon and Siddharth Seth) (Revision 1368718)

          Result = FAILURE
          sseth : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1368718
          Files :

          • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/main/java/org/apache/hadoop/mapred/FadvisedChunkedFile.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/main/java/org/apache/hadoop/mapred/FadvisedFileRegion.java
          • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/main/java/org/apache/hadoop/mapred/ShuffleHandler.java
          Siddharth Seth added a comment -

          Committed to trunk, branch-2 and branch-1. Thanks Todd and Brandon!

          Arun C Murthy added a comment -

          +1 for the trunk patch, looks good - thanks Todd & Sid.

          And +1 for Sid's comment that we should keep trunk and branch-1 in sync.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12538856/MAPREDUCE-3289.branch-1.patch
          against trunk revision .

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2698//console

          This message is automatically generated.

          Brandon Li added a comment -

          Updated the branch-1 patch to use the same parameter names as the trunk patch.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12538853/MR3289_trunk_3.txt
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2697//testReport/
          Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2697//console

          This message is automatically generated.

          Siddharth Seth added a comment -

          Fixes the findbugs warnings.

          I've left the configs in ShuffleHandler itself. MRConfig doesn't seem like the right place, since the shuffle runs as an auxiliary service for YARN, and YARN doesn't necessarily need MR classes to run.

          Regarding the parameter names: I think it's better to use the same names in both patches, rather than introducing a parameter in branch-1 and immediately deprecating it in the branch-2 patch.
          Brandon, could you please make this change in the branch-1 patch you posted?

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12538835/MR3289_trunk_2.txt
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2695//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2695//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-mapreduce-client-shuffle.html
          Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2695//console

          This message is automatically generated.

          Siddharth Seth added a comment -

          Updated patch for trunk with some fixes, and fadvise when using SSL.

          Siddharth Seth added a comment -

          Patch for trunk. Needs some more work for the recent ssl changes. Also needs some testing.
          I'm not sure if this is the best way to do this with Netty. Would like some feedback.

          Siddharth Seth added a comment -

          The branch-1 patch looks good. Minor nit: the parameters are named differently in branch-1 (mapred.tasktracker.shuffle.fadvise) and trunk (mapreduce.shuffle.manage.os.cache). Since this will likely go into both at the same time, we could just use the trunk parameter names. How has this been handled in other such cases?
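To illustrate the cost of diverging names, here is a hypothetical sketch of what honoring both keys would look like if branch-1 shipped its own name and a later release had to accept it as a deprecated alias. It uses `java.util.Properties` instead of Hadoop's `Configuration` so it runs standalone; the class name, the fallback order and the `defaultValue` parameter are assumptions, not the committed code.

```java
import java.util.Properties;

/** Hypothetical resolver showing the alias handling that divergent key names would require. */
final class ShuffleCacheConfig {
    static final String TRUNK_KEY = "mapreduce.shuffle.manage.os.cache";
    // Name used by the earlier branch-1 patch, per the comment above.
    static final String BRANCH1_KEY = "mapred.tasktracker.shuffle.fadvise";

    private ShuffleCacheConfig() {}

    static boolean manageOsCache(Properties conf, boolean defaultValue) {
        String value = conf.getProperty(TRUNK_KEY);
        if (value == null) {
            value = conf.getProperty(BRANCH1_KEY); // deprecated-alias fallback
        }
        return value == null ? defaultValue : Boolean.parseBoolean(value.trim());
    }
}
```

Using the trunk name on both branches, as suggested, means only the first lookup is ever needed and no alias has to be carried forward.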

          Brandon Li added a comment -

          Uploaded the branch-1 patch, which combines Todd's two patches and is rebased onto the head of branch-1.

          Todd Lipcon added a comment -

          BTW, feel free to steal this from me if you are planning to work on it. I won't have time in the near future.

          Todd Lipcon added a comment -

          Attaching two patches which apply to some version of branch-1, though probably not the tip of it. These two, together, implement the feature in MR1 with the bug fix mentioned above.

          I haven't had time to forward-port the patch to MR2 again with this fix, but it should be reasonably straightforward.

          Brandon Li added a comment -

          Hi Todd, if you have a new patch, could you please upload it?

          Todd Lipcon added a comment -

          Found an issue with this optimization today in the case where some map output partitions are really large (e.g., on the order of many GB). Because fadvise is a blocking call, it can block for minutes before the first bytes of response data are sent, which causes the reducer to time out, etc.

          Instead, we should fadvise only a "chunk" ahead of the current read position.
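A minimal sketch of the "chunk ahead" idea, assuming a hypothetical `ChunkedReadahead` helper that the sender consults as its read position advances. Instead of one WILLNEED over the whole (possibly multi-GB) partition, it returns a bounded range to advise and refills once less than half a window remains, so each blocking fadvise call covers at most `window` bytes. The window size and the half-window refill rule are illustrative choices, not taken from the patch.

```java
/** Hypothetical helper that keeps WILLNEED advice a bounded window ahead of the sender. */
final class ChunkedReadahead {
    private final long end;     // exclusive end offset of the map-output partition
    private final long window;  // bytes to keep advised ahead of the current position
    private long advisedUpTo;   // offsets below this have already been advised

    ChunkedReadahead(long start, long length, long window) {
        if (length < 0 || window <= 0) {
            throw new IllegalArgumentException("length must be >= 0 and window > 0");
        }
        this.end = start + length;
        this.window = window;
        this.advisedUpTo = start;
    }

    /**
     * Called as the sender reaches {@code position}. Returns {offset, length} to pass to a
     * WILLNEED fadvise, or null when enough data is already advised (or nothing is left).
     */
    long[] nextAdvice(long position) {
        if (position >= end || advisedUpTo >= end) {
            return null;
        }
        if (advisedUpTo - position > window / 2) {
            return null; // more than half a window is still advised ahead of the reader
        }
        long from = Math.max(advisedUpTo, position);
        long to = Math.min(end, position + window);
        advisedUpTo = to;
        return new long[] {from, to - from};
    }
}
```

Because each call covers at most `window` bytes, the first bytes of the response are no longer gated on advising the entire partition.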

          Todd Lipcon added a comment -

          Arun, any thoughts on this? Would like to get this in for 0.23.1

          Todd Lipcon added a comment -

          "Move the config key/default-value to YarnConfiguration, we don't spread them around now"

          On further thought, YarnConfig doesn't seem like the right place for it: this is an MR config rather than a YARN one, given it's part of the mapreduce-client-shuffle project. We also configure mapreduce.shuffle.port here. Is MRConfig the right spot for both of these, maybe?

          Arun C Murthy added a comment -

          I'm not super convinced we want to play God with the buffer-cache, but it's too late to sermonize now...

          Minor nits for 23.0:

          1. Please make the default false
          2. Move the config key/default-value to YarnConfiguration, we don't spread them around now.
          Todd Lipcon added a comment -

          I've tested the MR1 equivalent of this and it makes a significant difference. I agree we could prefetch other map outputs as well, and generally do a smarter job of ordering the fetches, but this small patch made a good difference on MR1 - so I don't have any reason to believe it wouldn't help in MR2.

          Arun C Murthy added a comment -

          Todd, have you done any benchmarks with this?

          I'm a bit skeptical... one idea might be to keep the whole map-output in the buffer cache (particularly if it's freed up since the Datanode isn't using as much of it) to speed up all of the shuffle for the whole job. Also, you might want to start pre-fetching outputs of other maps of the same job (application) as an optimization.

          Thoughts?

          Todd Lipcon added a comment -

          Here's a rather simple implementation that fadvises with WILLNEED just before sending the output, and DONTNEED after sending the output.

          We could make it fancier by fadvising "one map output ahead", since in MR2 a single request can fetch multiple outputs, but I'd prefer to leave that for a later JIRA.
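The "one map output ahead" refinement could be sketched as the following ordering, assuming a single request that fetches several outputs in sequence. `OneAheadSchedule` is a hypothetical name, and the plan is expressed as strings only to make the ordering easy to inspect; a real handler would issue the fadvise calls and sends directly.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical scheduler: prefetch the next map output while the current one is being sent. */
final class OneAheadSchedule {
    private OneAheadSchedule() {}

    /** Returns the advice/send steps in order, e.g. "WILLNEED 0", "SEND 0", "DONTNEED 0". */
    static List<String> plan(int numOutputs) {
        List<String> steps = new ArrayList<>();
        if (numOutputs > 0) {
            steps.add("WILLNEED 0"); // the first output has nothing to overlap with
        }
        for (int i = 0; i < numOutputs; i++) {
            if (i + 1 < numOutputs) {
                steps.add("WILLNEED " + (i + 1)); // start reading the next output early
            }
            steps.add("SEND " + i);
            steps.add("DONTNEED " + i); // rarely re-sent, so drop it from the cache
        }
        return steps;
    }
}
```

For two outputs this yields WILLNEED 0, WILLNEED 1, SEND 0, DONTNEED 0, SEND 1, DONTNEED 1, so the disk read for the second output overlaps the network send of the first.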


            People

            • Assignee:
              Todd Lipcon
              Reporter:
              Todd Lipcon
            • Votes:
              0
              Watchers:
              21
