Uploaded image for project: 'Hadoop Map/Reduce'
  1. Hadoop Map/Reduce
  2. MAPREDUCE-5653

DistCp does not honour config-overrides for mapreduce.[map,reduce].memory.mb

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.23.9, 2.2.0
    • Fix Version/s: 3.0.0-alpha1
    • Component/s: distcp
    • Labels:
      None
    • Hadoop Flags:
      Incompatible change
    • Release Note:
      Prior to this change, distcp had hard-coded values for memory usage. Now distcp will honor memory settings in a way compatible with the rest of MapReduce.

      Description

      When a DistCp job is run through Oozie (through a Java action that launches DistCp), one sees that mapred.child.java.opts as set from the caller is honoured by DistCp. But, DistCp doesn't seem to honour any overrides for configs mapreduce.[map,reduce].memory.mb.

      Problem has been identified. I'll post a patch shortly.

      1. MAPREDUCE-5653.branch-0.23.patch
        1 kB
        Mithun Radhakrishnan
      2. MAPREDUCE-5653.branch-2.patch
        1 kB
        Mithun Radhakrishnan
      3. MAPREDUCE-5653.trunk.2.patch
        0.7 kB
        Ratandeep Ratti
      4. MAPREDUCE-5653.trunk.patch
        1 kB
        Mithun Radhakrishnan

        Activity

        Hide
        mithun Mithun Radhakrishnan added a comment -

        addResources(DISTCP_DEFAULT_XML) was incorrect. I now add it in if it's not already been set.

        Show
        mithun Mithun Radhakrishnan added a comment - addResources(DISTCP_DEFAULT_XML) was incorrect. I now add it in if it's not already been set.
        Hide
        hadoopqa Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12615710/MAPREDUCE-5653.branch-0.23.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in hadoop-tools/hadoop-distcp.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4231//testReport/
        Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4231//console

        This message is automatically generated.

        Show
        hadoopqa Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12615710/MAPREDUCE-5653.branch-0.23.patch against trunk revision . +1 @author . The patch does not contain any @author tags. -1 tests included . The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . The javadoc tool did not generate any warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. +1 core tests . The patch passed unit tests in hadoop-tools/hadoop-distcp. +1 contrib tests . The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4231//testReport/ Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4231//console This message is automatically generated.
        Hide
        jlowe Jason Lowe added a comment -

        Thanks for the patch, Mithun. Was there a reason addDefaultResource(DISTCP_DEFAULT_XML) wasn't used? I think that should accomplish the same goal without explicit config iteration on DistCp's part, but haven't tried it to verify. Is there a concern that distcp-default.xml could pollute other configurations if DistCp is not used in isolation?

        Show
        jlowe Jason Lowe added a comment - Thanks for the patch, Mithun. Was there a reason addDefaultResource(DISTCP_DEFAULT_XML) wasn't used? I think that should accomplish the same goal without explicit config iteration on DistCp's part, but haven't tried it to verify. Is there a concern that distcp-default.xml could pollute other configurations if DistCp is not used in isolation?
        Hide
        jlowe Jason Lowe added a comment -

        Thinking about this further, I see why addDefaultResource() will not work. distcp-default.xml and mapred-site.xml will often overlap, and it's likely mapred-site.xml will be loaded first then distcp-default.xml will later stomp any overlapping values when it's loaded.

        So I'd like to see a comment for the property iteration loop explaining why we can't use addDefaultResource. Also a unit test would be nice to verify it's working as-intended.

        Show
        jlowe Jason Lowe added a comment - Thinking about this further, I see why addDefaultResource() will not work. distcp-default.xml and mapred-site.xml will often overlap, and it's likely mapred-site.xml will be loaded first then distcp-default.xml will later stomp any overlapping values when it's loaded. So I'd like to see a comment for the property iteration loop explaining why we can't use addDefaultResource. Also a unit test would be nice to verify it's working as-intended.
        Hide
        rdsr Ratandeep Ratti added a comment -

        Hi Jason Lowe, Mithun Radhakrishnan
        We have also been hit by this recently. I spent some time investigating this.

        Distcp has two modes of execution. 1 from the cmdline and the other is programmatically.

        The patch will work correctly for programmatical usage if settings from mapred-site.xml have already been applied to the input Configuration parameter as the properties set by distcp-default.xml will not be overridden again since mapred-site (and also mapred-default/yarn-default/yarn-site) is loaded as a default resource before job submission.

        For command line usage Distcp adds distcp-default.xml as a resource (and not as a default resource) which would take higher precedence than default/site files mentioned before as they are loaded as default resources . Even if Distcp adds distcp-default.xml as a default resource, the code will be brittle and prone to which default resources are loaded first since mapred-site/mapred-default/yarn-site/yarn-default are all loaded in static blocks in classes org.apache.hadoop.mapreduce.

        {Job, Cluster}

        Since distcp is just like any other MR job I think the best way would be to get rid of un-needed conf from distcp-default.xml.
        Below are the properties mentioned in distcp-default.xml

        distcp.dynamic.strategy.impl
        distcp.static.strategy.impl
        mapred.job.map.memory.mb
        mapred.job.reduce.memory.mb
        mapred.reducer.new-api
        mapreduce.reduce.class
        

        Seems like getting rid of

        mapred.job.{map|reduce}.memory.mb

        is all we need as the rest are required by distcp.
        Any other configuration the user wants to specify in distcp can very well be specified as jvm opts for cmd line usage and as simple parameters to Configuration option for programmatical usage.

        Please update with your thoughts/concerns.

        Show
        rdsr Ratandeep Ratti added a comment - Hi Jason Lowe , Mithun Radhakrishnan We have also been hit by this recently. I spent some time investigating this. Distcp has two modes of execution. 1 from the cmdline and the other is programmatically. The patch will work correctly for programmatical usage if settings from mapred-site.xml have already been applied to the input Configuration parameter as the properties set by distcp-default.xml will not be overridden again since mapred-site (and also mapred-default/yarn-default/yarn-site) is loaded as a default resource before job submission. For command line usage Distcp adds distcp-default.xml as a resource (and not as a default resource) which would take higher precedence than default/site files mentioned before as they are loaded as default resources . Even if Distcp adds distcp-default.xml as a default resource, the code will be brittle and prone to which default resources are loaded first since mapred-site/mapred-default/yarn-site/yarn-default are all loaded in static blocks in classes org.apache.hadoop.mapreduce. {Job, Cluster} Since distcp is just like any other MR job I think the best way would be to get rid of un-needed conf from distcp-default.xml. Below are the properties mentioned in distcp-default.xml distcp.dynamic.strategy.impl distcp.static.strategy.impl mapred.job.map.memory.mb mapred.job.reduce.memory.mb mapred.reducer.new-api mapreduce.reduce.class Seems like getting rid of mapred.job.{map|reduce}.memory.mb is all we need as the rest are required by distcp. Any other configuration the user wants to specify in distcp can very well be specified as jvm opts for cmd line usage and as simple parameters to Configuration option for programmatical usage. Please update with your thoughts/concerns.
        Hide
        hadoopqa Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12615710/MAPREDUCE-5653.branch-0.23.patch
        against trunk revision b610c68.

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. There were no new javadoc warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in hadoop-tools/hadoop-distcp.

        Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5217//testReport/
        Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5217//console

        This message is automatically generated.

        Show
        hadoopqa Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12615710/MAPREDUCE-5653.branch-0.23.patch against trunk revision b610c68. +1 @author . The patch does not contain any @author tags. -1 tests included . The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . There were no new javadoc warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 2.0.3) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. +1 core tests . The patch passed unit tests in hadoop-tools/hadoop-distcp. Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5217//testReport/ Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5217//console This message is automatically generated.
        Hide
        rdsr Ratandeep Ratti added a comment -

        Updating patch with proposed changes.

        Show
        rdsr Ratandeep Ratti added a comment - Updating patch with proposed changes.
        Hide
        hadoopqa Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12701008/MAPREDUCE-5653.trunk.2.patch
        against trunk revision 166eecf.

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. There were no new javadoc warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in hadoop-tools/hadoop-distcp.

        Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5223//testReport/
        Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5223//console

        This message is automatically generated.

        Show
        hadoopqa Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12701008/MAPREDUCE-5653.trunk.2.patch against trunk revision 166eecf. +1 @author . The patch does not contain any @author tags. -1 tests included . The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 javadoc . There were no new javadoc warning messages. +1 eclipse:eclipse . The patch built with eclipse:eclipse. +1 findbugs . The patch does not introduce any new Findbugs (version 2.0.3) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. +1 core tests . The patch passed unit tests in hadoop-tools/hadoop-distcp. Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5223//testReport/ Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5223//console This message is automatically generated.
        Hide
        aw Allen Wittenauer added a comment -

        Changing a default like this is an incompatible and/or potentially surprising change. This put it in line with all of the other changes that have happened with heap management in trunk (MAPREDUCE-5785, HADOOP-10950, etc).

        So I'm +1 this change for trunk and will commit it shortly.

        Show
        aw Allen Wittenauer added a comment - Changing a default like this is an incompatible and/or potentially surprising change. This put it in line with all of the other changes that have happened with heap management in trunk ( MAPREDUCE-5785 , HADOOP-10950 , etc). So I'm +1 this change for trunk and will commit it shortly.
        Hide
        aw Allen Wittenauer added a comment -

        committed to trunk.

        thanks!

        Show
        aw Allen Wittenauer added a comment - committed to trunk. thanks!
        Hide
        hudson Hudson added a comment -

        FAILURE: Integrated in Hadoop-trunk-Commit #7227 (See https://builds.apache.org/job/Hadoop-trunk-Commit/7227/)
        MAPREDUCE-5653. DistCp does not honour config-overrides for mapreduce.[map,reduce].memory.mb (Ratandeep Ratti via aw) (aw: rev 039366e3b430ff7d9a7ff30405a0431292069a8a)

        • hadoop-tools/hadoop-distcp/src/main/resources/distcp-default.xml
        • hadoop-mapreduce-project/CHANGES.txt
        Show
        hudson Hudson added a comment - FAILURE: Integrated in Hadoop-trunk-Commit #7227 (See https://builds.apache.org/job/Hadoop-trunk-Commit/7227/ ) MAPREDUCE-5653 . DistCp does not honour config-overrides for mapreduce. [map,reduce] .memory.mb (Ratandeep Ratti via aw) (aw: rev 039366e3b430ff7d9a7ff30405a0431292069a8a) hadoop-tools/hadoop-distcp/src/main/resources/distcp-default.xml hadoop-mapreduce-project/CHANGES.txt
        Hide
        hudson Hudson added a comment -

        FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #119 (See https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/119/)
        MAPREDUCE-5653. DistCp does not honour config-overrides for mapreduce.[map,reduce].memory.mb (Ratandeep Ratti via aw) (aw: rev 039366e3b430ff7d9a7ff30405a0431292069a8a)

        • hadoop-tools/hadoop-distcp/src/main/resources/distcp-default.xml
        • hadoop-mapreduce-project/CHANGES.txt
        Show
        hudson Hudson added a comment - FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #119 (See https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/119/ ) MAPREDUCE-5653 . DistCp does not honour config-overrides for mapreduce. [map,reduce] .memory.mb (Ratandeep Ratti via aw) (aw: rev 039366e3b430ff7d9a7ff30405a0431292069a8a) hadoop-tools/hadoop-distcp/src/main/resources/distcp-default.xml hadoop-mapreduce-project/CHANGES.txt
        Hide
        hudson Hudson added a comment -

        SUCCESS: Integrated in Hadoop-Yarn-trunk #853 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/853/)
        MAPREDUCE-5653. DistCp does not honour config-overrides for mapreduce.[map,reduce].memory.mb (Ratandeep Ratti via aw) (aw: rev 039366e3b430ff7d9a7ff30405a0431292069a8a)

        • hadoop-mapreduce-project/CHANGES.txt
        • hadoop-tools/hadoop-distcp/src/main/resources/distcp-default.xml
        Show
        hudson Hudson added a comment - SUCCESS: Integrated in Hadoop-Yarn-trunk #853 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/853/ ) MAPREDUCE-5653 . DistCp does not honour config-overrides for mapreduce. [map,reduce] .memory.mb (Ratandeep Ratti via aw) (aw: rev 039366e3b430ff7d9a7ff30405a0431292069a8a) hadoop-mapreduce-project/CHANGES.txt hadoop-tools/hadoop-distcp/src/main/resources/distcp-default.xml
        Hide
        hudson Hudson added a comment -

        SUCCESS: Integrated in Hadoop-Hdfs-trunk #2051 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/2051/)
        MAPREDUCE-5653. DistCp does not honour config-overrides for mapreduce.[map,reduce].memory.mb (Ratandeep Ratti via aw) (aw: rev 039366e3b430ff7d9a7ff30405a0431292069a8a)

        • hadoop-mapreduce-project/CHANGES.txt
        • hadoop-tools/hadoop-distcp/src/main/resources/distcp-default.xml
        Show
        hudson Hudson added a comment - SUCCESS: Integrated in Hadoop-Hdfs-trunk #2051 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/2051/ ) MAPREDUCE-5653 . DistCp does not honour config-overrides for mapreduce. [map,reduce] .memory.mb (Ratandeep Ratti via aw) (aw: rev 039366e3b430ff7d9a7ff30405a0431292069a8a) hadoop-mapreduce-project/CHANGES.txt hadoop-tools/hadoop-distcp/src/main/resources/distcp-default.xml
        Hide
        hudson Hudson added a comment -

        SUCCESS: Integrated in Hadoop-Hdfs-trunk-Java8 #110 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/110/)
        MAPREDUCE-5653. DistCp does not honour config-overrides for mapreduce.[map,reduce].memory.mb (Ratandeep Ratti via aw) (aw: rev 039366e3b430ff7d9a7ff30405a0431292069a8a)

        • hadoop-mapreduce-project/CHANGES.txt
        • hadoop-tools/hadoop-distcp/src/main/resources/distcp-default.xml
        Show
        hudson Hudson added a comment - SUCCESS: Integrated in Hadoop-Hdfs-trunk-Java8 #110 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/110/ ) MAPREDUCE-5653 . DistCp does not honour config-overrides for mapreduce. [map,reduce] .memory.mb (Ratandeep Ratti via aw) (aw: rev 039366e3b430ff7d9a7ff30405a0431292069a8a) hadoop-mapreduce-project/CHANGES.txt hadoop-tools/hadoop-distcp/src/main/resources/distcp-default.xml
        Hide
        hudson Hudson added a comment -

        FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #119 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/119/)
        MAPREDUCE-5653. DistCp does not honour config-overrides for mapreduce.[map,reduce].memory.mb (Ratandeep Ratti via aw) (aw: rev 039366e3b430ff7d9a7ff30405a0431292069a8a)

        • hadoop-tools/hadoop-distcp/src/main/resources/distcp-default.xml
        • hadoop-mapreduce-project/CHANGES.txt
        Show
        hudson Hudson added a comment - FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #119 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/119/ ) MAPREDUCE-5653 . DistCp does not honour config-overrides for mapreduce. [map,reduce] .memory.mb (Ratandeep Ratti via aw) (aw: rev 039366e3b430ff7d9a7ff30405a0431292069a8a) hadoop-tools/hadoop-distcp/src/main/resources/distcp-default.xml hadoop-mapreduce-project/CHANGES.txt
        Hide
        hudson Hudson added a comment -

        SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2069 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2069/)
        MAPREDUCE-5653. DistCp does not honour config-overrides for mapreduce.[map,reduce].memory.mb (Ratandeep Ratti via aw) (aw: rev 039366e3b430ff7d9a7ff30405a0431292069a8a)

        • hadoop-tools/hadoop-distcp/src/main/resources/distcp-default.xml
        • hadoop-mapreduce-project/CHANGES.txt
        Show
        hudson Hudson added a comment - SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2069 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2069/ ) MAPREDUCE-5653 . DistCp does not honour config-overrides for mapreduce. [map,reduce] .memory.mb (Ratandeep Ratti via aw) (aw: rev 039366e3b430ff7d9a7ff30405a0431292069a8a) hadoop-tools/hadoop-distcp/src/main/resources/distcp-default.xml hadoop-mapreduce-project/CHANGES.txt
        Hide
        mithun Mithun Radhakrishnan added a comment -

        Hey, Ratandeep Ratti. (Long time! :]). Hello, Allen Wittenauer. Apologies for the unreasonably delayed response.

        Thank you both for addressing this. I agree. Removing those properties from the default.xml does solve the immediate problem.

        It's a different matter that the initialization path is still incorrect (because of how/when default.xml is loaded). This might not be of consequence. Should that change later, we'll tackle that in a separate JIRA.

        Cheers.

        Show
        mithun Mithun Radhakrishnan added a comment - Hey, Ratandeep Ratti . (Long time! :]). Hello, Allen Wittenauer . Apologies for the unreasonably delayed response. Thank you both for addressing this. I agree. Removing those properties from the default.xml does solve the immediate problem. It's a different matter that the initialization path is still incorrect (because of how/when default.xml is loaded). This might not be of consequence. Should that change later, we'll tackle that in a separate JIRA. Cheers.
        Hide
        philip Philip Zeyliger added a comment -

        You could make an argument that DistCp, as a Yarn application, knows better than the defaults about how much memory it uses. I.e., that the bug is that DistCp isn't setting both intimately related settings ({{mapred.job.

        {map|reduce}

        .memory.mb}} and mapreduce.map.java.opts, but rather than just one. If the defaults in your cluster were to use a lot of memory, and DistCP uses very little (after all, it's copying a buffer around), it's wasteful.

        Show
        philip Philip Zeyliger added a comment - You could make an argument that DistCp, as a Yarn application, knows better than the defaults about how much memory it uses. I.e., that the bug is that DistCp isn't setting both intimately related settings ({{mapred.job. {map|reduce} .memory.mb}} and mapreduce.map.java.opts , but rather than just one. If the defaults in your cluster were to use a lot of memory, and DistCP uses very little (after all, it's copying a buffer around), it's wasteful.
        Hide
        aw Allen Wittenauer added a comment -

        You could make an argument that DistCp, as a Yarn application, knows better than the defaults about how much memory it uses.

        One could, but one pass through the code makes that argument kind of moot given there is zero logic currently for distcp to have that level of intelligence.

        If the defaults in your cluster were to use a lot of memory, and DistCP uses very little (after all, it's copying a buffer around), it's wasteful.

        Part of the argument around MAPREDUCE-5785 revolves around the fact that one doesn't actually want to set defaults anymore for MapReduce applications.

        Show
        aw Allen Wittenauer added a comment - You could make an argument that DistCp, as a Yarn application, knows better than the defaults about how much memory it uses. One could, but one pass through the code makes that argument kind of moot given there is zero logic currently for distcp to have that level of intelligence. If the defaults in your cluster were to use a lot of memory, and DistCP uses very little (after all, it's copying a buffer around), it's wasteful. Part of the argument around MAPREDUCE-5785 revolves around the fact that one doesn't actually want to set defaults anymore for MapReduce applications.
        Hide
        philip Philip Zeyliger added a comment -

        Allen, do you think there's more than just this one Xmx passthrough that's affecting DistCP? There's not much smarts it needs: it's not like it's every doing anything besides copying files.

        Hadn't seen MAPREDUCE-5785. Agree that that's an excellent direction.

        Show
        philip Philip Zeyliger added a comment - Allen, do you think there's more than just this one Xmx passthrough that's affecting DistCP? There's not much smarts it needs: it's not like it's every doing anything besides copying files. Hadn't seen MAPREDUCE-5785 . Agree that that's an excellent direction.
        Hide
        aw Allen Wittenauer added a comment -

        Allen, do you think there's more than just this one Xmx passthrough that's affecting DistCP?

        Yes. but those are outside the scope of this jira.

        Hadn't seen MAPREDUCE-5785. Agree that that's an excellent direction.

        As I mentioned way above, for the same reasons that '5785 is in trunk, this change is also only in trunk. Hypothetically, they should work very well together.

        Show
        aw Allen Wittenauer added a comment - Allen, do you think there's more than just this one Xmx passthrough that's affecting DistCP? Yes. but those are outside the scope of this jira. Hadn't seen MAPREDUCE-5785 . Agree that that's an excellent direction. As I mentioned way above, for the same reasons that '5785 is in trunk, this change is also only in trunk. Hypothetically, they should work very well together.
        Hide
        kshukla Kuhu Shukla added a comment -

        Ratandeep Ratti, Allen Wittenauer, Could we pull this back into 2.8? Appreciate any comments on this. Thanks!

        CC: Eric Payne

        Show
        kshukla Kuhu Shukla added a comment - Ratandeep Ratti , Allen Wittenauer , Could we pull this back into 2.8? Appreciate any comments on this. Thanks! CC: Eric Payne
        Hide
        aw Allen Wittenauer added a comment -

        Could we pull this back into 2.8?

        No. See previous comments as to why.

        Show
        aw Allen Wittenauer added a comment - Could we pull this back into 2.8? No. See previous comments as to why.
        Hide
        kshukla Kuhu Shukla added a comment -

        Allen Wittenauer. Thanks! got it!

        Show
        kshukla Kuhu Shukla added a comment - Allen Wittenauer . Thanks! got it!
        Hide
        aw Allen Wittenauer added a comment -

        Yeah, the whole situation is kind of screwy and I'm definitely sympathetic.. distcp directly setting memory parameters is clearly a problem but we also can't suddenly change this after years of working this way by simply removing those parameters. A "reasonable default" in code bases without MAPREDUCE-5785 and without making the situation worse (e.g., Philip's concerns, the proposed "fix" in HADOOP-14176, us making assumptions about what JVM is in use, etc) would need to be provided.

        distcp, years ago, probably should have provided alternate ways to set it's memory, etc, and then changed these params based upon either the provided values or use it's own internal defaults when they weren't overriden. This "hides" the fact that it's "just" a fancy MR job while still providing the flexibility needed for advanced users. For branch-2, that's really the only way I see out of this mess. But that ship has mostly sailed, with Hadoop 2.x on it's deathbed and the problem mostly solved in 3.x by this patch+MAPREDUCE-5785.

        Show
        aw Allen Wittenauer added a comment - Yeah, the whole situation is kind of screwy and I'm definitely sympathetic.. distcp directly setting memory parameters is clearly a problem but we also can't suddenly change this after years of working this way by simply removing those parameters. A "reasonable default" in code bases without MAPREDUCE-5785 and without making the situation worse (e.g., Philip's concerns, the proposed "fix" in HADOOP-14176 , us making assumptions about what JVM is in use, etc) would need to be provided. distcp, years ago, probably should have provided alternate ways to set it's memory, etc, and then changed these params based upon either the provided values or use it's own internal defaults when they weren't overriden. This "hides" the fact that it's "just" a fancy MR job while still providing the flexibility needed for advanced users. For branch-2, that's really the only way I see out of this mess. But that ship has mostly sailed, with Hadoop 2.x on it's deathbed and the problem mostly solved in 3.x by this patch+ MAPREDUCE-5785 .
        Hide
        kshukla Kuhu Shukla added a comment -

        Thank you so much Allen Wittenauer for the explanation. I too had my doubts whether this change will break a lot of users if we backported the patch but I agree that this change belongs to 3.x line.

        Show
        kshukla Kuhu Shukla added a comment - Thank you so much Allen Wittenauer for the explanation. I too had my doubts whether this change will break a lot of users if we backported the patch but I agree that this change belongs to 3.x line.

          People

          • Assignee:
            rdsr Ratandeep Ratti
            Reporter:
            mithun Mithun Radhakrishnan
          • Votes:
            0 Vote for this issue
            Watchers:
            13 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development