Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.1.0-beta
    • Component/s: documentation
    • Labels:
      None
    • Release Note:
      Document MR Binary Compatibility vis-a-vis hadoop-1 and hadoop-2 for end-users.
    • Target Version/s:
    1. MAPREDUCE-5184.4.patch
      10 kB
      Arun C Murthy
    2. MAPREDUCE-5184.4.patch
      10 kB
      Arun C Murthy
    3. MAPREDUCE-5184.3.patch
      9 kB
      Zhijie Shen
    4. MAPREDUCE-5184.2.patch
      7 kB
      Zhijie Shen
    5. MAPREDUCE-5184.1.patch
      7 kB
      Zhijie Shen

      Issue Links

        Activity

        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-trunk #1460 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1460/)
        MAPREDUCE-5184. Document compatibility for MapReduce applications in hadoop-2 vis-a-vis hadoop-1. Contributed by Zhijie Shen. (Revision 1493570)

        Result = SUCCESS
        acmurthy : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1493570
        Files :

        • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/site/apt/MapReduce_Compatibility_Hadoop1_Hadoop2.apt.vm
        • /hadoop/common/trunk/hadoop-project/src/site/site.xml
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/index.apt.vm
        Hudson added a comment -

        Integrated in Hadoop-Hdfs-trunk #1433 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1433/)
        MAPREDUCE-5184. Document compatibility for MapReduce applications in hadoop-2 vis-a-vis hadoop-1. Contributed by Zhijie Shen. (Revision 1493570)

        Result = FAILURE
        acmurthy : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1493570
        Files :

        • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/site/apt/MapReduce_Compatibility_Hadoop1_Hadoop2.apt.vm
        • /hadoop/common/trunk/hadoop-project/src/site/site.xml
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/index.apt.vm
        Hudson added a comment -

        Integrated in Hadoop-Yarn-trunk #243 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/243/)
        MAPREDUCE-5184. Document compatibility for MapReduce applications in hadoop-2 vis-a-vis hadoop-1. Contributed by Zhijie Shen. (Revision 1493570)

        Result = SUCCESS
        acmurthy : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1493570
        Files :

        • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/site/apt/MapReduce_Compatibility_Hadoop1_Hadoop2.apt.vm
        • /hadoop/common/trunk/hadoop-project/src/site/site.xml
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/index.apt.vm
        Hudson added a comment -

        Integrated in Hadoop-trunk-Commit #3938 (See https://builds.apache.org/job/Hadoop-trunk-Commit/3938/)
        MAPREDUCE-5184. Document compatibility for MapReduce applications in hadoop-2 vis-a-vis hadoop-1. Contributed by Zhijie Shen. (Revision 1493570)

        Result = SUCCESS
        acmurthy : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1493570
        Files :

        • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/site/apt/MapReduce_Compatibility_Hadoop1_Hadoop2.apt.vm
        • /hadoop/common/trunk/hadoop-project/src/site/site.xml
        • /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/index.apt.vm
        Arun C Murthy added a comment -

        I just committed this. Thanks Justin Hu!

        Arun C Murthy added a comment -

        Updated patch to include this link in ToC in the site.

        Arun C Murthy added a comment -

        Updated patch.

        I'll commit this momentarily so that I can unblock 2.1.0-beta. Please file follow-on jiras if necessary.

        Arun C Murthy added a comment -

        I have some minor cosmetic changes; otherwise it looks good.

        Karthik Kambatla, I also clarified a bit what we mean by source compatibility.

        Karthik Kambatla added a comment -

        "Recently we conducted intensive research and work on the APIs to fix the problems."

        IMO, we should avoid references to time ("recently"). The patch documents the state between hadoop-1 and hadoop-2 for eternity.

        I think it might be useful to explain (at least briefly) what we mean by binary and source compatibility. I am particularly concerned about source compatibility: I think we need to be explicit about what we mean. From what I have learnt recently (while working on HADOOP-9517), source compatibility is not easy to guarantee in Java (e.g. adding a class or method can technically break code that uses wildcard imports, because the new name can conflict with a user class or method of the same name), and even the JDK doesn't try to guarantee it. See http://www.oracle.com/technetwork/java/javase/compatibility-417013.html#source and http://wiki.eclipse.org/index.php/Evolving_Java-based_APIs
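
To make the wildcard-import hazard concrete, here is a minimal sketch using only JDK classes (not Hadoop code; the class name WildcardImportSketch is made up for illustration). Two on-demand imports both supply a type called Proxy, so the bare name is ambiguous, and the file compiles only because of the explicit single-type import. A library that adds a class whose simple name collides with one the user already pulls in by wildcard can put existing source into the same state.

```java
import java.lang.reflect.*;   // on-demand import: supplies java.lang.reflect.Proxy
import java.net.*;            // on-demand import: also supplies a class named Proxy
import java.net.Proxy;        // single-type import: shadows both wildcards for "Proxy"

public class WildcardImportSketch {

    /** Uses the bare name "Proxy"; remove the single-type import above and javac reports it as ambiguous. */
    static String describe() {
        Proxy direct = Proxy.NO_PROXY;   // resolved to java.net.Proxy
        return direct.type().name();
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Nothing in the already-compiled class files changes in this scenario, which is why an addition like this can be binary-compatible while still breaking a recompile.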

        Karthik Kambatla added a comment -

        Just realized this JIRA exists. Reviewing, will post comments shortly.

        Sandy Ryza added a comment -

        +1

        Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12587116/MAPREDUCE-5184.3.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +0 tests included. The patch appears to be a documentation patch that doesn't require tests.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3758//testReport/
        Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3758//console

        This message is automatically generated.

        Zhijie Shen added a comment -

        Thanks, Sandy Ryza! Updated the document. In addition, I listed more compatibility breaks with 0.23, which will be fixed in MAPREDUCE-5304.

        Sandy Ryza added a comment -

        Thanks, Zhijie Shen, a few more nitpicks:

        +  no major surgery has been conducted on it. Therefore, MRv2 is able to ensure
        +  satisfactory compatibility to MRv1 applications. However, due to some
        

        Should be: "satisfactory compatibility with MRv1 applications"

        +  <<mapred>> APIs. This means that applications which were compiled with MRv1
        +  <<mapred>> APIs can run directly on YARN without recompilation.
        

        Should be: "This means that applications which were built against MRv1 <<mapred>> APIs"

        +  We cannot ensure complete binary compatibility to the applications that use
        

        Should be: "binary compatibility for the applications"

        +  compatibility. Users are suggested to recompile their applications against
        +  MRv2 <<mapreduce>> APIs. One important binary incompatibility break is
        

        Should be: "Users should recompile their applications that use <<mapreduce>> APIs against MRv2 jars"

        +  Counter and CounterGroup, Users are recommended to recompile their
        +  applications if they used the two interfaces. On the other hand, <<mapreduce>>
        

        Probably don't need to recommend a recompile twice.

        +  neither support binary compatibility nor source compatibility to the
        +  applications that use this class directly.
        

        Should be: "compatibility for the applications"

        +  Unfortunately, maintaining binary compatibility of <<mapred>> APIs for MRv1
        +  applications may lead to binary incompatibility issues for early MRv2
        

        Should be: "maintaining binary compatibility with <<mapred>>"

        + please note that <<<hadoop -jar hadoop-examples-1.x.x.jar>>> is still using
        

        Should be: "will still use"

        Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12586384/MAPREDUCE-5184.2.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +0 tests included. The patch appears to be a documentation patch that doesn't require tests.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3738//testReport/
        Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3738//console

        This message is automatically generated.

        Zhijie Shen added a comment -

        Updated the document. Please have a look. Thanks!

        Sandy Ryza added a comment -

        My bad, I thought MAPREDUCE-4942 had already been reverted. If it stays in, then saying "Source Compatibility" makes sense to me. If not, or if there are places where source compatibility is broken for the "mapreduce" APIs, then we should update the terminology to reflect this.

        Zhijie Shen added a comment -

        Thanks for your review, Sandy Ryza!

        Not sure that it makes sense to have this section and its predecessor described as "Source Compatibility" vs. "Binary Compatibility". MAPREDUCE-4942, for example, tracks changes that are binary-compatible, but not source-compatible. Maybe it would make more sense to demarcate them as old API vs. new API or incompatibilities vs. compatibilities.

        "Source Compatibility" here should indicate that if we cannot ensure binary compatibility, then we should ensure source compatibility. In fact, not all mapreduce APIs break binary compatibility. I'll clarify it in the next draft.

        AFAIK, the patch for MAPREDUCE-4942 breaks source compatibility with 0.23, not MR1. If the patch is reverted, it is still binary compatible with MR1, but not source compatible. However, it is worth noting down.

        Jason Lowe added a comment -

        I'm a little confused by this. What does it mean that "hadoop -jar hadoop-examples-1.x.x.jar" is still using the 2.x.x.jar? Where is the 2.x.x jar coming from?

        It's coming from the classpath. If someone tries to run the 1.x examples jar, the classes from the 2.x jar will end up being picked up before the 1.x jar because the Hadoop framework jars appear before the user's stuff in the classpath by default. Therefore even though it looks like you're running the 1.x version of wordcount, you're really running the 2.x version because of the default classpath setup.

        That's why HADOOP_USER_CLASSPATH_FIRST=true and -Dmapreduce.job.user.classpath.first=true are necessary to make sure you're really running the 1.x code. The environment variable is to make sure the job client picks up the code from the 1.x jar, and the property is to make sure the tasks pick up the 1.x classes when they run on the nodes.
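
A toy model of the first-match-wins lookup described above, as a sketch only: it does not call any Hadoop API, and the class and method names (ClasspathOrderSketch, searchOrder, jarSupplying) are hypothetical. The jar names are the placeholders used in this thread, and both jars are assumed to contain org.apache.hadoop.examples.WordCount.

```java
import java.util.Arrays;
import java.util.List;

public class ClasspathOrderSketch {

    // Placeholder jar names from the discussion; versions are deliberately left as x.x.
    static final String FRAMEWORK_JAR = "hadoop-mapreduce-examples-2.x.x.jar";
    static final String USER_JAR = "hadoop-examples-1.x.x.jar";
    static final String WORD_COUNT = "org.apache.hadoop.examples.WordCount";

    /** Framework jars come first by default; the user-first switch reverses the order. */
    static List<String> searchOrder(boolean userClasspathFirst) {
        return userClasspathFirst
                ? Arrays.asList(USER_JAR, FRAMEWORK_JAR)
                : Arrays.asList(FRAMEWORK_JAR, USER_JAR);
    }

    /** Assumption for this model: every jar listed here ships a WordCount class and nothing else. */
    static boolean contains(String jar, String className) {
        return WORD_COUNT.equals(className);
    }

    /** First match wins: the earliest jar in the search order that has the class supplies it. */
    static String jarSupplying(String className, boolean userClasspathFirst) {
        for (String jar : searchOrder(userClasspathFirst)) {
            if (contains(jar, className)) {
                return jar;
            }
        }
        return null; // class not found on the modelled classpath
    }

    public static void main(String[] args) {
        System.out.println("default order:    " + jarSupplying(WORD_COUNT, false));
        System.out.println("user-first order: " + jarSupplying(WORD_COUNT, true));
    }
}
```

In the model a single boolean stands in for both settings. In a real job, as noted above, the environment variable governs the job client and the property governs the tasks, so both have to be set.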

        Sandy Ryza added a comment -

        Zhijie Shen, thanks a ton for writing this up. Overall, it looks great to me. I have a bunch of stylistic/grammatical suggestions. (Obviously) feel free to take what sounds good to you and leave out what does not. For the sake of brevity, I've mostly written out changes that would sound the best to me without explanations, but I would be happy to provide the rationale on any of them.

        In general, other docs have tended to use the all-capitalized YARN over Yarn, so I would advocate for following this convention.

        +  MapReduce NextGen (aka MRv2) has spin off the resource management
        +  responsibility, which has upgraded to Yarn, a general-purpose, distributed,
        +  application management framework. Meanwhile, MapReduce remains as a pure
        +  distributed computation framework.
        

        MapReduce NextGen (aka MRv2) has spun off resource management capabilities into YARN, a general purpose, distributed application management framework. MapReduce remains as a pure distributed computation framework.

        +  In general, the previous MapReduce framework (aka MRv1) has been reused and
        +  no major surgery has been conducted on it. Therefore, MRv2 is able to ensure
        

        In general, the APIs of the previous MapReduce framework (aka MRv1) have been preserved without the need for any major surgery.

        +  satisfactory compatibility to MRv1 applications. However, because of
        +  improvement and code refactoring, backward compatibility of some APIs still
        +  are broken.
        

        However, due to some improvements and code refactorings, a few APIs have been rendered backward-incompatible.

        +  First of all, we ensure binary compatibility to the applications that use
        +  "mapred" APIs. It means that the applications which are compiled with MRv1
        +  "mapred" APIs can directly run on Yarn, while recompiling is not required.
        

        First, we ensure binary compatibility for applications that use the old "mapred" APIs. This means that applications which were compiled with MRv1 "mapred" APIs can run directly on YARN without recompilation.

        +* {Source Compatibility}
        +
        +  However, we cannot ensure binary compatibility to the applications that use
        +  "mapreduce" APIs, as these APIs have evolved a lot since MRv1. Instead, the
        +  applications only need to be recompiled against MRv2 "mapreduce" APIs. The
        +  important binary incompatible spots are Counter and CounterGroup, Users are
        +  recommended to recompile their applications if they used the two interfaces.
        

        Not sure that it makes sense to have this section and its predecessor described as "Source Compatibility" vs. "Binary Compatibility". MAPREDUCE-4942, for example, tracks changes that are binary-compatible, but not source-compatible. Maybe it would make more sense to demarcate them as old API vs. new API or incompatibilities vs. compatibilities.

        +  MRAdmin is removed in MRv2. Since it is supposed to be used through CLI
        +  commands, we don't support binary compatibility for the applications that use
        +  this class directly.
        

        MRAdmin has been removed in MRv2 because the mradmin commands no longer exist. They have been replaced by the commands in rmadmin.

        +* {Tradeoff between MRv1 Users and Early MRv2 Adopters}
        

        Should be: "Tradeoffs"

        +  Unfortunately, some changes to "mapred" APIs lead to binary incompatible issues
        +  for the early MRv2 adopters, in particular Hadoop 0.23 users. They are caused
        +  by the incompatible method signature. For example, ProgramDriver#drive returns
        +  void in MRv1, but returns int in MRv2. Under the either-or situation, we choose
        +  to be compatible to MRv1 applications, which have a larger user base. Bellow is
        +  the list of the spots where Hadoop 0.23 applications will be broken.
        

        Unfortunately, maintaining compatibility with the "mapred" APIs in MRv1 may lead to binary incompatibility issues for early MRv2 adopters, in particular Hadoop 0.23 users. For the "mapred" APIs, we have chosen to be compatible with MRv1 applications, which have a larger user base. Below is the list of MapReduce APIs which are incompatible with Hadoop 0.23.
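
One way to see why a return-type change alone (void vs. int, as in the quoted ProgramDriver example) breaks binary compatibility: the JVM links a call site by the method's name plus its descriptor, and the descriptor includes the return type. The sketch below prints the two descriptors using java.lang.invoke.MethodType. It assumes, purely for illustration, a method taking a single String[] argument; that parameter list and the class name ReturnTypeDescriptorSketch are assumptions, not the real ProgramDriver signature.

```java
import java.lang.invoke.MethodType;

public class ReturnTypeDescriptorSketch {

    /** JVM descriptor of a hypothetical method that takes String[] and returns the given type. */
    static String descriptor(Class<?> returnType) {
        return MethodType.methodType(returnType, String[].class).toMethodDescriptorString();
    }

    public static void main(String[] args) {
        // The two descriptors differ only in the trailing return-type code (V vs. I).
        System.out.println("void variant: " + descriptor(void.class));
        System.out.println("int variant:  " + descriptor(int.class));
    }
}
```

A caller compiled against the void variant carries the descriptor ending in V in its bytecode, so at run time it cannot link against a class that only offers the variant ending in I; the usual symptom is a NoSuchMethodError. Recompiling the caller fixes it, which is why such a change can remain source-compatible.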

        +* {Malicious}
        +
        + For the users who are going to try hadoop-examples-1.x.x.jar on Yarn, please
        + note that "hadoop -jar hadoop-examples-1.x.x.jar" is still using the 2.x.x.jar.
        + Users should either remove hadoop-mapreduce-examples-2.x.x.jar from the
        + classpath or set "HADOOP_USER_CLASSPATH_FIRST=true" and
        + "HADOOP_CLASSPATH=...:hadoop-examples-1.x.x.jar" to run their target examples
        + jar.
        

        I'm a little confused by this. What does it mean that "hadoop -jar hadoop-examples-1.x.x.jar" is still using the 2.x.x.jar? Where is the 2.x.x jar coming from?

        Show
        Sandy Ryza added a comment -

        Zhijie Shen, thanks a ton for writing this up. Overall, it looks great to me. I have a bunch of stylistic/grammatical suggestions. (Obviously) feel free to take what sounds good to you and leave out what does not. For the sake of brevity, I've mostly written out changes that would sound the best to me without explanations, but I would be happy to provide the rationale on any of them.

        In general, other docs have tended to use the all-capitalized YARN over Yarn, so I would advocate for following this convention.

        + MapReduce NextGen (aka MRv2) has spin off the resource management
        + responsibility, which has upgraded to Yarn, a general-purpose, distributed,
        + application management framework. Meanwhile, MapReduce remains as a pure
        + distributed computation framework.

        MapReduce NextGen (aka MRv2) has spun off resource management capabilities into YARN, a general purpose, distributed application management framework. MapReduce remains as a pure distributed computation framework.

        + In general, the previous MapReduce framework (aka MRv1) has been reused and
        + no major surgery has been conducted on it. Therefore, MRv2 is able to ensure

        In general, the APIs of the previous MapReduce framework (aka MRv1) have been preserved without the need for any major surgery.

        + satisfactory compatibility to MRv1 applications. However, because of
        + improvement and code refactoring, backward compatibility of some APIs still
        + are broken.

        However, due to some improvements and code refactorings, a few APIs have been rendered backward-incompatible.

        + First of all, we ensure binary compatibility to the applications that use
        + "mapred" APIs. It means that the applications which are compiled with MRv1
        + "mapred" APIs can directly run on Yarn, while recompiling is not required.

        First, we ensure binary compatibility for applications that use the old "mapred" APIs. This means that applications which were compiled with MRv1 "mapred" APIs can run directly on YARN without recompilation.

        +* {Source Compatibility}
        +
        + However, we cannot ensure binary compatibility to the applications that use
        + "mapreduce" APIs, as these APIs have evolved a lot since MRv1. Instead, the
        + applications only need to be recompiled against MRv2 "mapreduce" APIs. The
        + important binary incompatible spots are Counter and CounterGroup, Users are
        + recommended to recompile their applications if they used the two interfaces.

        Not sure that it makes sense to have this section and its predecessor described as "Source Compatibility" vs. "Binary Compatibility". MAPREDUCE-4942, for example, tracks changes that are binary-compatible, but not source-compatible. Maybe it would make more sense to demarcate them as old API vs. new API or incompatibilities vs. compatibilities.

        + MRAdmin is removed in MRv2. Since it is supposed to be used through CLI
        + commands, we don't support binary compatibility for the applications that use
        + this class directly.

        MRAdmin has been removed in MRv2 because the mradmin commands no longer exist. They have been replaced by the commands in rmadmin.

        +* {Tradeoff between MRv1 Users and Early MRv2 Adopters}

        Should be tradeoff*s*

        + Unfortunately, some changes to "mapred" APIs lead to binary incompatible issues
        + for the early MRv2 adopters, in particular Hadoop 0.23 users. They are caused
        + by the incompatible method signature. For example, ProgramDriver#drive returns
        + void in MRv1, but returns int in MRv2. Under the either-or situation, we choose
        + to be compatible to MRv1 applications, which have a larger user base. Bellow is
        + the list of the spots where Hadoop 0.23 applications will be broken.

        Unfortunately, maintaining compatibility with the "mapred" APIs in MRv1 may lead to binary incompatibility issues for early MRv2 adopters, in particular Hadoop 0.23 users. For the "mapred" APIs, we have chosen to be compatible with MRv1 applications, which have a larger user base. Below is the list of MapReduce APIs which are incompatible with Hadoop 0.23.

        +* {Malicious}
        +
        + For the users who are going to try hadoop-examples-1.x.x.jar on Yarn, please
        + note that "hadoop -jar hadoop-examples-1.x.x.jar" is still using the 2.x.x.jar.
        + Users should either remove hadoop-mapreduce-examples-2.x.x.jar from the
        + classpath or set "HADOOP_USER_CLASSPATH_FIRST=true" and
        + "HADOOP_CLASSPATH=...:hadoop-examples-1.x.x.jar" to run their target examples
        + jar.

        I'm a little confused by this. What does it mean that "hadoop -jar hadoop-examples-1.x.x.jar" is still using the 2.x.x.jar? Where is the 2.x.x jar coming from?
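
        For readers wondering why the void-versus-int return type of ProgramDriver#drive quoted above is a binary (and not merely a source) incompatibility: the JVM links a call site by method name plus full descriptor, and the descriptor includes the return type. The sketch below is illustrative only. It assumes, hypothetically, that the method takes a single String[] parameter, and the class name DriveDescriptorDemo is invented for this example; it does not use or reproduce any Hadoop code.

```java
import java.lang.invoke.MethodType;

public class DriveDescriptorDemo {
    // Builds the JVM method descriptor for a method that takes a single
    // String[] argument (an assumption made for illustration) and returns
    // the given type.
    public static String descriptor(Class<?> returnType) {
        return MethodType.methodType(returnType, String[].class)
                         .toMethodDescriptorString();
    }

    public static void main(String[] args) {
        String voidVariant = descriptor(void.class); // the "returns void" shape
        String intVariant = descriptor(int.class);   // the "returns int" shape

        System.out.println("void variant: " + voidVariant);
        System.out.println("int variant:  " + intVariant);
        // The descriptors differ, so bytecode compiled against one variant
        // cannot link against a class that only provides the other.
        System.out.println("same descriptor: " + voidVariant.equals(intVariant));
    }
}
```

        Because the two descriptors differ, an already-compiled caller typically fails at link time with NoSuchMethodError when run against the other variant, even though recompiling the same source would often succeed.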
        Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12586238/MAPREDUCE-5184.1.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +0 tests included. The patch appears to be a documentation patch that doesn't require tests.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3732//testReport/
        Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3732//console

        This message is automatically generated.

        Zhijie Shen added a comment -

        Drafted a document about the compatibility issues. Please have a look, and your comments are welcome.

        Arun C Murthy added a comment -

        Thanks for taking this up, Zhijie!

        Zhijie Shen added a comment -

        In the document, it is worth mentioning that since hadoop-examples-2.x.jar is on the classpath, "hadoop -jar hadoop-examples-1.x.jar" still uses the 2.x jar. Users should either remove the 2.x jar from the classpath or set HADOOP_USER_CLASSPATH_FIRST=true and HADOOP_CLASSPATH=...:hadoop-examples-1.x.jar to actually run the 1.x jar.


          People

          • Assignee:
            Zhijie Shen
            Reporter:
            Arun C Murthy
          • Votes:
            0
            Watchers:
            12
