Hadoop Map/Reduce / MAPREDUCE-5108

Changes needed for Binary Compatibility for MR applications via YARN

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 2.0.3-alpha
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

      Description

      As we get ready to ship a beta/stable version of hadoop-2, it makes sense to spend time reviewing how well existing MR applications (hadoop-1) can migrate seamlessly.

      We've done various pieces of work over time; let's track progress and document things clearly. Zhijie Shen has done a bunch of testing and results look very promising so far.

      The aim is to support applications using the org.apache.hadoop.mapred.* api in a binary-compatible manner in hadoop-2. Users can then just take existing MR application jars, point them at YARN clusters, and things just work.

      Clearly, we might have some corner cases (haven't seen many so far), including differences in semantics (not just apis); however, the intent is to at least document them thoroughly, and to fix them where feasible.

      Also, it's clear that we will not be able to support the org.apache.hadoop.mapreduce api in a binary-compatible manner, due to the interface changes we made in hadoop-0.21 (sigh). Hence, users of the new apis will have to re-compile (i.e. source compatible only).
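To make the difference between binary and source compatibility concrete, here is a minimal sketch in plain Java. It does not use the real Hadoop classes: `ReportV2`, `TaskId` and `links` are hypothetical stand-ins, and the String-to-`TaskId` change is an assumed example rather than the actual change made in any Hadoop release. The point it illustrates is that a compiled call site records the full method descriptor, including the return type, so a method that keeps its name but changes its return type no longer links for old jars, even though typical calling source would still recompile.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

// Hypothetical stand-ins only; these are not Hadoop classes.
class DescriptorCheck {

    // Assumed new return type introduced by an upgrade.
    static final class TaskId {
        private final String value;
        TaskId(String value) { this.value = value; }
        @Override public String toString() { return value; }
    }

    // "Version 2" of a library class: same method name as before,
    // but the return type changed from String to TaskId.
    static class ReportV2 {
        public TaskId getTaskId() { return new TaskId("task_0001_m_000000"); }
    }

    // Returns true if `owner` has a virtual method with exactly this name and
    // descriptor, which approximates what an already-compiled call site needs.
    static boolean links(Class<?> owner, String name, MethodType type) {
        try {
            MethodHandles.lookup().findVirtual(owner, name, type);
            return true;
        } catch (NoSuchMethodException | IllegalAccessException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        MethodType oldType = MethodType.methodType(String.class); // what old jars reference
        MethodType newType = MethodType.methodType(TaskId.class); // what the class now offers

        System.out.println(oldType.toMethodDescriptorString());           // ()Ljava/lang/String;
        System.out.println(links(ReportV2.class, "getTaskId", oldType));  // false
        System.out.println(links(ReportV2.class, "getTaskId", newType));  // true
    }
}
```

Source such as `System.out.println(report.getTaskId())` would compile against either version, which is why a change of this kind can be source compatible without being binary compatible.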

      Net: given that the vast majority of users use the org.apache.hadoop.mapred api, this is a very reasonable way to ease migration to hadoop-2.

      Attachments

      1. MR_API_DIFF_v2.tar.gz
        3.05 MB
        Zhijie Shen
      2. mr1_mr2_api_diff.tar.gz
        3.03 MB
        Zhijie Shen
      3. Binary Backward Compatibility.pdf
        224 kB
        Zhijie Shen

      Sub-Tasks

      1. Hadoop-examples-1.x.x.jar cannot run on Yarn
        Sub-task, Closed, Zhijie Shen
      2. Sort in hadoop-1 examples is not binary compatible with hadoop-2 mapred.lib
        Sub-task, Closed, Zhijie Shen
      3. Aggregatewordcount and aggregatewordhist in hadoop-1 examples are not binary compatible with hadoop-2 mapred.lib.aggregate
        Sub-task, Closed, Zhijie Shen
      4. Aggregatewordcount and aggregatewordhist in hadoop-1 examples cannot find their inner classes when running on Yarn
        Sub-task, Closed, Zhijie Shen
      5. Document MR Binary Compatibility vis-a-vis hadoop-1 and hadoop-2
        Sub-task, Closed, Zhijie Shen
      6. JobStatus#getJobPriority changed to JobStatus#getPriority in MR2
        Sub-task, Resolved, Sandy Ryza
      7. Mapred API: TaskCompletionEvent incompatibility issues with MR1
        Sub-task, Closed, Zhijie Shen
      8. Fix JobClient incompatibilities with MR1
        Sub-task, Closed, Karthik Kambatla
      9. Enum Counter is removed from FileInputFormat and FileOutputFormat of both mapred and mapreduce
        Sub-task, Closed, Mayank Bansal
      10. TEMP_DIR_NAME is removed from FileOutputCommitter of mapreduce
        Sub-task, Closed, Zhijie Shen
      11. createFileSplit is removed from NLineInputFormat of mapred
        Sub-task, Closed, Mayank Bansal
      12. Constructor of DBInputFormat.DBRecordReader in mapred is changed
        Sub-task, Closed, Zhijie Shen
      13. Functions are changed or removed from Job in jobcontrol
        Sub-task, Closed, Mayank Bansal
      14. Signature changes for getTaskId of TaskReport in mapred
        Sub-task, Closed, Mayank Bansal
      15. mapred.Counters incompatibility issues with MR1
        Sub-task, Closed, Mayank Bansal
      16. ClusterStatus incompatibility issues with MR1
        Sub-task, Closed, Zhijie Shen
      17. mapreduce.Job has a bunch of methods that throw InterruptedException so it's incompatible with MR1
        Sub-task, Closed, Robert Kanter
      18. mapreduce.Job is missing getJobClient() so it's incompatible with MR1
        Sub-task, Resolved, Robert Kanter
      19. API Incompatibility - Sampler
        Sub-task, Resolved, Benoy Antony
      20. MRAdmin is removed from M/R while RMAdmin is added to Yarn
        Sub-task, Closed, Zhijie Shen
      21. Compatibility: Add a deprecated MRAdmin that wraps around RMAdmin
        Sub-task, Resolved, Karthik Kambatla
      22. Two functions changed their visibility in JobStatus
        Sub-task, Closed, Zhijie Shen
      23. A number of public static variables are removed from JobConf
        Sub-task, Closed, Zhijie Shen
      24. filecache.DistributedCache incompatibility issues with MR1
        Sub-task, Closed, Zhijie Shen
      25. Protected variables are removed from CombineFileRecordReader in both mapred and mapreduce
        Sub-task, Closed, Mayank Bansal
      26. Mapreduce API: String toHex(byte[]) is removed from SecureShuffleUtils
        Sub-task, Closed, Mayank Bansal
      27. Mapreduce API: TokenCache incompatibility issues with MR1
        Sub-task, Closed, Mayank Bansal
      28. Mapreduce API: ClusterMetrics incompatibility issues with MR1
        Sub-task, Closed, Mayank Bansal
      29. Mapreduce API: Counter changes from non-abstract class to interface
        Sub-task, Resolved, Zhijie Shen
      30. Mapreduce API: CounterGroup changes from non-abstract class to interface
        Sub-task, Resolved, Zhijie Shen
      31. Mapred API: Function signature change in JobControl
        Sub-task, Closed, Zhijie Shen
      32. Mapred API: void setTaskID(TaskAttemptID) is missing in TaskCompletionEvent
        Sub-task, Closed, Zhijie Shen
      33. Two function signature changes in filecache.DistributedCache
        Sub-task, Closed, Zhijie Shen
      34. mapreduce.Job killTask/failTask/getTaskCompletionEvents methods have incompatible signature changes
        Sub-task, Closed, Karthik Kambatla
      35. Binary incompatibilities in mapred.lib.TotalOrderPartitioner between branch-1 and branch-2
        Sub-task, Resolved, Robert Kanter
      36. Binary and source incompatibility in mapreduce.TaskID and mapreduce.TaskAttemptID between branch-1 and branch-2
        Sub-task, Resolved, Robert Kanter
      37. Binary and source incompatibility in mapred.lib.CombineFileInputFormat between branch-1 and branch-2
        Sub-task, Resolved, Robert Kanter
      38. Binary Incompatibility of O.A.H.U.mapred.SequenceFileAsBinaryOutputFormat.WritableValueBytes
        Sub-task, Resolved, Zhijie Shen

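Several of the sub-tasks above concern renamed or relocated APIs (for example the getJobPriority/getPriority rename in sub-task 6, or the deprecated MRAdmin wrapper in sub-task 21). One common way to restore compatibility in such cases is to keep the old entry point as a deprecated forwarder to the new one. The sketch below shows that pattern with hypothetical stand-in classes (`Status`, `Priority`); it is not the real JobStatus code, and this page does not state whether the sub-tasks were fixed exactly this way.

```java
// Hypothetical stand-ins only; these are not Hadoop classes.
class CompatShimDemo {

    enum Priority { LOW, NORMAL, HIGH }

    static class Status {
        private final Priority priority;

        Status(Priority priority) { this.priority = priority; }

        // New accessor name.
        public Priority getPriority() { return priority; }

        // Old accessor name, kept as a deprecated forwarder so that callers
        // compiled against the old name still link and see the same value.
        @Deprecated
        public Priority getJobPriority() { return getPriority(); }
    }

    @SuppressWarnings("deprecation")
    public static void main(String[] args) {
        Status status = new Status(Priority.HIGH);
        // Both names return the same value.
        System.out.println(status.getJobPriority() == status.getPriority()); // true
    }
}
```

The forwarder must keep the old name, parameter types and return type unchanged; a wrapper that only preserves the name would still break already-compiled callers.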
          People

          • Assignee: Zhijie Shen
          • Reporter: Arun C Murthy
          • Votes: 0
          • Watchers: 30