  1. Hadoop Map/Reduce
  2. MAPREDUCE-5108

Changes needed for Binary Compatibility for MR applications via YARN

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 2.0.3-alpha
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      As we get ready to ship a beta/stable version of hadoop-2, it makes sense to spend time reviewing how well existing MR applications (hadoop-1) can migrate seamlessly.

      We've done various pieces of work over time; let's track progress and document things clearly. Zhijie Shen has done a bunch of testing, and results look very promising so far.

      The aim is to support applications using the org.apache.hadoop.mapred.* api in a binary compatible manner in hadoop-2. Users can then just take existing MR application jars, point them at YARN clusters, and things just work.

      Clearly, we might have some corner cases (haven't seen many so far), including in semantics (not just apis); however, the intent is to at least document them thoroughly, if not actually fix them where feasible.

      Also, it's clear that we will not be able to support the org.apache.hadoop.mapreduce api in a binary compatible manner due to the interface changes we made in hadoop-0.21 (sigh); hence, users of the new apis will have to re-compile (i.e. source compatible only).

      Net, given that the vast majority of users use the org.apache.hadoop.mapred api, it's a very reasonable way to ease migration to hadoop-2.

        Attachments

        1. Binary Backward Compatibility.pdf
          224 kB
          Zhijie Shen
        2. mr1_mr2_api_diff.tar.gz
          3.03 MB
          Zhijie Shen
        3. MR_API_DIFF_v2.tar.gz
          3.05 MB
          Zhijie Shen

        Sub-Tasks

        1. Hadoop-examples-1.x.x.jar cannot run on Yarn (Sub-task, Closed, Zhijie Shen)
        2. Sort in hadoop-1 examples is not binary compatible with hadoop-2 mapred.lib (Sub-task, Closed, Zhijie Shen)
        3. Aggregatewordcount and aggregatewordhist in hadoop-1 examples are not binary compatible with hadoop-2 mapred.lib.aggregate (Sub-task, Closed, Zhijie Shen)
        4. Aggregatewordcount and aggregatewordhist in hadoop-1 examples can not find their inner classes when running on Yarn (Sub-task, Closed, Zhijie Shen)
        5. Document MR Binary Compatibility vis-a-vis hadoop-1 and hadoop-2 (Sub-task, Closed, Zhijie Shen)
        6. JobStatus#getJobPriority changed to JobStatus#getPriority in MR2 (Sub-task, Resolved, Sandy Ryza)
        7. Mapred API: TaskCompletionEvent incompatibility issues with MR1 (Sub-task, Closed, Zhijie Shen)
        8. Fix JobClient incompatibilities with MR1 (Sub-task, Closed, Karthik Kambatla)
        9. Enum Counter is removed from FileInputFormat and FileOutputFormat of both mapred and mapreduce (Sub-task, Closed, Mayank Bansal)
        10. TEMP_DIR_NAME is removed from of FileOutputCommitter of mapreduce (Sub-task, Closed, Zhijie Shen)
        11. createFileSplit is removed from NLineInputFormat of mapred (Sub-task, Closed, Mayank Bansal)
        12. Constructor of DBInputFormat.DBRecordReader in mapred is changed (Sub-task, Closed, Zhijie Shen)
        13. Functions are changed or removed from Job in jobcontrol (Sub-task, Closed, Mayank Bansal)
        14. Signature changes for getTaskId of TaskReport in mapred (Sub-task, Closed, Mayank Bansal)
        15. mapred.Counters incompatiblity issues with MR1 (Sub-task, Closed, Mayank Bansal)
        16. ClusterStatus incompatiblity issues with MR1 (Sub-task, Closed, Zhijie Shen)
        17. mapreduce.Job has a bunch of methods that throw InterruptedException so its incompatible with MR1 (Sub-task, Closed, Robert Kanter)
        18. mapreduce.Job is missing getJobClient() so its incompatible with MR1 (Sub-task, Resolved, Robert Kanter)
        19. API Incompatibility - Sampler (Sub-task, Resolved, Benoy Antony)
        20. MRAdmin is removed from M/R while RMAdmin is added to Yarn (Sub-task, Closed, Zhijie Shen)
        21. Compatibility: Add a deprecated MRAdmin that wraps around RMAdmin (Sub-task, Resolved, Karthik Kambatla)
        22. Two functions changed their visibility in JobStatus (Sub-task, Closed, Zhijie Shen)
        23. A number of public static variables are removed from JobConf (Sub-task, Closed, Zhijie Shen)
        24. filecache.DistributedCache incompatiblity issues with MR1 (Sub-task, Closed, Zhijie Shen)
        25. Protected variables are removed from CombineFileRecordReader in both mapred and mapreduce (Sub-task, Closed, Mayank Bansal)
        26. Mapreduce API: String toHex(byte[]) is removed from SecureShuffleUtils (Sub-task, Closed, Mayank Bansal)
        27. Mapreduce API: TokenCache incompatibility issues with MR1 (Sub-task, Closed, Mayank Bansal)
        28. Mapreduce API: ClusterMetrics incompatibility issues with MR1 (Sub-task, Closed, Mayank Bansal)
        29. Mapreduce API: Counter changes from non-abstract class to interface (Sub-task, Resolved, Zhijie Shen)
        30. Mapreduce API: CounterGroup changes from non-abstract class to interface (Sub-task, Resolved, Zhijie Shen)
        31. Mapred API: Function signature change in JobControl (Sub-task, Closed, Zhijie Shen)
        32. Mapred API: void setTaskID(TaskAttemptID) is missing in TaskCompletionEvent (Sub-task, Closed, Zhijie Shen)
        33. Two function signature changes in filecache.DistributedCache (Sub-task, Closed, Zhijie Shen)
        34. mapreduce.Job killTask/failTask/getTaskCompletionEvents methods have incompatible signature changes (Sub-task, Closed, Karthik Kambatla)
        35. Binary incompatibilities in mapred.lib.TotalOrderPartitioner between branch-1 and branch-2 (Sub-task, Closed, Robert Kanter)
        36. Binary and source incompatibility in mapreduce.TaskID and mapreduce.TaskAttemptID between branch-1 and branch-2 (Sub-task, Closed, Robert Kanter)
        37. Binary and source incompatibility in mapred.lib.CombineFileInputFormat between branch-1 and branch-2 (Sub-task, Closed, Robert Kanter)
        38. Binary Incompatibility of O.A.H.U.mapred.SequenceFileAsBinaryOutputFormat.WritableValueBytes (Sub-task, Closed, Zhijie Shen)
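
        Many of the sub-tasks above come down to a public method being renamed, removed, or given a different signature (for example, JobStatus#getJobPriority becoming JobStatus#getPriority). As a rough illustration of how such gaps might be detected, here is a minimal reflection-based sketch. OldJobStatus and NewJobStatus are hypothetical stand-ins defined inside the example, not the real Hadoop classes, and this is not the tooling that produced the attached API diffs. It only compares public methods by name, parameter types, and return type; fields, constructors, and class-to-interface changes would need additional checks.

        ```java
        import java.lang.reflect.Method;
        import java.lang.reflect.Modifier;
        import java.util.ArrayList;
        import java.util.Collections;
        import java.util.List;

        public class BinaryCompatCheck {

            // Hypothetical stand-in for a class as hadoop-1 callers were compiled against it.
            public static class OldJobStatus {
                public String getJobPriority() { return "NORMAL"; }
                public String getJobId() { return "job_0001"; }
            }

            // Hypothetical stand-in for the same class after one accessor was renamed.
            public static class NewJobStatus {
                public String getPriority() { return "NORMAL"; }
                public String getJobId() { return "job_0001"; }
            }

            /**
             * Lists public methods declared by oldApi that newApi does not expose
             * with the same name, parameter types, and return type. Already-compiled
             * callers of such a method would fail to link against newApi.
             */
            public static List<String> missingMethods(Class<?> oldApi, Class<?> newApi) {
                List<String> missing = new ArrayList<>();
                for (Method m : oldApi.getDeclaredMethods()) {
                    if (!Modifier.isPublic(m.getModifiers()) || m.isSynthetic()) {
                        continue; // only the public, hand-written surface matters here
                    }
                    try {
                        Method candidate = newApi.getMethod(m.getName(), m.getParameterTypes());
                        // The return type is part of the JVM method descriptor, so a
                        // changed return type also breaks already-compiled callers.
                        if (!candidate.getReturnType().equals(m.getReturnType())) {
                            missing.add(describe(m));
                        }
                    } catch (NoSuchMethodException e) {
                        missing.add(describe(m));
                    }
                }
                Collections.sort(missing); // reflection order is unspecified
                return missing;
            }

            private static String describe(Method m) {
                List<String> params = new ArrayList<>();
                for (Class<?> p : m.getParameterTypes()) {
                    params.add(p.getSimpleName());
                }
                return m.getReturnType().getSimpleName() + " " + m.getName()
                        + "(" + String.join(", ", params) + ")";
            }

            public static void main(String[] args) {
                // Prints: [String getJobPriority()]
                System.out.println(missingMethods(OldJobStatus.class, NewJobStatus.class));
            }
        }
        ```

        An empty result means every public method of the old class is still callable with an identical signature. A non-empty result names the methods that would need either a fix or a documented incompatibility.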

            People

            • Assignee:
              zjshen Zhijie Shen
              Reporter:
              acmurthy Arun C Murthy
            • Votes:
              0
              Watchers:
              30
