Hadoop Map/Reduce: MAPREDUCE-5108

Changes needed for Binary Compatibility for MR applications via YARN


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • 2.0.3-alpha
    • None
    • None
    • None

    Description

      As we get ready to ship a beta/stable version of hadoop-2, it makes sense to spend time reviewing how well existing MR applications (written for hadoop-1) can migrate seamlessly.

      We've done various pieces of work over time; let's track progress here and document things clearly. Zhijie Shen has done a bunch of testing, and the results look very promising so far.

      The aim is to support applications using the org.apache.hadoop.mapred.* API in a binary compatible manner in hadoop-2; thus, users can just take existing MR application jars, point them at YARN clusters, and things just work.

      Clearly, we might hit some corner cases (we haven't seen many so far), including differences in semantics (not just in APIs); however, the intent is at least to document them thoroughly, if not to fix them where feasible.

      Also, it's clear that we will not be able to support the org.apache.hadoop.mapreduce API in a binary compatible manner due to the interface changes we made in hadoop-0.21 (sigh); hence, users of the new APIs will have to recompile (i.e., they get source compatibility only).

      Net, given that the vast majority of users use the org.apache.hadoop.mapred API, this is a very reasonable way to ease migration to hadoop-2.

      Attachments

        1. Binary Backward Compatibility.pdf (224 kB, Zhijie Shen)
        2. MR_API_DIFF_v2.tar.gz (3.05 MB, Zhijie Shen)
        3. mr1_mr2_api_diff.tar.gz (3.03 MB, Zhijie Shen)
      Sub-Tasks

        1. Hadoop-examples-1.x.x.jar cannot run on Yarn (Closed, Zhijie Shen)
        2. Sort in hadoop-1 examples is not binary compatible with hadoop-2 mapred.lib (Closed, Zhijie Shen)
        3. Aggregatewordcount and aggregatewordhist in hadoop-1 examples are not binary compatible with hadoop-2 mapred.lib.aggregate (Closed, Zhijie Shen)
        4. Aggregatewordcount and aggregatewordhist in hadoop-1 examples cannot find their inner classes when running on Yarn (Closed, Zhijie Shen)
        5. Document MR Binary Compatibility vis-a-vis hadoop-1 and hadoop-2 (Closed, Zhijie Shen)
        6. JobStatus#getJobPriority changed to JobStatus#getPriority in MR2 (Resolved, Sandy Ryza)
        7. Mapred API: TaskCompletionEvent incompatibility issues with MR1 (Closed, Zhijie Shen)
        8. Fix JobClient incompatibilities with MR1 (Closed, Karthik Kambatla)
        9. Enum Counter is removed from FileInputFormat and FileOutputFormat of both mapred and mapreduce (Closed, Mayank Bansal)
        10. TEMP_DIR_NAME is removed from FileOutputCommitter of mapreduce (Closed, Zhijie Shen)
        11. createFileSplit is removed from NLineInputFormat of mapred (Closed, Mayank Bansal)
        12. Constructor of DBInputFormat.DBRecordReader in mapred is changed (Closed, Zhijie Shen)
        13. Functions are changed or removed from Job in jobcontrol (Closed, Mayank Bansal)
        14. Signature changes for getTaskId of TaskReport in mapred (Closed, Mayank Bansal)
        15. mapred.Counters incompatibility issues with MR1 (Closed, Mayank Bansal)
        16. ClusterStatus incompatibility issues with MR1 (Closed, Zhijie Shen)
        17. mapreduce.Job has a bunch of methods that throw InterruptedException, so it's incompatible with MR1 (Closed, Robert Kanter)
        18. mapreduce.Job is missing getJobClient(), so it's incompatible with MR1 (Resolved, Robert Kanter)
        19. API Incompatibility: Sampler (Resolved, Benoy Antony)
        20. MRAdmin is removed from M/R while RMAdmin is added to Yarn (Closed, Zhijie Shen)
        21. Compatibility: Add a deprecated MRAdmin that wraps around RMAdmin (Resolved, Karthik Kambatla)
        22. Two functions changed their visibility in JobStatus (Closed, Zhijie Shen)
        23. A number of public static variables are removed from JobConf (Closed, Zhijie Shen)
        24. filecache.DistributedCache incompatibility issues with MR1 (Closed, Zhijie Shen)
        25. Protected variables are removed from CombineFileRecordReader in both mapred and mapreduce (Closed, Mayank Bansal)
        26. Mapreduce API: String toHex(byte[]) is removed from SecureShuffleUtils (Closed, Mayank Bansal)
        27. Mapreduce API: TokenCache incompatibility issues with MR1 (Closed, Mayank Bansal)
        28. Mapreduce API: ClusterMetrics incompatibility issues with MR1 (Closed, Mayank Bansal)
        29. Mapreduce API: Counter changes from non-abstract class to interface (Resolved, Zhijie Shen)
        30. Mapreduce API: CounterGroup changes from non-abstract class to interface (Resolved, Zhijie Shen)
        31. Mapred API: Function signature change in JobControl (Closed, Zhijie Shen)
        32. Mapred API: void setTaskID(TaskAttemptID) is missing in TaskCompletionEvent (Closed, Zhijie Shen)
        33. Two function signature changes in filecache.DistributedCache (Closed, Zhijie Shen)
        34. mapreduce.Job killTask/failTask/getTaskCompletionEvents methods have incompatible signature changes (Closed, Karthik Kambatla)
        35. Binary incompatibilities in mapred.lib.TotalOrderPartitioner between branch-1 and branch-2 (Closed, Robert Kanter)
        36. Binary and source incompatibility in mapreduce.TaskID and mapreduce.TaskAttemptID between branch-1 and branch-2 (Closed, Robert Kanter)
        37. Binary and source incompatibility in mapred.lib.CombineFileInputFormat between branch-1 and branch-2 (Closed, Robert Kanter)
        38. Binary Incompatibility of O.A.H.U.mapred.SequenceFileAsBinaryOutputFormat.WritableValueBytes (Closed, Zhijie Shen)


          People

            Assignee: Zhijie Shen (zjshen)
            Reporter: Arun Murthy (acmurthy)
            Votes: 0
            Watchers: 30
