Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      The purpose of this Jira is to decide on Hadoop 1.0 compatibility requirements.
      A proposal that was discussed on the core-dev@hadoop.apache.org mailing list is described below.

      Release terminology used below:

      Standard release numbering: major, minor, dot releases

      • Only bug fixes in dot releases: m.x.y
        • no changes to APIs, disk formats, protocols, config, etc. in a dot release
      • New features in major (m.0) and minor (m.x.0) releases
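
      For concreteness, here is a minimal sketch (hypothetical class and method names, not Hadoop code) of how the m.x.y numbering maps onto upgrade kinds:

        // Illustrative only: classify an upgrade under the m.x.y scheme above.
        public class ReleaseVersion {
            final int major, minor, dot;

            ReleaseVersion(String v) {
                String[] p = v.split("\\.");
                major = Integer.parseInt(p[0]);
                minor = Integer.parseInt(p[1]);
                dot = p.length > 2 ? Integer.parseInt(p[2]) : 0;
            }

            /** Dot upgrades may carry only bug fixes; minor and major releases may add features. */
            static String upgradeKind(ReleaseVersion from, ReleaseVersion to) {
                if (to.major != from.major) return "major"; // e.g. 1.2.0 -> 2.0.0
                if (to.minor != from.minor) return "minor"; // e.g. 1.1.0 -> 1.2.0
                return "dot";                               // e.g. 1.1.1 -> 1.1.2
            }
        }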

      Hadoop Compatibility Proposal

      • 1 API Compatibility
        No need for client recompilation when upgrading across minor releases (i.e. from m.x to m.y, where x <= y).
        Classes or methods deprecated in m.x can be removed in (m+1).0.
        Note that this is stronger than what we have been doing in Hadoop 0.x releases.
        These are fairly standard compatibility rules for major and minor releases.
      • 2 Data Compatibility
        • Motivation: Users expect file systems to preserve data transparently across releases.
        • 2.a HDFS metadata and data can change across minor or major releases, but such changes must be transparent to user applications. That is, a release upgrade must automatically convert the metadata and data as needed. Further, a release upgrade must allow a cluster to roll back to the older version and its older disk format (rollback must restore the original data, not any updated data).
        • 2.a-WeakerAutomaticConversion:
          Automatic conversion is supported across a small number of releases. If a user wants to jump across multiple releases, he may be forced to go through a few intermediate releases to get to the final desired release.
      • 3 Wire Protocol Compatibility
        We offer no wire compatibility in our 0.x releases today.
        • Motivation: The motivation isn't to make the Hadoop protocols public. Applications will not call the protocols directly but through a library (in our case the FileSystem class and its implementations). Instead, the motivation is that customers run multiple clusters and have apps that access data across clusters. Customers cannot be expected to update all clusters simultaneously.
        • 3.a Old m.x clients can connect to new m.y servers, where x <= y, but the old clients might get reduced functionality or performance. m.x clients might not be able to connect to (m+1).z servers.
        • 3.b. New m.y clients must be able to connect to old m.x servers, where x < y, but only for old m.x functionality.
          Comment: Generally, old API methods continue to use old RPC methods. However, it is legal to have new implementations of old API methods call new RPC methods, as long as the library transparently handles the fallback case for old servers (see the sketch after this list).
        • 3.c. At any major release transition [i.e. from a release m.x to a release (m+1).0], a user should be able to read data from the cluster running the old version.
          • Motivation: Data copying across clusters is a common operation for many customers. For example, this is routinely done at Yahoo; another use case is HADOOP-4058. Today, http (or hftp) provides a guaranteed compatible way of copying data across versions. Clearly one cannot force a customer to simultaneously update all its Hadoop clusters to a new major release. We can satisfy this requirement via the http/hftp mechanism or some other mechanism.
        • 3.c-Stronger
          Shall we add a stronger requirement for 1.0: wire compatibility across major versions? That is, not just for reading but for all operations. This can be supported by class loading or other games.
          Note we can wait to provide this until 2.0 happens. If Hadoop provided this guarantee, it would allow customers to partition their data across clusters without risking apps breaking across major releases due to wire incompatibility issues.
          • Motivation: Data copying is a compromise. Customers really want to run apps across clusters running different versions.
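
      As referenced in 3.b above, here is a minimal sketch of the client-side fallback pattern; all names (NamenodeProtocol, getFileInfoV2, CompatClient) are invented for illustration and are not actual Hadoop internals:

        // Sketch: an m.y client implements an old API method with a new RPC,
        // falling back transparently when talking to an older m.x server.
        interface NamenodeProtocol {
            String getFileInfoV2(String path); // new RPC, added in m.y
            String getFileInfo(String path);   // old RPC, present since m.x
        }

        class CompatClient {
            private final NamenodeProtocol namenode;
            CompatClient(NamenodeProtocol nn) { this.namenode = nn; }

            /** Old public API method; callers never see which RPC was used. */
            String fileInfo(String path) {
                try {
                    return namenode.getFileInfoV2(path); // prefer the new RPC
                } catch (UnsupportedOperationException e) {
                    // An old m.x server does not know the new method; fall back
                    // so old functionality keeps working, as 3.b requires.
                    return namenode.getFileInfo(path);
                }
            }
        }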

        Activity

        Allen Wittenauer added a comment -

        Hadoop 1.0 was released.

        steve_l added a comment -

        5b, Intra-service wire-protocol compatibility

        That's really hard to achieve. Even if you keep the wire format the same, it's keeping the explicit and implicit semantics consistent that's tricky. I've been there too many times with web service protocols whose adoption of XML was meant to handle versioning well but turned out not to.

        I'd be worried about running multiple datanode versions in a single cluster, for example.

        What I'd be happier with would be a long-haul API built on something like JAX-RS that had enough of a stability guarantee that client apps, be they IDE plugins, command-line tools, or Firefox addons, could talk to the far end through whatever proxies and firewalls got in the way, with the Java-level API compatibility rules helping the uploaded code to work.

        As we don't have a JAX-RS-based long-haul API yet, we could design in some of this stuff from the outset.
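
        A minimal sketch of the kind of JAX-RS long-haul endpoint described above; the path, class name, and payload are invented for illustration:

          import javax.ws.rs.GET;
          import javax.ws.rs.Path;
          import javax.ws.rs.PathParam;
          import javax.ws.rs.Produces;

          // A versioned HTTP resource that command-line tools, plugins, proxies
          // and firewalls can all cope with; only the /v1 contract is frozen.
          @Path("/v1/files")
          public class FileStatusResource {
              @GET
              @Path("{name}")
              @Produces("application/json")
              public String status(@PathParam("name") String name) {
                  // A real implementation would consult the filesystem; stubbed here.
                  return "{\"name\":\"" + name + "\"}";
              }
          }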

        dhruba borthakur added a comment -

        The current hadoop code already implements 5a. We typically deploy a new hadoop jar file (having a newer build version) and just restart the map-reduce cluster without restarting dfs. This works!

        +1 for 5a.

        Sanjay Radia added a comment -

        In the email discussions, we had decided not to support rolling upgrades for 1.0.
        Below I am documenting the discussion.
        We had decided to go with 5a for Hadoop 1.0 and to do 5b (stronger than 5a) after 1.0.

        • 5. Intra-Hadoop Service Compatibility
          The HDFS service has multiple components (NN, DN, Balancer) that communicate
          amongst themselves. Similarly, the MapReduce service has
          components (JT and TT) that communicate amongst themselves.
          Currently we require that all the components of a service have the same build version and hence talk the same wire protocols.
          This build-version checking prevents rolling upgrades.
        • 5.a HDFS and MapReduce require that their respective sub-components have the same build version in order to form a cluster.
          [i.e. maintain the current mechanism.]
        • 5.b (i.e. stronger than 5a): Intra-service wire-protocol compatibility
          Wire protocols between internal Hadoop components are compatible across minor versions.
          Examples are NN-DN, DN-DN, NN-Balancer, etc.
          Old m.x components can talk to new m.y components (x <= y).
          Wire compatibility can break across major versions.
          Motivation: Allow rolling upgrades.
          Note this will not give us rolling upgrades automatically, but it allows us to build rolling upgrade on top of it.
          Essentially it allows a cluster to have components of different versions (e.g. an HDFS cluster with datanodes at different versions), allowing the operator to roll the older datanodes to a newer version lazily. A sketch contrasting the 5a and 5b checks appears below.
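
          As noted above, a minimal sketch contrasting the 5a and 5b checks; class and method names are invented for illustration:

            // 5a (current): components may form a cluster only on an exact
            // build-version match, which rules out mixed-version clusters.
            class VersionCheck {
                static boolean sameBuild(String serverBuild, String clientBuild) {
                    return serverBuild.equals(clientBuild); // 5a: exact match
                }

                // 5b (proposed): agreement on the major version is enough, so an
                // old m.x component can talk to a new m.y component (x <= y),
                // which is what rolling upgrades need.
                static boolean wireCompatible(int serverMajor, int clientMajor) {
                    return serverMajor == clientMajor;
                }
            }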
        Sanjay Radia added a comment -

        The jobs in the queue have a job definition that is either compatible or not.
        Clearly, being able to read those definitions is a compatibility issue.
        You are right that, apart from that, it is a feature, and we could choose not to support it until later (and we can do this anytime; it does not have to wait until 2.0).

        Merely having jobs survive and be restarted should be easy (but is not a Hadoop 1.0 requirement) if the job definitions are compatible across releases.

        Doug Cutting added a comment -

        Job persistence would be a great feature to add at some point, but I don't see why it is essential for 1.0. Hadoop 1.0 is a batch-oriented system, not a high-availability system. We'll force clients to stop accessing HDFS while it is upgraded, and I think a 1.0.1 release that forces folks to empty their job queues before they upgrade would be acceptable. Why is this critical to 1.0, and what does it have to do with compatibility?

        Sanjay Radia added a comment -

        Oops, I forgot to add the proposed job queue compatibility.

        • 4. Job queue "compatibility"
          • Dot releases
            Queued and running jobs survive.
            Completed tasks' work is preserved; running tasks are restarted.
            (Note: when rolling upgrade is available, even fewer tasks will fail on an upgrade.)
          • Minor releases
            Queued jobs survive.
            Running jobs are restarted (completed tasks' work is not preserved).
            Stronger? Completed tasks' work is preserved?
          • Major releases
            All bets are off!
            Some jobs may not even run due to API changes.
        Doug Cutting added a comment -

        Some comments:

        • Please keep proposed solutions out of the problem description. The solution should be developed in the comments. If you have a proposed solution in mind when you file the issue, please add it as the first comment rather than include it in the description.
        • Should we split this into separate issues for HDFS, Core and Mapreduce? The HDFS and Mapreduce issues can depend on the Core issue. As we're planning to split the project, we should avoid detailed issues that span the three sub-projects. Spanning issues should mostly just be a collection of per-project issues, with the details in those, no?
        • For HDFS metadata we could be clearer about when conversion is permitted. I think it's supported between minor releases (x.y.* and x.y-1) and also between x.0 and x-1.n, where n is the current x-1 minor release when x.0 is first released, or somesuch.
        • As for 3c, let's start with the weaker version for now. If we decide to make major releases more frequently than every couple of years, then we might consider the stronger version.

          People

          • Assignee: Sanjay Radia
          • Reporter: Sanjay Radia
          • Votes: 0
          • Watchers: 26
