Issue Details (XML | Word | Printable)

Key: HADOOP-5071
Type: Sub-task Sub-task
Status: Open Open
Priority: Major Major
Assignee: Sanjay Radia
Reporter: Sanjay Radia
Votes: 0
Watchers: 21
Operations

If you were logged in you would be able to see more operations.
Hadoop Common
HADOOP-5064

Hadoop 1.0 Compatibility Requirements

Created: 16/Jan/09 06:34 PM   Updated: 05/Mar/09 11:53 AM
Return to search
Component/s: None
Affects Version/s: None
Fix Version/s: None

Time Tracking:
Not Specified


 Description  « Hide
The purpose of this Jira is to decide on Hadoop 1.0 Compatibility requirements
A proposal is described below that was discussed on email alias core-dev@hadoop.apache.org

Release terminology used below:

Standard release numbering: major, minor, dot releases

  • Only bug fixes in dot releases: m.x.y
    • no changes to API, disk format, protocols or config etc. in a dot release
  • new features in major (m.0) and minor (m.x.0) releases

Hadoop Compatibility Proposal

  • 1 API Compatibility
    No need for client recompilation when upgrading across minor releases (ie. from m.x to m.y, where x <= y)
    Classes or methods deprecated in m.x can be removed in (m+1).0
    Note that this is stronger than what we have been doing in Hadoop 0.x releases.
    This is fairly standard compatibility rules for major and minor releases.
  • 2 Data Compatibility
    • Motivation: Users expect File systems preserve data transparently across releases.
    • 2.a HDFS metadata and data can change across minor or major releases , but such changes are transparent to user application. That is release upgrade must automatically convert the metadata and data as needed. Further, a release upgrade must allow a cluster to roll back to the older version and its older disk format. (rollback needs to restore the orignal data not any updated data).
    • 2.a-WeakerAutomaticConversion:
      Automatic conversion is support across a small number of releases. If a user wants to jump across multiple releases he may be forced to go through a few intermediate release to get to the final desired release.
  • 3 Wire Protocol Compatibility
    We offer no wire compatibility in our 0.x release today.
    • Motivation: The motivation isn't to make the hadoop protocols public. Applications will not call the protocol directly but through a library (in our case FileSystem class and its implementations). Instead the motivation is that customers run multiple clusters and have apps that access data across clusters. Customers cannot be expected to update all clusters simultaneously.
    • 3.a Old m.x clients can connect to new m.y servers, where x <= y but the old clients might get reduced functionality or performance. m.x clients might not be able to connect to (m+1).z servers
    • 3.b. New m.y clients must be able to connect to old m.x server, where x< y but only for old m.x functionality.
      Comment: Generally old API methods continue to use old rpc methods. However, it is legal to have new implementations of old API methods call new
      rpcs methods, as long as the library transparently handles the fallback case for old servers.
    • 3.c. At any major release transition [ ie from a release m.x to a release (m+1).0], a user should be able to read data from the cluster running the old version.
      • Motivation: data copying across clusters is a common operation for many customers. For example this is routinely at done at Yahoo; another use case is HADOOP-4058. Today, http (or hftp) provides a guaranteed compatible way of copying data across versions. Clearly one cannot force a customer to simultaneously update all its Hadoop clusters on to a new major release. We can satisfy this requirement via the http/hftp mechanism or some other mechanism.
    • 3.c-Stronger
      Shall we add a stronger requirement for 1. 0 : wire compatibility across major versions? That is not just for reading but for all operations. This can be supported by class loading or other games.
      Note we can wait to provide this when 2. 0 happens. If Hadoop provided this guarantee then it would allow customers to partition their data across clusters without risking apps breaking across major releases due to wire incompatibility issues.
      • Motivation: Data copying is a compromise. Customers really want to run apps across clusters running different versions.


 All   Comments   Work Log   Change History   Subversion Commits      Sort Order: Ascending order - Click to sort in descending order
Doug Cutting added a comment - 16/Jan/09 07:01 PM
Some comments:
  • Please keep proposed solutions out of the problem description. The solution should be developed in the comments. If you have a proposed solution in mind when you file the issue, please add it as the first comment rather than include it in the description.
  • Should we split this into separate issues for HDFS, Core and Mapreduce? The HDFS and Mapreduce issues can depend on the Core issue. As we're planning to split the project, we should avoid detailed issues that span the three sub-projects. Spanning issues should mostly just be a collection of per-project issues, with the details in those, no?
  • For HDFS metadata we could be clearer about when this is permitted. I think it's supported between minor releases (x.y.* and x.y-1 ) and also between x.0 and x-1.n, where n is the current x-1 minor release when x.0 is first released, or somesuch.
  • as for 3c, let's start with the weaker version for now. If we decide to make major releases more frequently than every couple of years then we might then consider the stronger version.

Sanjay Radia added a comment - 16/Jan/09 09:34 PM
oops I forgot to add the proposed job queue compatibility.
  • 4. Job queue "compatibility"
    • Dot releases
      Queued and running jobs survive
      Completed tasks' work is preserved, running tasks are restarted
      (Note when rolling upgrades is available even fewer tasks will fail on a upgrade)
    • Minor releases
      Queued jobs survive
      Running jobs are restarted (Completed tasks' work is not preserved)
      Stronger? Completed tasks' work is preserved?
    • Major releases
      All bets are off!
      Some jobs may not even run due to Api changes

Doug Cutting added a comment - 16/Jan/09 10:39 PM
Job persistence would be a great feature to add at some point, but I don't see why it is essential for 1.0. Hadoop 1.0 is a batch-oriented system, not a high-availability system. We'll force clients to stop accessing HDFS while it is upgraded, and I think a 1.0.1 release that forces folks to empty their job queues before they upgrade would be acceptable. Why is this critical to 1.0, and what does it have to do with compatibility?

Sanjay Radia added a comment - 17/Jan/09 02:04 AM
The jobs in the queue have a job definition that is either compatible or not compatible.
Clearly being able to read those definitions or not is a compatibility issue.
You are right, apart from that, it is feature and we could choose to not support this till later (and we can do this anytime, it does not have to wait till 2.0).

Just merely having jobs survive and restarted should be easy (but not a H 1.0 requirement) if the job definitions are compatible across releases.


Sanjay Radia added a comment - 04/Mar/09 01:12 AM
In the email discussions, we had decided not to support rolling upgrades for 1.0.
Below I am documenting the discussion.
We had decided to go with 5a. for Hadoop 1.0 and do 5b (stronger than 5a) after 1.0.
  • 5. Intra Hadoop Service Compatibility
    The HDFS Service has multiple components (NN, DN, Balancer) that communicate
    amongst themselves. Similarly the MapReduce service has
    components (JR and TT) that communicate amongst themselves.
    Currently we require that the all the components of a service have the same build version and hence talk the same wire protocols.
    This build-version checking prevents rolling upgrades.
  • 5.a HDFS and MapReduce require that their respective sub-components have the same build version in order to form a cluster.
    [ie. Maintain the current mechanism.]
  • 5.b- (ie stronger than 5a). Intra-service wire-protocol compatibility
    Wire protocols between internal Hadoop components are compatible across minor versions.
    Examples are NN-DN, DN-DN and NN-Balancer, etc.
    Old m.x components can talk to new m.y components (x<=y)
    Wire compatibility can break across major versions.
    Motivation: Allow rolling upgrades.
    Note this will not give us rolling upgrades automatically but allow us to build rolling upgrade.
    Essentially it allows a cluster to have components of different versions (e.g. an HDFS cluster with datanodes at different versions
    allowing the operator to roll the older data nodes to a newer version lazily.

dhruba borthakur added a comment - 04/Mar/09 05:55 AM
The current hadoop code already implements 5a. We typically deploy a new hadoop jar file (having a newedr build version) and just restart the map-reduce cluster without restarting dfs. This works!

+1 for 5a.


Steve Loughran added a comment - 05/Mar/09 11:53 AM
5b,Intra-service wire-protocol compatibility

That's really hard to achieve. Even if you keep the wire format the same, its keeping the explicit and implicit semantics consistent that's tricky. I've been there too many times with web service protocols whose adoption of XML was meant to handle versioning well, but turned out not to.

I'd be worried about running multiple datanode versions in a single cluster, for example.

What I'd be happier with would be for a long-haul API build on something like JAX-RS that had enough of a stability guarantee that client apps be they IDE plugins, command line tools, firefox addons could talk the far end through whatever proxies and firewalls got in the way, with the Java-level API compatibility rules helping the uploaded code to work.

As we don't yet have a JAX-RS based long-haul API yet, we could design in some of this stuff from the outset.