Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.1.0-beta
    • Component/s: documentation
    • Labels:
      None

      Description

      As we get ready to call hadoop-2 stable, we need to better define 'Hadoop Compatibility'.

      http://wiki.apache.org/hadoop/Compatibility is a start; let's document the requirements clearly and completely.

      1. hadoop-9517-v5.patch
        23 kB
        Arun C Murthy
      2. hadoop-9517-v4-v5-diff.patch
        9 kB
        Arun C Murthy
      3. hadoop-9517-v4.patch
        18 kB
        Karthik Kambatla
      4. hadoop-9517-v3.patch
        18 kB
        Karthik Kambatla
      5. hadoop-9517-v2.patch
        17 kB
        Karthik Kambatla
      6. hadoop-9517-proposal-v1.patch
        16 kB
        Eli Collins
      7. hadoop-9517-proposal-v1.patch
        16 kB
        Karthik Kambatla
      8. hadoop-9517.patch
        11 kB
        Karthik Kambatla
      9. hadoop-9517.patch
        11 kB
        Karthik Kambatla
      10. hadoop-9517.patch
        11 kB
        Karthik Kambatla
      11. hadoop-9517.patch
        10 kB
        Karthik Kambatla

        Issue Links

        There are no Sub-Tasks for this issue.

          Activity

          Arun C Murthy made changes -
          Status Resolved [ 5 ] Closed [ 6 ]
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk #1460 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1460/)
          HADOOP-9517. Documented various aspects of compatibility for Apache Hadoop. Contributed by Karthik Kambatla. (Revision 1493693)

          Result = SUCCESS
          acmurthy : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1493693
          Files :

          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/site/apt/Compatibility.apt.vm
          • /hadoop/common/trunk/hadoop-project/src/site/site.xml
          Hudson added a comment -

          Integrated in Hadoop-Hdfs-trunk #1433 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1433/)
          HADOOP-9517. Documented various aspects of compatibility for Apache Hadoop. Contributed by Karthik Kambatla. (Revision 1493693)

          Result = FAILURE
          acmurthy : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1493693
          Files :

          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/site/apt/Compatibility.apt.vm
          • /hadoop/common/trunk/hadoop-project/src/site/site.xml
          Hudson added a comment -

          Integrated in Hadoop-Yarn-trunk #243 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/243/)
          HADOOP-9517. Documented various aspects of compatibility for Apache Hadoop. Contributed by Karthik Kambatla. (Revision 1493693)

          Result = SUCCESS
          acmurthy : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1493693
          Files :

          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/site/apt/Compatibility.apt.vm
          • /hadoop/common/trunk/hadoop-project/src/site/site.xml
          Hudson added a comment -

          Integrated in Hadoop-trunk-Commit #3951 (See https://builds.apache.org/job/Hadoop-trunk-Commit/3951/)
          HADOOP-9517. Documented various aspects of compatibility for Apache Hadoop. Contributed by Karthik Kambatla. (Revision 1493693)

          Result = SUCCESS
          acmurthy : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1493693
          Files :

          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
          • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/site/apt/Compatibility.apt.vm
          • /hadoop/common/trunk/hadoop-project/src/site/site.xml
          Arun C Murthy made changes -
          Status Patch Available [ 10002 ] Resolved [ 5 ]
          Fix Version/s 2.1.0-beta [ 12324030 ]
          Resolution Fixed [ 1 ]
          Arun C Murthy added a comment -

          I just committed this - this was an important one. Thanks Karthik Kambatla!

          Karthik Kambatla added a comment -

          Thanks for chipping in, Arun C Murthy. The additions make sense to me. +1 to the v5 patch.

          Thoughts on binary compatibility:

          1. Given that it is hard to guarantee source API compatibility in Java (to the extent that JDK itself doesn't), I wasn't sure if we should separately call out binary compatibility.
          2. However, I really like the way ABI is described in the latest patch. It mentions all of API, wire and semantic compatibilities, and focuses on the practical implications for end-users. Thanks for including this.
          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12588042/hadoop-9517-v5.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +0 tests included. The patch appears to be a documentation patch that doesn't require tests.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-common-project/hadoop-common.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/2654//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/2654//console

          This message is automatically generated.

          Arun C Murthy added a comment -

          I hope we can iterate on this quickly and commit fast since it's blocking 2.1.0-beta.

          I propose to commit this by Sun night PST if I don't see further comments - we can always file follow-on jiras. Thoughts?

          Arun C Murthy made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Arun C Murthy made changes -
          Attachment hadoop-9517-v5.patch [ 12588042 ]
          Arun C Murthy added a comment -

          Full patch.

          Arun C Murthy made changes -
          Attachment hadoop-9517-v4-v5-diff.patch [ 12588041 ]
          Arun C Murthy added a comment -

          Diff to Karthik's hadoop-9517-v4.patch.

          Arun C Murthy made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Arun C Murthy added a comment -

          Mostly looks great, thanks for taking this up Karthik Kambatla!

          I've got some comments, mostly minor - but I think we should add some commentary on MAPREDUCE-5184 etc. and also be more strict about metrics etc.

          I'll attach a diff to your patch which captures comments.

          Vinod Kumar Vavilapalli added a comment -

          Still to review the doc, but one more thing that I just discovered is that the published site has apidocs which only show @Public APIs. This is different from the regular javadoc that you can generate with "mvn javadoc:javadoc" or what your favourite IDE shows. Given that, it is confusing what the source of truth for InterfaceAudience is - the published javadoc on the Hadoop website, or what is present in the jars themselves. If it's the latter, it isn't clear what the policy is for classes/interfaces with no annotations.
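
          For reference, this is the annotation mechanism being discussed. A minimal sketch (the class name below is illustrative; only the two annotation types are real Hadoop classes):

          {code:java}
          import org.apache.hadoop.classification.InterfaceAudience;
          import org.apache.hadoop.classification.InterfaceStability;

          // Illustrative class: these annotations are what the published apidocs filter on
          // and what the compatibility policy keys off. A class carrying neither annotation
          // is the ambiguous case raised above.
          @InterfaceAudience.Public      // intended for use by any project or application
          @InterfaceStability.Stable     // may only change incompatibly at a major release
          public class ExampleInputFormat {
            // ...
          }
          {code}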

          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12587997/hadoop-9517-v4.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +0 tests included. The patch appears to be a documentation patch that doesn't require tests.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 eclipse:eclipse. The patch built with eclipse:eclipse.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in hadoop-common-project/hadoop-common.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/2652//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/2652//console

          This message is automatically generated.

          Karthik Kambatla made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Karthik Kambatla made changes -
          Attachment hadoop-9517-v4.patch [ 12587997 ]
          Karthik Kambatla added a comment -

          Uploading hadoop-9517-v4.patch, which removes the (Proposal) annotation from the policies. Also did a spell check.

          At this point, I believe all feedback is incorporated. It would be great if we could do another round of reviews and check it in at the earliest.

          Karthik Kambatla added a comment -

          If no one has any comments against the newly proposed policies, I'll upload a new patch on Thursday with the (Proposal) tags removed.

          Karthik Kambatla added a comment -

          I have marked this a blocker for 2.1.0-beta, per conversations on the dev list. I think the next steps are to:

          1. verify the doc captures all the items that affect compatibility
          2. verify the policies for the not-newly-proposed items are accurate
          3. check that the newly proposed policies are reasonable
          4. improve the presentation, if need be

          Will gladly incorporate any feedback.

          Karthik Kambatla made changes -
          Priority Major [ 3 ] Blocker [ 1 ]
          Karthik Kambatla made changes -
          Attachment hadoop-9517-v3.patch [ 12586417 ]
          Karthik Kambatla added a comment -

          Patch hadoop-9517-v3.patch includes the text from HADOOP-9619 for proto files.

          Sanjay Radia added a comment -

          ... Optional fields can be added any time .. Fields can be renamed any time ...

          This is what stable means. Hence I suggest that we add a comment to the .proto files to say that the .protos are private-stable, and we could add a comment on the kinds of changes allowed. Will file a jira for this. Given that there are no annotations for .proto files, a comment is the best that can be done.

          Don't you think the proto files for client interfaces should be public? I was chatting with Todd about this, and it seems to us they should.

          I would still mark it as private till we make the RPC and data transfer protocol itself public (i.e. the protos being public is useless without the RPC proto being public).
          Todd and I occasionally disagree.

          Karthik Kambatla made changes -
          Attachment hadoop-9517-v2.patch [ 12586214 ]
          Karthik Kambatla added a comment -

          Updated the patch to:

          1. differentiate between the kinds of file formats
          2. list the kinds of HDFS metadata upgrades
          3. include the policy proposal for proto files

          The policies being proposed (from comments in the JIRA) are explicitly labelled (Proposal).

          At this point, it would be great to make sure that:

          1. the document captures all the items that affect compatibility
          2. the policies for the not-newly-proposed items are accurate
          3. the newly proposed policies are reasonable
          4. the presentation is improved, if need be

          Once done with the above, it might be a good idea to get this in and address items with no policies in subsequent JIRAs or sub-tasks to make sure we discuss them in isolation and detail. Thoughts?

          Karthik Kambatla added a comment -

          Classes that subclass a Hadoop class to provide a plugin point MAY need recompiling on each major version, possibly with the handling of changes to methods.

          If the Hadoop class being extended is Public and the stability is defined by the annotation, is that not sufficient indication to the user that it might need to be changed as that interface/class changes? For example, we recently added a SchedulingPolicy to FairScheduler annotated Public-Evolving: the policies written for the current version need to be updated as and when the SchedulingPolicy class changes. Once it becomes stable, we follow the standard deprecation rules for Public-Stable APIs, which protect the policies. No?

          I think it is important to detail how the API compatibility rules impact user-level code. Maybe I am missing something here. Otherwise, we might not need specific policies for them?

          Steve Loughran added a comment -

          As raised in hdfs-dev, we should also define compatibility for people who develop plugins for things like filesystems, schedulers, sorting, block placement, etc.

          We should make clear that this isn't considered user-level code, and implementors should have no expectation of forward- or backward- compatibility. Indeed, the use of base classes rather than interfaces for many of these plugin points is to enable new methods to be added with ease.

          How about something like (based on my recent FileSystem work):

          • Classes that subclass a Hadoop class to provide a plugin point MAY need recompiling on each major version, possibly along with handling changes to methods.
          • There is no guarantee that a plugin built for an earlier major version of Hadoop can be directly used in a later major version - implementors of plugins MUST expect to have to release new versions.
          • There is no guarantee that a plugin built on a later major version of Hadoop can be directly used in an earlier major version. Implementors of plugins MUST expect to have to maintain different versions for each major release.
          • We strive to maintain semantic compatibility of existing parameters and methods across major versions. Plugin points MAY tighten their specifications, and MAY mark methods as deprecated. Extra tests MAY be added, for both new features and tightened specifications.
          • We strive to maintain binary and semantic compatibility between minor releases. Hopefully plugin implementations are compatible across minor releases. However, plugin points MAY tighten their specifications, and even add extra methods and/or enumerated options to existing methods. Extra tests MAY be added, for both new features and tightened specifications. Implementors SHOULD test on different releases, especially alpha and beta releases, to catch compatibility issues early.
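
          To make the plugin-point bullets above concrete, here is a hypothetical sketch (the class and method names are invented, not taken from any patch) of the kind of base-class evolution they anticipate:

          {code:java}
          // Hypothetical plugin point: an abstract base class rather than an interface,
          // so a method with a default implementation can be added in a later release
          // without breaking existing subclasses at the binary level.
          public abstract class ExamplePlacementPolicy {

            /** Present since the plugin point was introduced. */
            public abstract String chooseTarget(String srcPath);

            /**
             * Added later, in a minor release. Plugins compiled against the older class
             * still load and run, but implementors may need to recompile (and possibly
             * override this method) to pick up the tightened specification.
             */
            public boolean supportsRelaxedPlacement() {
              return false;
            }
          }
          {code}
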
          Karthik Kambatla added a comment -

          It might make sense to make certain (non-internal) proto files public, and leave the internal (RM-NM) protos private.

          Also, I think the policy on changes to proto files should be:

          1. Optional fields can be added any time
          2. Fields can be renamed any time
          3. Required fields can't be added within a major release
          4. Field order and type can't be modified within a major release
          Alejandro Abdelnur added a comment -

          Sanjay Radia, on the proto files being public or not. Don't you think the proto files for client interfaces should be public? I was chatting with Todd about this, and it seems to us they should.

          On a different note, we should document that we can change the names of fields in proto files as long as we keep the ID/type/required|optional.

          Sanjay Radia added a comment -

          I vote to strengthen the compatibility requirements for user data file formats. ... Rather we'd like to permit both writers or readers of data files to be upgraded independently. ...

          Agreed.
          What do you mean by user data file formats? Do you mean data files that are processed by libraries in user land (as opposed to a server)? Would har files, sequence files and RC files be such "user data file formats"? To ground this better, Doug, could you please give some examples of the kinds of updates one would want to do, and which of those would be allowed and which would not?

          Sanjay Radia added a comment -

          Shouldn't the proto files themselves be classified as public and stable?

          If you mean the wire protocol proto files, then no. They are not public as we have not made the protocol itself public; we may do that at some point and at that time we would make them public. At this stage we have merely promised to maintain wire compatibility going forward. Right now the proto files should be marked as private-stable.

          Sanjay Radia added a comment -

          wrt HDFS upgrade the current prose is not clear whether we permit HDFS metadata upgrades w/in minor release or not. ..... and explicitly say that metadata upgrades may only be required for major version upgrades?

          Agreed that it is not clear. Given that the conversion is automatic, do we want to allow metadata AND data changes in minor releases? Or are we trying to say that rolling upgrades are always possible in minor releases and hence we don't want to allow metadata and data changes? Note that although we have not figured out how to do rolling upgrades when there are metadata changes, it may be possible to do so. BTW, -upgrade is often used even when there are no metadata or data changes, as a safety measure.

          Let's clearly distinguish between HDFS upgrades (ie just upgrading the HDFS bits) from an HDFS metadata upgrade,

          Given that this is in the data section I thought it was obvious. Let's modify the "Data" section to start by saying that we are not talking about the executable bits.

          Also, what does "automatic conversion" mean, that the HDFS metadata upgrade process can automatically convert the old version to the new? As opposed to requiring a user manually perform multiple such upgrades?

          Doesn't the word "automatic" clarify this? Again suggest some text improvements and we can put it in.

          Karthik Kambatla made changes -
          Link This issue is blocked by HADOOP-7391 [ HADOOP-7391 ]
          Eli Collins made changes -
          Summary Define Hadoop Compatibility Document Hadoop Compatibility
          Eli Collins made changes -
          Attachment hadoop-9517-proposal-v1.patch [ 12585105 ]
          Eli Collins added a comment -
          • Sanjay Radia, Karthik Kambatla - wrt HDFS upgrade, the current prose is not clear whether we permit HDFS metadata upgrades w/in a minor release or not. Let's clearly distinguish HDFS upgrades (i.e. just upgrading the HDFS bits) from an HDFS metadata upgrade, and explicitly say that metadata upgrades may only be required for major version upgrades?
          • Also, what does "automatic conversion" mean - that the HDFS metadata upgrade process can automatically convert the old version to the new, as opposed to requiring a user to manually perform multiple such upgrades?
          • It's worth calling out Storage#LAST_UPGRADABLE_HADOOP_VERSION (which is currently "Hadoop-0.18") explicitly (or having it in the generated docs) so people know how to check what version is supported.

          To be clear, we do have policy for some of the things where we've stated that there's currently no policy; it's just that we can define those in separate jiras for specific pieces rather than settling all the policy here (e.g. we do have a compatibility policy around data formats).

          Updated patch attached with some minor modifications:

          • Under semantic compatibility included javadocs since that's what downstream projects and end users expect us to honor (they're going to read the javadocs for behavior, not the test code)
          • Clarify that admin protocols may be less strict because they only affect admins vs end users
          • Updated java classpath section to not exclude MR jobs as user applications
          • Fixed spelling mistakes and some indenting
          Alejandro Abdelnur added a comment -

          Shouldn't the proto files themselves be classified as public and stable?

          Karthik Kambatla made changes -
          Attachment hadoop-9517-proposal-v1.patch [ 12584398 ]
          Karthik Kambatla added a comment -

          Uploading a patch addressing comments from Sanjay, Steve and Doug:

          1. Adds additional compatibility sections
          2. Adds newly proposed policies, clearly annotated as (Proposal)
          Sanjay Radia added a comment -

          >>> HDFS metadata and data can change across minor or major releases ....
          ... Do you think there is merit to adding a policy on such changes being compatible within a major release - by compatible, we mean the compatibility required to be able to run different versions (minor - e.g. 2.1 and 2.2) within the same major release in the same cluster.

          My inclination is no. Updating metadata automatically during release upgrade (minor or major) is fairly easy.

          Steve Loughran added a comment -

          And: {-test artifacts}:

          Hadoop and its sub-projects often build and publish test JAR files with the suffix -test.

          These are test artifacts written for testing the Hadoop components themselves; the publishing to a repository is merely a way of allowing later hadoop modules to pick up the test libraries compatible with the same version of the Hadoop public artifacts.

          While any downstream project is free to use these artifacts in their own test routines, be aware that these artifacts are considered private and unstable. Their contents may change between any release, major or minor.

          If a component in the test libraries proves to be widely reused, it may be extracted and published as its own module - this is exactly what has been done for the hadoop miniclusters. We can only do this if we know which classes in the -test JARs are being used. If you find that you need to use some of our internal-use-only classes in these libraries, please file a JIRA issue requesting that the classes be stabilised and moved into a production library.

          Steve Loughran added a comment -

          Karthik, -source tree compatibility

          Yes, when I say patch I mean directory layout - we already have differences there between 1.x and 2.x; new changes are inevitable in the future. Source code applicability can change and we must not make any promises there.

          A policy here could be
          minor versions:

          1. No planned changes in source tree layout, though it may happen
          2. Source files may be added, deleted and moved
          3. Source files may change so that patches no longer cleanly apply.

          major versions:

          1. all of the above
          2. a complete refactoring of the source tree, build and test tooling may happen.
          Steve Loughran added a comment -

          Doug, good, though to guarantee it we will need to add new unit tests (pull in old release binaries, verify they can still parse newly created artifacts).

          I fear for the native binaries - we have to depend on libsnappy &c to generate compressed content that older versions of their libs can handle. It's also much harder to pull in the old libs for regression testing, the way we can with JAR files on the mvn repo.

          Doug Cutting added a comment -

          I vote to strengthen the compatibility requirements for user data file formats. If we permit any user file-format changes that are not forward compatible at all then they must be rare and very well marked as incompatibilities. It's generally better to create a new format that programs must opt in to. An unaltered program, when run against a new version, should ideally continue to generate data that can still be read by older versions and other potential implementations of the format. Otherwise we break workflows unless every element of the flow is updated in lockstep. Rather we'd like to permit both writers or readers of data files to be upgraded independently. Many folks have multiple clusters that are not updated simultaneously and might move data files between them.

          Karthik Kambatla added a comment -

          Thanks Steve. Will include all those items in the next version of the patch:

          1. Source code compatibility - when you say the patch will apply (patch -p0), I suppose you are talking about the directory structure of the code and not that the patch will apply cleanly. From a contributor's perspective, this limits any directory changes, including any code refactoring - e.g. moving an inner class to a separate class. In that case, do we guarantee compatibility within a minor release or a major release?
          2. Configuration: We already have the section - I'll add a note on the default values.
          3. Will add user-level file formats, web UI
          4. Will capture OS, JVM and Hardware under a new section Requirements.
          Steve Loughran added a comment -

          Another one: OS/JVM compatibility. What guarantees/promises are made for the OS & JVM?

          1. I could see OS family statements: Linux, Windows, vendor-backed versions such as IBM Power, but not specific versions, especially as the OS vendor trails off security patches. Testing and bug reports welcome.
          2. JVM? What could be said here? That the minimum JVM is specified by the code, with no intent to force updates on a point release unless/until that JVM version becomes unsupported. Major releases (or different OS platform/hardware releases) may mandate later versions (e.g. Windows and ARM prefer open-jdk7). Testing and bug reports welcome.
          3. Hardware? "No plans to make Hadoop specific to any CPU family, even though there may be some CPU family-specific optimisations at the native code level" or "this is driven by (JVM, OS) support; testing always welcome".
          Joep Rottinghuis added a comment -

          Would snapshots (once they are part of an official release) be part of the compatibility discussion?

          How about tools such as the offline image viewer? Should it be able to read the format of older images?

          Joep Rottinghuis added a comment -

          "...but such changes are transparent to user application."
          Is that only from the perspective of a user of a cluster before/after an upgrade, or also between different clusters?

          For example, how about being able to copy data from one cluster to another (with different versions)?
          For example, with the CRC changes, distcp must be used with -skipcrccheck.
          Does compatibility mean that tools such as distcp can map from one format to the other?

          Steve Loughran added a comment -

          Also

          User-level file formats.

          I'm thinking of .har here, compression formats etc. The statement here MUST be, "we will continue to read existing formats, though written data may be in a form that is incompatible with older Hadoop versions". There need to be tests for that backwards compatibility too - some files from 1.x in SVN whose contents are parsed and validated.

          Web UI features

          No guarantees. If you try to screen-scrape the web pages, expect to have to change with every major release, and potentially with every minor one. The only way to be sure of stability is to help define and implement specific machine-parseable web pages/REST APIs. If you use screen-shots as part of your documentation, expect to have to update them with every release.

          Steve Loughran added a comment -

          There are two extra forms of compatibility we should define and document:

          Source code compatibility

          Whether or not a patch will apply across versions. I'm tempted to make no guarantees across major versions (what we have today w.r.t. 1.x and 2.x) and a "no deliberate attempt to break things" policy, but otherwise: you are on your own if you fork the code and try to cherry-pick, or try to have private patches to apply on top of the source tree. "The best way to ensure your patches stay in sync with the code is to get them into the Apache source tree".

          Configuration compatibility.

          Whether configuration parameters will be consistent over versions; whether, if the names change, there will be a transition from one version to the other. This happens today, with warnings on deprecation - which should be called out. "When warned, fix". Similarly, whether defaults change between major/minor releases. I'd say "sometimes", as it is becoming time to increase the default block size, the default # of reducers and a few more.
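
          As a concrete illustration of the deprecation-with-warnings transition described above, a small sketch using Configuration.addDeprecation (the property names are made up; only the Configuration API is real):

          {code:java}
          import org.apache.hadoop.conf.Configuration;

          public class ConfigDeprecationExample {
            public static void main(String[] args) {
              // Map a made-up old property name to its replacement; reads and writes of
              // the old key are then routed to the new key, and a deprecation warning is
              // logged - the "when warned, fix" signal to users.
              Configuration.addDeprecation("example.old.block.size", "example.new.block.size");

              Configuration conf = new Configuration();
              conf.set("example.old.block.size", "134217728");

              // Resolves through the deprecation mapping to the value set above.
              System.out.println(conf.get("example.new.block.size"));
            }
          }
          {code}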

          Karthik Kambatla added a comment -

          Thanks Sanjay. This is brilliant. Will update the patch to refer to HADOOP-7391. Also, it would be great to include that in the documentation (the .apt.vm format) and check it in. Let me know if you want me to copy the same and post an updated patch to HADOOP-7391. Also, I'll update the patch to have a section "Data Compatibility" with separate bullets for metadata and data formats in place of the current "Data format". Will include the weaker version for the overall policy.

          Q. What is the audience for the document generated by this jira - the user of Hadoop, the developer or both?

          The audience for the documentation is both users and developers of Hadoop:

          1. For the users to know what to expect when they upgrade
          2. For the developers to know what changes are allowed on a particular branch/release

          Do you think they need to be separate?

          HDFS metadata and data can change across minor or major releases , but such changes are transparent to user application.

          This is very reassuring to users. Do you think there is merit to adding a policy on such changes being compatible within a major release - by compatible, we mean the compatibility required to be able to run different versions (minor - e.g. 2.1 and 2.2) within the same major release in the same cluster?

          Sanjay Radia added a comment -

          During the discussions on compatibility I had proposed the following for data compatibility (I found these in my old notes; they should also be in some email thread).

          • Data Compatibility
            • HDFS metadata and data can change across minor or major releases, but such
              changes are transparent to user applications. A release upgrade must
              automatically convert the metadata and data as needed. Further, a release
              upgrade must allow a cluster to roll back to the older version and its older
              disk format. (Rollback needs to restore the original data, but not any updated data.)
              Motivation: users expect file systems to preserve data transparently across
              releases.
            • Stronger version of the above:
              HDFS metadata and data can change across minor or major releases, but such
              changes are transparent to user applications. A release upgrade must
              automatically convert the metadata and data as needed. During minor releases,
              disk format changes have to be backward and forward compatible; i.e. an older
              version of Hadoop can be started on a newer version of the disk format. Hence
              a version rollback is simple: just restart the older version of Hadoop.
              Major releases allow more significant changes to the disk format and have to be
              only backward compatible; however, a major release upgrade must allow a cluster to
              roll back to the older version and its older disk format.
              With this, minor releases are very easy for an admin to roll back.
              Note this will restrict the kinds of changes that can be made in minor releases.
            • Weaker: Limited Automatic Conversion:
              HDFS metadata and data can change across minor or major releases, but such
              changes are transparent to user applications. A release upgrade must
              automatically convert the metadata and data as needed, but automatic conversion
              is only supported across a small number of releases. If a user wants to jump
              across multiple releases, he may be forced to go through a few intermediate
              releases to get to the final desired release. Further, a release upgrade must
              allow a cluster to roll back to the older version and its older disk format.
              (Rollback needs to restore the original data, but not any updated data.)

          We currently support the weaker automatic conversion in HDFS.
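
          A rough sketch of the admin workflow this weaker policy implies, using the standard Hadoop 2.x HDFS upgrade/rollback commands (the sequencing shown here is illustrative, not prescriptive):

            # Upgrade: restart HDFS with the new software version, converting the on-disk format
            stop-dfs.sh                      # stop the old version
            start-dfs.sh -upgrade            # new version converts metadata, keeping a 'previous' copy for rollback

            # Once satisfied with the new release, make the upgrade permanent
            hdfs dfsadmin -finalizeUpgrade   # discards the rollback copy

            # Rollback instead (only possible before finalizing): restart the older software version
            stop-dfs.sh
            start-dfs.sh -rollback           # restores the pre-upgrade metadata and disk format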

          Sanjay Radia added a comment -

          Eli/Karthik
          One comment and a question (more over the next few days).

          • For the stability/audience, please refer to the documentation that I captured from the HADOOP-5073 discussions and comments in HADOOP-7391 (I can convert this to a different format if needed - I had captured it as HTML in 2011 and updated it slightly a few minutes ago). Note that your document refers to HADOOP-5073, but it is hard for a reader to scan through all its comments; hence HADOOP-7391.
          • Q. What is the audience for the document generated by this jira - the user of Hadoop, the developer or both?
          Sanjay Radia made changes -
          Link This issue is related to HADOOP-7391 [ HADOOP-7391 ]
          Eli Collins added a comment -

          Nicholas, see my point about Karthik and me working on this together. Working on something together means that multiple perspectives were incorporated: Karthik's (mostly focused on YARN/MR) and mine (mostly focused on Common/HDFS).

          The point of this JIRA is not to pick one person to define compatibility; most of what is covered in this document consists of rules and guidelines that have existed in the project for years - this is just writing them up.

          Do you have an objection to anything in particular in the document?

          Karthik Kambatla added a comment -

          Hey Tsz Wo Nicholas Sze, thanks for chipping in. Your valuable feedback and insight will surely help iron out the kinks in the proposal.

          I am eager to hear more feedback on the patch and would gladly incorporate additions/changes. Meanwhile, Eli Collins and I are working on a strawman policy for the items we currently don't have policies for ("Currently, there is NO policy") and will post it as soon as we get it into reasonable shape.

          Tsz Wo Nicholas Sze added a comment -

          > ... Karthik, is an active contributor to MR and YARN (why you haven't bumped into him yet), ...

          I do realize that Karthik has worked on MR and YARN but not much on HDFS. That's why I think he may not be the best person to define HDFS compatibility, which is very important to HDFS. Would you agree?

          Eli Collins added a comment - - edited

          Hey Nicholas, per the thread on common-dev@, this is something Karthik and I worked on together. Karthik is an active contributor to MR and YARN (which is why you haven't bumped into him yet) and, in my opinion, has sufficient experience to contribute on important issues. Newcomers are capable of working on important and impactful things (just consider the people you've been working with on HDFS snapshots).

          We would love your constructive feedback on the draft posted if you have any.

          Tsz Wo Nicholas Sze added a comment -

          Hi Karthik,

          Generally, Hadoop welcomes newcomers. We like to help them understand the code and contribute to the project. New contributors are better off starting with some easy issues or issues with less impact.

          I think this JIRA, "Define Hadoop Compatibility", is a very important issue, and it is more suitable for committers who have a deep understanding of Hadoop to work on it. Would you agree?

          Please forgive me if I have overlooked anything.

          Karthik Kambatla made changes -
          Attachment hadoop-9517.patch [ 12582340 ]
          Karthik Kambatla added a comment -

          Removed HttpFS as a REST API.

          Tsz Wo Nicholas Sze added a comment -

          HttpFS implements WebHDFS. There is no HttpFS REST API. Do you agree? If yes, please remove HttpFS from the REST API section. Also, the HttpFS doc is not a REST API specification.

          Karthik Kambatla added a comment -

          Updated the patch to capture the HDFS policy on data migration, as suggested by Steve.

          Karthik Kambatla made changes -
          Attachment hadoop-9517.patch [ 12582336 ]
          Karthik Kambatla added a comment -

          Good point, Steve. Let me add that and update the patch here.

          Steve Loughran added a comment -

          This looks pretty good - I like where you list out "no policy here", which is both a warning and a sign of where work may be needed at some point in the future.

          I'd flag the "Currently, there is NO policy on preserving data formats across releases" statement: while that may be the case, HDFS does have a policy of handling migration from at least one major version behind. The format may change, but the data will be preserved. Can you add that, to avoid scaring people?

          Karthik Kambatla made changes -
          Link This issue incorporates HADOOP-9519 [ HADOOP-9519 ]
          Karthik Kambatla made changes -
          Link This issue incorporates HADOOP-9518 [ HADOOP-9518 ]
          Karthik Kambatla made changes -
          Link This issue relates to HADOOP-9542 [ HADOOP-9542 ]
          Karthik Kambatla made changes -
          Link This issue relates to HADOOP-9541 [ HADOOP-9541 ]
          Karthik Kambatla made changes -
          Attachment hadoop-9517.patch [ 12581553 ]
          Karthik Kambatla added a comment -

          Thanks Steve and Nicholas. Uploading a new patch to incorporate your comments:

          1. Added a section on semantic compatibility reflecting Steve's comments on the mailing list and this JIRA.
          2. Updated the REST APIs section: (1) separated out WebHDFS and HttpFS, (2) added links to all REST APIs as on the site, (3) removed JMX and conf from the list as they don't have a specification.

          I guess we should create JIRAs to define specifications for the conf and JMX servlets.

          Steve Loughran added a comment -

          The closest things we do have to strict specifications are the tests; perhaps those could be used to declare stability:

          "stable with respect to the current test suite"

          That makes it clear that changing the test suite is a way of changing the specification: more tests = tighter spec. But for all those bits that aren't tightly specified, all we can say is "works-for-our-tests".
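
          A toy illustration of "stable with respect to the current test suite": a hypothetical contract-style test (JUnit 4, run against the local FileSystem) in which tightening the assertions tightens the de facto specification:

            import static org.junit.Assert.*;

            import org.apache.hadoop.conf.Configuration;
            import org.apache.hadoop.fs.FileSystem;
            import org.apache.hadoop.fs.Path;
            import org.junit.Test;

            public class TestMkdirsContract {                      // hypothetical test, for illustration only
              @Test
              public void mkdirsIsVisibleToExists() throws Exception {
                FileSystem fs = FileSystem.getLocal(new Configuration());
                Path dir = new Path("target/contract-test/dir");   // illustrative scratch path
                assertTrue("mkdirs should report success", fs.mkdirs(dir));
                assertTrue("a created directory must be visible", fs.exists(dir));
                assertTrue("the created path must be a directory",
                    fs.getFileStatus(dir).isDirectory());
              }
            }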

          Tsz Wo Nicholas Sze added a comment -
          +  * WebHDFS (as supported by HttpFs) - Stable
          +
          +  * WebHDFS (as supported by HDFS) - Stable
          

          Since the section is talking about REST APIs, not implementations, the words "as supported by ..." should be omitted.

          Also, all REST APIs should link to the corresponding REST API specification. If some specifications are missing, I think it is better to remove them from the section for the moment. Without a specification, it is hard to talk about compatibility.
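
          To make concrete that the compatibility surface here is the REST API itself rather than any particular implementation, a minimal WebHDFS call is shown below (host and paths are placeholders; the operation names come from the WebHDFS REST API documentation, and the same calls work against an HttpFS gateway since HttpFS implements the WebHDFS API):

            # File status for a path, served by the NameNode's WebHDFS endpoint (default HTTP port 50070)
            curl -i "http://<NAMENODE_HOST>:50070/webhdfs/v1/user/foo?op=GETFILESTATUS"

            # Read a file; WebHDFS redirects the client to a DataNode for the actual data
            curl -i -L "http://<NAMENODE_HOST>:50070/webhdfs/v1/user/foo/part-00000?op=OPEN"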

          Karthik Kambatla added a comment -

          Hi Steve

          Sorry, I should have explicitly noted "Semantic Compatibility" - I am waiting for the wiki to come back up so I can add those details. I'll definitely add "Semantic Compatibility" as soon as the wiki comes back.

          Steve Loughran added a comment -

          I think the doc skipped the notion of "semantic compatibility". Even if we make guarantees that an interface doesn't change, what programs want is both a stable interface and stable semantics, which we can't guarantee.

          The doc implies that anything marked as @Stable is "required" to be compatible across versions, meaning the behaviour of an API call can never change. That implies you can't fix any behaviour that is considered a defect, and that all unintentional side effects could also be argued to be features.
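
          For context, the stability markers under discussion are the annotations in org.apache.hadoop.classification; a minimal sketch of how an API advertises its audience and stability (the interface itself is hypothetical, for illustration only):

            import org.apache.hadoop.classification.InterfaceAudience;
            import org.apache.hadoop.classification.InterfaceStability;

            /**
             * A Public/Stable API: the signature is expected to remain compatible
             * across releases, but, as noted above, the annotation by itself says
             * nothing about the stability of the method's semantics.
             */
            @InterfaceAudience.Public
            @InterfaceStability.Stable
            public interface ExampleRecordReader {   // hypothetical interface
              boolean nextKeyValue();
            }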

          Karthik Kambatla made changes -
          Attachment hadoop-9517.patch [ 12581444 ]
          Karthik Kambatla added a comment -

          Here is a preliminary patch which basically has:

          1. the content from the wiki explaining the multiple categories that affect compatibility
          2. my understanding of the current policies for each of them
          3. NO POLICY called out for cases where I couldn't find one

          When you get a chance, please verify whether my understanding of the current policies is accurate.

          As the next step, I can propose policies for items without one.

          Karthik Kambatla added a comment -

          Creating a sub-task for every item seems like overkill. I'll upload the entire doc, and for items that we think need deliberation, we can discuss here, in a sub-task, or in a DISCUSS thread. Will upload the patch with all items/policies as soon as I can.

          Karthik Kambatla added a comment -

          Sure Vinod - do you suggest I create subtasks for each of the items?

          Vinod Kumar Vavilapalli added a comment -

          Can we do this as patches instead of a wiki page? I think it is easier to address comments that way.

          Karthik Kambatla made changes -
          Field Original Value New Value
          Assignee Karthik Kambatla [ kkambatl ]
          Karthik Kambatla added a comment -

          Thanks for creating this JIRA. I spent some time consolidating the community's discussions in JIRAs and mailing lists. Let me update the wiki with the policies as I understand them, explicitly annotated (italicized) as my understanding. We can move the parts we have consensus on to regular font.

          Arun C Murthy created issue -

            People

            • Assignee: Karthik Kambatla
            • Reporter: Arun C Murthy
            • Votes: 0
            • Watchers: 25
