Hadoop Common / HADOOP-7035

Document incompatible API changes between releases

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Blocker
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: 0.22.1
    • Component/s: documentation
    • Labels: None

        Activity

        Konstantin Shvachko added a comment -

        Got it.

        > The script already uses the annotations to restrict the changes to the public API.

        Why does it show the protected method createDataFileReader()?

        > Yes, including it in the release notes would be a good start.

        Will you do it for 0.22?

        Tom White added a comment -

        > 1. Is it possible that the tool needs to be applied the other way around, that is having 0.22 as the base and Tested version being 0.21?

        The method signature changed, which is reported as the method being removed. In 0.21 it was

        protected SequenceFile.Reader createDataFileReader(FileSystem fs,
            Path dataFile, Configuration conf)
        

        And in 0.22 it is

        protected SequenceFile.Reader 
            createDataFileReader(Path dataFile, Configuration conf,
                                 SequenceFile.Reader.Option... options)
        
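        To make the source incompatibility concrete, here is a minimal sketch of hypothetical downstream code (not part of Hadoop) compiled against the 0.21 signature shown above; the throws IOException clause and the MapFile.Reader constructor used are assumptions. Once the (FileSystem, Path, Configuration) overload is gone in 0.22, this subclass no longer overrides anything, which is why the tool reports the change as a removal.

        import java.io.IOException;

        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.fs.FileSystem;
        import org.apache.hadoop.fs.Path;
        import org.apache.hadoop.io.MapFile;
        import org.apache.hadoop.io.SequenceFile;

        // Hypothetical user code, shown for illustration only.
        public class AuditedMapFileReader extends MapFile.Reader {

          public AuditedMapFileReader(FileSystem fs, String dirName, Configuration conf)
              throws IOException {
            super(fs, dirName, conf);
          }

          // Compiles and overrides against 0.21; against 0.22 this fails to compile,
          // because the (FileSystem, Path, Configuration) overload no longer exists.
          @Override
          protected SequenceFile.Reader createDataFileReader(FileSystem fs,
              Path dataFile, Configuration conf) throws IOException {
            System.err.println("opening data file: " + dataFile);
            return super.createDataFileReader(fs, dataFile, conf);
          }
        }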

        > 2. Did you run the tool against MR only? Hard to believe there were no API changes in HDFS and common.

        I ran it against all three. HDFS is marked as @Private, so it won't show up in the report.

        > 3. What is the final goal of this jira? Is it to identify incompatible changes and make a patch for the site with the release notes?

        Yes, including it in the release notes would be a good start.

        > If so we can filter out non-public changes from the reports generated by SigTest, and probably those that do not belong to public APIs in terms of Hadoop annotations, if it makes sense.

        The script already uses the annotations to restrict the changes to the public API.

        Konstantin Shvachko added a comment -

        Sounds like a good approach to me. I see SigTest does a more thorough analysis of the APIs than JDiff.

        1. Is it possible that the tool needs to be applied the other way around, that is having 0.22 as the base and Tested version being 0.21?
          I was looking into the history of class org.apache.hadoop.io.MapFile$Reader. The tool says
            "E1.2 - API type removed" : method protected org.apache.hadoop.io.SequenceFile$Reader org.apache.hadoop.io.MapFile$Reader.createDataFileReader(org.apache.hadoop.fs.FileSystem,org.apache.hadoop.fs.Path,org.apache.hadoop.conf.Configuration)
          

          But it seems createDataFileReader() was added in 0.22. Or am I missing something?

        2. Did you run the tool against MR only? Hard to believe there were no API changes in HDFS and common.
        3. What is the final goal of this jira? Is it to identify incompatible changes and make a patch for the site with the release notes?
          If so we can filter out non-public changes from the reports generated by SigTest, and probably those that do not belong to public APIs in terms of Hadoop annotations, if it makes sense.
        4. I think for 0.22 we can restrict ourselves to comparing with 0.21 only. Or do people think we need comparisons with 0.20.* (how many of them?)?
        Tom White added a comment -

        I had a look at more compatibility testing tools, and found SigTest (http://sigtest.java.net/), which is used by OpenJDK and NetBeans to check for incompatible changes. It is very comprehensive and can check binary and source compatibility.

        I've written some scripts that allow two Hadoop tarballs to be compared for incompatible API changes. The scripts are at https://github.com/tomwhite/hadoop-compatibility-tools.

        I've attached a couple of generated reports for the differences between 0.20.203 and 0.20.204, and between 0.21.0 and 0.22.0 (an old snapshot). It would be good to generate the report for 0.23.0 too, but that requires HADOOP-7642. The reports can help highlight unwanted incompatibilities and give us a chance to fix them before release.

        It would be nice to add such checks to patch testing so that incompatible changes are flagged, but this can be tackled separately.

        Nigel Daley added a comment -

        +1 for 0.22. Marking blocker so it doesn't fall off the list.

        Tom White added a comment -

        Once HADOOP-7106 is in, the scripts from this issue can be checked in alongside the three subprojects.

        Tom White added a comment -

        > I noticed it classifies a few changes as "incompatible" which I'd have thought are compatible

        It sometimes produces false positives (changes that are marked as incompatible, but in fact aren't), which is what these particular changes are. We might be able to improve the code to handle these cases.

        > Would it be possible to also distinguish between "source-compatible" and "binary-compatible" somehow?

        This tool is for detecting source-incompatible changes. For testing binary compatibility we need another tool, like Clirr (although I'm not sure how up to date it is).

        Todd Lipcon added a comment -

        Tom, this is really cool stuff!

        I noticed it classifies a few changes as "incompatible" which I'd have thought are compatible:

        • protected -> public accessibility
        • adding an interface to a public class

        Am I missing something?

        Would it be possible to also distinguish between "source-compatible" and "binary-compatible" somehow? Some changes (like changing a parameter to a superclass of the old type) won't work without a recompile of the caller, but don't require any code change. Whereas others (like changing a parameter to an entirely unrelated type) obviously do.

        Of course these are suggestions for improvements; as is, it's already really useful.
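
        To make the source/binary distinction above concrete, here is a minimal sketch; the class and method names are invented for illustration and are not Hadoop APIs. Widening a parameter to a supertype keeps call sites source compatible but breaks binary compatibility, because callers compiled against the old signature reference the method by its exact descriptor and fail with NoSuchMethodError until recompiled.

        import java.util.ArrayList;
        import java.util.List;

        // Hypothetical library class; version 1 declared
        //   public void write(ArrayList<String> records)
        // and version 2 widens the parameter to the supertype List.
        class RecordWriterLib {
          public void write(List<String> records) {
            System.out.println("wrote " + records.size() + " records");
          }
        }

        public class Caller {
          public static void main(String[] args) {
            // Source compatible: this line compiles unchanged against either version.
            new RecordWriterLib().write(new ArrayList<String>());
            // Binary incompatible: a Caller.class built against version 1 refers to
            // write(ArrayList), so running it against version 2 without recompiling
            // fails with NoSuchMethodError.
          }
        }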

        Tom White added a comment -

        Alan, I've updated the docs at http://people.apache.org/~tomwhite/HADOOP-7035/ to include all, stable-incompatible, evolving-incompatible, and unstable-incompatible changes. I agree that we should publish something like this with releases.

        Doug, this is harder to do since we have to backport the audience/stability annotations and work around the project split. I did something like this for 0.21 at http://people.apache.org/~tomwhite/HADOOP-6668/docs/jdiff/changes.html

        Doug Cutting added a comment -

        Should we also compare 0.22 to 0.20, since that's a likely upgrade path for many?

        Alan Gates added a comment -

        As an extensive user of Hadoop APIs, having a list like this as part of the release would be nice, so I think incorporating it in the release process is good. It would be even nicer if it listed changes to evolving interfaces in a separate section, as some of us like to live dangerously.

        Tom White added a comment -

        Here's a script to generate the incompatible API changes between 0.21.0 and the 0.22 branch. It ignores elements that are marked as @LimitedPrivate or @Private, and those that are marked as @Evolving or @Unstable. Furthermore, it uses a modified version of JDiff that only highlights incompatible changes (see patch at http://sourceforge.net/tracker/?func=detail&aid=2990626&group_id=37160&atid=419055).
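
        For reference, these annotations live in org.apache.hadoop.classification; a minimal sketch of how an annotated class looks (ExampleTool is a made-up name, not a real Hadoop type):

        import org.apache.hadoop.classification.InterfaceAudience;
        import org.apache.hadoop.classification.InterfaceStability;

        // A class marked Public and Stable is kept in the report; classes marked
        // @InterfaceAudience.Private, @InterfaceAudience.LimitedPrivate,
        // @InterfaceStability.Evolving or @InterfaceStability.Unstable are ignored
        // by the script, as described above.
        @InterfaceAudience.Public
        @InterfaceStability.Stable
        public class ExampleTool {
          // ...
        }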

        The resulting output can be seen at http://people.apache.org/~tomwhite/HADOOP-7035/common/docs/jdiff/changes.html and http://people.apache.org/~tomwhite/HADOOP-7035/mapreduce/docs/jdiff/changes.html. (There's no HDFS output because its API is private.)

        Should we incorporate this into the release process (so that all changes are accounted for)? Or have Hudson run it to detect incompatible changes introduced on a per-patch basis?

        Tom White added a comment -

        Good point - thanks for re-wording it!

        Greg Roelofs added a comment -

        The title was fairly misleading. (I actually thought it was a sarcastic non-bug...)


          People

          • Assignee: Tom White
          • Reporter: Tom White
          • Votes: 0
          • Watchers: 13
