|
This sounds fine.
The implication on documentation should be something like:
The marking for evolving could be as simple as adding "Evolving:" to the beginning of javadoc strings. Or perhaps we can add an annotation and doclet which adds "Evolving" in bold, as is done for deprecations. If evolving and/or deprecated features have Forrest documentation, then that too should include a marking. We'd also need to add end-user documentation of this policy, including exactly what "Evolving" and "Deprecated" mean. This should not include any discussion of the private classifications, only our compatibility promises for public stuff, and that anything undocumented has no compatibility promises. Does this make sense? I am +1 in the different scopes, and in favour of using Java 5 annotations to mark this up, rather than relying purely on package scoping to hide things. In the open source world, there's nothing to stop anyone putting their classes in your package to get at internal stuff, leaving you to field the support calls with your product name in all the stack traces.
I'd suggest that we have an @Stable marker to all interfaces that are considered stable; only things that are marked as stable are considered stable, everything else is inherently unstable/evolving One thing I would change, the Web GUI is an interface. if you can access it programmatically > only things that are marked as stable are considered stable, everything else is inherently unstable/evolving
That's easier for developers to maintain, but I think more confusing for users, who might reasonably assume that documented features will be supported going forward unless marked otherwise. Documentation should be maintainable, but not at the expense of clear description. So I still prefer marking APIs that are unstable and leaving stable APIs as the unmarked case. We can use tools like jdiff to help us audit releases for new features that ought to be marked as unstable. Annotations have the advantage that they can be included in the javadoc, providing a resource for developers.
To include an annotation in the javadoc, annotate the annotation with @documented and a runtime retention policy (this policy is required): // Example annotation that is included in javadoc
@Retention(RetentionPolicy.RUNTIME)
@Documented
public @interface PublicScope {
public enum Scope {STABLE, EVOLVING};
Scope scope() default Scope.STABLE;
}
and then annotate the class normally: @PublicScope(scope=PublicScope.Scope.EVOLVING)
public class Tardis {
private int foo;
The generated Javadoc then includes the annotation, right below the direct known subclasses, and is hyperlinked to the annotation definition, which can include clarification of what the different values mean. Another nice Annotation feature is the ability to specify its target: package, class, field, etc. > Annotations have the advantage that they can be included in the javadoc, providing a resource for developers.
Annotations and/or taglets may be a great tools to help implement this. Perhaps we can limit generated documentation to only include things whose annotated scope is public. If we cannot, then public declarations that appear in the documentation that are not explicitly noted as evolving must be considered stable, since that's what users might reasonably assume.
As long as we supply the annotation with a default value (such as stable) and ensure that each class and package that should have an annotation do have an annotation, this should happen pretty much automatically. Had an offline talk with Sanjay and went over these issues.
One item that came up was the terminology. Scope may be a confusing term to use as it overloads a standard comp. sci term and may not be clear to new people approach our use of it. Since the goal of the scope classification is to indicate who should use the class, would audience be a more appropriate choice? Similarly with stability, it would probably be good to always refer to that as APIStability, lest our code gets tagged with "Unstable" all over and frightens away anyone looking it over to consider using Hadoop in production. Without proper context, stability and scope may not convey the appropriate meanings. I've been playing around with using annotations approach and in general they work nicely. Annotations such as (rough versions without proper documentation): package annotations; import java.lang.annotation.Documented; import java.lang.annotation.Inherited; import java.lang.annotation.Retention; import java.lang.annotation.RetentionPolicy; @Documented @Retention(RetentionPolicy.CLASS) @Inherited public @interface Audience { public enum AUDIENCE { /** * the interface is for general use by any application. */ Public, /** * the interface is used by a specified set of projects or systems (typically closely related projects). */ LimitedPrivate, /** * Only for use within application, not for use by general developers */ ProjectPrivate } AUDIENCE value(); } and package annotations; import java.lang.annotation.Documented; import java.lang.annotation.Inherited; @Inherited @Documented public @interface APIStability { public enum Stability { Stable, Evolving, Unstable }; Stability value(); } allows for reasonable, very clear documentation of code: package eccleston; import static annotations.Audience.AUDIENCE.Public; import annotations.APIStability; import static annotations.APIStability.Stability.Evolving; import static annotations.APIStability.Stability.Stable; import annotations.Audience; /** * Time and Relative Dimensions in Space */ @Audience(Public) @APIStability(Stable) public class TARDIS { public void materialize() { System.out.println("whoop! WHOop! WHOOP!"); } public void dematerialize() { System.out.println("WHOOP! WHOop! whoop!"); } @APIStability(Evolving) // We can mark individual methods as necessary public void blendIn() { System.out.println("Now I'm a police box! That'll work."); } } By using the @Documented meta-annotation, the audience and api-stability annotation appear in the javadoc. This is good: eccleston Class Tardis java.lang.Object eccleston.TARDIS Direct Known Subclasses: Type2TARDIS @Audience(value=Public) @APIStability(value=Stable) public class TARDIS extends java.lang.Object Time and Relative Dimensions in Space Unfortunately, there are two issues:
I had hoped we could just use package-info.java to mark a class' audience and api-stability and have it propagate all the way down the line in the documentation, but that doesn't work. In addition to issue one above, the @Inherited meta-annotation can only be applied to classes. As a partial remedy to these issues, we could write a documentation tool to check if new code is using any unstable/private apis and emit a warning, as part of test-patch or such. Taglets show up in the javadoc quite boldly, but I can't find a way to propagate them; we'd have to explicitly mark each section we wanted to tag (or write a tool to do it). Since we can't just mark a top level class and have it show up all down the line, having default values may not be a good idea after all. It's probably best to explicitly set the audience/api-stability value when the annotation is applied. So, the annotation approach may work but it's not as in-your-face as I was hoping for helping coders to immediately see which resources they should or shouldn't use in their development environments. I would like to propose that a particular subset of logging statements currently in Hadoop be made Public-Evolving, and eventually Public-Stable, for the purpose of lightweight tracing for studying Hadoop behavior. This has various applications, such as for diagnosing failures and visualizing behavior, and using log statements that are already being currently generated gives us a nice, low-overhead way for doing this.
We have been using Hadoop logs for diagnosis for some time now, and while the log statements have been relatively stable, we have had to make minor parser modifications between Hadoop versions, and there is currently no guarantee that the statements we find useful will be here to stay. It would be nice if we can articulate some of the existing log statements that help us in diagnosis, and if those statements can be made part of a public interface. Then, parsing and analysis tools can be produced based on this stable interface. This also relates to Chukwa, as far as us implementing our log-based analysis and diagnosis tools as part of the Chukwa project. @Jiaqi
-that's a a separate problem, which is more logs ought to be designed for machine post-mortems, and hence marked as "please don't change". Which is something for a new bugrep. (and yes, I agree with you, stable log output is good) @Steve
I actually commented about this issue of logs here because logs are mentioned here as part of the interface definition, so I was hoping to get "debugging+profiling log statements" marked as part of the public interface that will eventually become stable. Sure, the actual statements to be marked should be part of a separate JIRA, but I wanted to get this defined in the interface as well. Most of org.apache.hadoop.security should be
Limited-Private Evolving
We are considering moving to using Jaas more extensively; if we do that some of these interfaces (esp UserGroupInformation) will change. Did some work creating a test implementation of the above. Attached as a patch as a prototype of what such a system could look like, along with a screenshot of how the annotations end up looking in the Javadoc. Just a proof of concept, don't ding me over file locations and such.
It turned out to be better to have separate annotations for each of the possible labels: IDE integration was much better and the values show up more clearly in the Javadoc and IDE integreated Javadoc viewer. The biggest problem is that with the project split, these annotations are not carried over and we still have the fact that @Inherited and @Documented don't actually work correctly. So, for instance, even though I annotated FileSystem, none of that carries over to DistributedFileSystem, since it's in a different jar. Therefore, there would be a significant labeling initial and maintenance effort to make sure everything was marked as it should be. Anyway, this is what the above scheme could look like if implemented. Doug said:
>The implication on documentation should be something like: > * Public/stable should be documented. > * Public/evolving should documented, but must be well-marked as evolving. > * Private, limited or otherwise, should not be included in public documentation, but only in internal documents. Agreed, However private/stable and private/evolving also need to be document even though they If all methods of a subclass are not explicitly documented, I worry that it will become difficult to program against the code directly (without looking at generated javadoc). For example, I imagine needing to trace a method's ancestors or class hierarchy to determine its classification. Would it be too difficult or bloated to add public/stable (etc) to every applicable method?
> It turned out to be better to have separate annotations for each of the possible labels: IDE integration was much better and the values show up more clearly in the Javadoc and IDE integreated Javadoc viewer.
Hi Jakob, how are the separated annotations better than a single annotation with a parameter? c5073_20090825.patch: I tried a little example. It seems that a single annotation works well.
With a single annotation and enums, the text added to an element looks like this: @Audience(Audience.Type.PUBLIC)
public class HelloWorld {
compared to separate annotations, which end up looking like: @Audience_Public
public class HelloWorld {
I consider the latter code easier to grok and visually cleaner than the former. The word Audience is not repeated, the unnecessary word Type is not included, the odd capitalization of the Public enum is not included. Overall, the appearance is that of information meant for humans rather than computers. It's possible to create a representation similar to the first example with a single annotation, as with the above TARDIS class example, or continuing with the current example: @Audience(PUBLIC)
public class HelloWorld {
However, this requires static imports of the Type enum, which my Eclipse (Ganymede) had trouble doing in a friendly manner. I ended up having to manually insert them into each file and even then ran into weird auto-complete behavior. Eclipse seems to not have a good understanding of how to deal with static imports. This is what I meant by IDE integration. Also, it was faster to type @A <ctrl-space>P<ctrl-space) and be done with it than it was to deal with separate enums. Keystrokes add up. As far as the generated javadoc, I think it's a wash. Both approaches are equally readable. All that being said, this is a minor point and I'm not going to spend many of my lobbying points on it. @Bill
Absolutely, it may be a problem. That's why I consider the implementation of @Documented and @Inherited that doesn't actually document the inherited annotations broken. I don't know what the right answer is to this: if we require every method to be thusly documented, maintaining this metainfo becomes a burden and will be ignored by developers. If we don't, end users may end up in the situation Bill describes. At the very least, we should have tooling to either apply/update the documentation, or to warn users if they are using APIs that they probably shouldn't be. I don't believe our jars even ship with the javadoc included. Another approach to take is to have nested interface as follows:
public @interface Audience {
public @interface Public {
}
public @interface Private {
}
}
Code can be annotated then by using @Audience.Public, @Audience.Private This nicely groups all the annotations together under Audience, without needing to use the Audience_ prefix for grouping them together.
+1. Looks like the best of both worlds. Let's go with that. > Another approach to take is to have nested interface as follows ...
This is very creative. However, it does not seem that the nested annotation is designed for this purpose. In java.lang.annotation, this is an @Retention Typo: "this is an" => "there is an"
While I agree with Nicholas that the value approach is more correct. I think that Suresh's approach leads to a more readable annotation. It also let's us do things like:
public @interface Audience {
public @interface Public {}
public @interface LimitedPrivate {
String[] target;
}
public @interface Private {}
}
where limited private things have to list their targets. This works fine. The javadoc is as readable as any of the others (attached image).
@Documented @Inherited public @interface Audience { @Documented public @interface Public {} @Documented public @interface LimitedPrivate { String target(); } @Documented public @interface Private {} } I don't believe this is an abuse of Annotations, rather it's a creative use of Java's rather cumbersome implementation. +1 I think the target is better expressed as a String[] rather than a single string.
That actually ends up working quite nicely because, surprisingly, Java takes array literals for this: @Audience.Public public class SampleClass { @Audience.LimitedPrivate(target = {"Pig", "Owl", "Zebra"}) public class Foo {} @Audience.Private public class Bar {} } Again, the Javadoc is good (see attachment). It is hard to get nested annotations correct. I still think that Public, LimitedPrivate and Private are values of Audience.
If we are going to do nested annotations, I suggest to followings:
public final class Audience { private Audience() {} @Documented @Inherited public @interface Public {} @Documented @Inherited public @interface LimitedPrivate { String[] value(); } @Documented @Inherited public @interface Private {} } Yes, in the actual patch we would have to repeat Documented and Inherited on each. That doesn't seem to me to be a problem. My only concern with using value is that a term like target is more descriptive of what we're describing - who is the target of this LimitedPrivate section. But again, it's not a big point.
As for using Audience as a class or annotation, I don't think class is the correct way to go, but I'm not going to argue the point any more. There is a problem of using multiple classes (either nested or non-nested classes). It allows multiple annotations for a single entry, e.g.
@Audience.Public
@Audience.Private
public class HelloWorld {
}
It is even worst if the annotation is @Inherited because there is no way to remove the inherited annotations in the sub-classes. e.g. @Audience.Public public class Base { } @Audience.Private public class Sub extends Base { } Sub indeed is annotated with both @Audience.Public and @Audience.Private if the annotations are @Inherited and there is no way to remove @Audience.Public from Sub. In my opinion, we either need to:
1. Remove @Inherited 2. Use the parameters to specify them. Clearly, Nicholas would like 2. Personally, I think we should do 1. Having the annotations flow down is problematic if it flows from public interfaces into the private implementations.... > Clearly, Nicholas would like 2. Personally, I think we should do 1. Having the annotations flow down is problematic if it flows from public interfaces into the private implementations....
Yes, I prefer 2 but I am open to 1. BTW, should the value (or target) of LimitedPrivate be enum? Otherwise, someone may incorrectly annotate something with typos like the following: @Audience.LimitedPrivate({"Pi", "Owl", "Zebra"}) //Oops, it is supposed to be used by the Pig project,
//not the Pi programs.
public class HelloWorld {
}
+1 for jakob's last prposal with @inherited removed. A Audience.Public should not imply the same for its subclasses.
Also I like Nicholas's enum idea for the LimitedPrivate. Thanks for driving this through the many iterations. Implemented the current state of agreement, incorporating some of Nicholas' comments.
The current form ends up looking something like this: @Audience.LimitedPrivate({Project.HDFS, Project.AVRO})
@ApiStability(Stability.Evolving)
public abstract class Foo extends Configured implements Closeable {
The ApiStability annotation seems to have flown under the radar in terms of debates of naming, etc. Using the Project enum puts us back to having to a rather bulky syntax, but at this point, I'd just as assume have people do static imports if they really want it to look clean. I'd rather be more symmetric in our terminology. How about:
@InterfaceAudience.LimitedPrivate({Project.HDFS, Project.AVRO})
@InterfaceStability.Evolving
public class Foo ...
+1 This has converged to a solid solution.
No immediate comments, so marking patch available.
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12418982/COMMON-5073.patch against trunk revision 812652. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/25/testReport/ This message is automatically generated. Could you please add more description to InterfaceStability, given that this is will be used for annotating the entire code base?
+1. Thanks for addressing the comment.
I comment the change. Thanks Jakob.
Integrated in Hadoop-Common-trunk-Commit #23 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Common-trunk-Commit/23/
. Add annotation mechanism for interface classification. Contributed by Jakob Homan. Integrated in Hadoop-Common-trunk #90 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Common-trunk/90/
. Add annotation mechanism for interface classification. Contributed by Jakob Homan. Mapred/HDFS still cannot use this, as it doesn't look like the common jar after this patch is committed to the sub-projects. Any plans for committing the common jar to mapreduce/hdfs rightaway? Or wait till some other patch does this?
Proposed Classification for Hadoop Interfaces (updated)
Editorial pass over all release notes prior to publication of 0.21.
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Interface Taxonomy - Scope & Stability Classification
The interface taxonomy classification provided here is for guidance to developers and users of interfaces.
The classification guides a developer to declare the scope (or targeted audience or users) of an interface and also its stability.
This classification was derived from a taxonomy used inside Yahoo and
)
from the OpenSolaris taxonomy (http://www.opensolaris.org/os/community/arc/policies/interface-taxonomy/#Advice
Interface have two main attributes: Scope and Stability
For example many interfaces are merely internal or private interfaces of the implementation while others are public or external interfaces that applications or clients are expected to use. In posix, libc is an is an external or public interface, while large parts of the kernel are internal or private interfaces. In addition, some interfaces are targeted to some specific other subsystems. Identifying the scope helps define the customers or users of the interfaces and helps define the impact of breaking an interface. For example we may be willing to break the comaptibility of an interface whose scope is a small number of specific subsystems. One the other hand, one is unlikely to break a protocol interfaces that millions of internet users depend on.
The following are useful scopes in order of increasing/wider visibility
FAQ
While a private interface marked as stable is targeted to change only at major releases, it may break at other times if the providers of that interface are willing to changes the internal users of that interface. Further, a public stable interface is less likely to break even at major releases (even though it is allowed to break compatibility) because the impact of the change is larger. If you use a private interface (regardless of its stability) you run the risk of incompatibility.
Proposed Classification for Hadoop Interfaces