
Key: HADOOP-5073
Type: Sub-task
Status: Resolved
Resolution: Fixed
Priority: Major
Assignee: Jakob Homan
Reporter: Sanjay Radia
Votes: 0
Watchers: 25
Hadoop Common
HADOOP-5064

Hadoop 1.0 Interface Classification - scope (visibility - public/private) and stability

Created: 16/Jan/09 08:59 PM   Updated: 03/Nov/09 11:40 PM
Component/s: None
Affects Version/s: 0.21.0
Fix Version/s: 0.21.0

Time Tracking:
Not Specified

File Attachments (name | uploaded | by | size):
c5073_20090825.patch | 2009-08-26 01:25 AM | Tsz Wo (Nicholas), SZE | 1 kB | Licensed for inclusion in ASF works
COMMON-5073.patch | 2009-09-09 12:33 AM | Jakob Homan | 4 kB | Licensed for inclusion in ASF works
COMMON-5073.patch | 2009-09-08 10:33 PM | Jakob Homan | 3 kB | Licensed for inclusion in ASF works
COMMON-5073.patch | 2009-09-08 08:41 PM | Jakob Homan | 3 kB | Licensed for inclusion in ASF works
HADOOP-5073.patch | 2009-08-14 10:41 PM | Jakob Homan | 6 kB
Image Attachments:
1. 5073_demo.png (112 kB)
2. c5073_20090825.png (35 kB)
3. Nested.png (144 kB)
4. Picture 1.png (27 kB)
Issue Links:
Reference
 

Hadoop Flags: Reviewed
Release Note: Annotation mechanism enables interface classification.
Resolution Date: 09/Sep/09 12:45 AM


Description
This jira proposes an interface classification for hadoop interfaces.
The discussion was started in email alias core-dev@hadoop.apache.org in Nov 2008.

Change History
Sanjay Radia made changes - 16/Jan/09 09:10 PM
Field Original Value New Value
Description This jira proposes an interface classification for hadoop interfaces.
The discussion was started in email alias core-dev@hadoop.apache.org in Nov 2008.

h2. Interface Taxonomy - Scope & Stability Classification

The interface taxonomy classification provided here is for guidance to developers and users of interfaces.
The classification guides a developer to declare the scope (or targeted audience or users) of an interface and also its stability.
* *Benefits to the user of an interface*: Knows which interfaces to use or not use and their stability.
* *Benefits to the developer*: Prevents accidental changes to interfaces, and hence accidental impact on users, other components, or the system. This is particularly useful in large systems with many developers who may not all share the same state/history of the project.

This classification was derived from a taxonomy used inside Yahoo and
from the OpenSolaris taxonomy (http://www.opensolaris.org/os/community/arc/policies/interface-taxonomy/#Advice)

Interfaces have two main attributes: *Scope* and *Stability*
* *Scope* - _denotes the potential customers of the interface_.
   For example, many interfaces are merely internal or private interfaces of the implementation, while others are public or external interfaces that applications or clients are expected to use. In POSIX, libc is an external or public interface, while large parts of the kernel are internal or private interfaces. In addition, some interfaces are targeted at specific other subsystems. Identifying the scope helps define the customers or users of an interface and the impact of breaking it. For example, we may be willing to break the compatibility of an interface whose scope is a small number of specific subsystems. On the other hand, one is unlikely to break a protocol interface that millions of internet users depend on.
  The following are useful scopes, in order of increasing/wider visibility:
** *project-private*
*** the interface is for internal use _within_ the project and should not be used by applications. It is subject to change at any time without notice. Most interfaces of a project are project-private.
** *limited-private*
*** the interface is used by a specified set of projects or systems (typically closely related projects). Other projects or systems should not use the interface. Changes to the interface will be communicated/negotiated with the specified projects. For example, in the hadoop project, some interfaces are *hdfs-mapReduce-private* in that they are private to the hdfs and mapReduce projects.
** *company-private* (*_This is not applicable to open-source projects such as Hadoop._* It is mentioned here for completeness.)
*** the interface can be used by other projects within a company.
** *public*
*** the interface is for general use by any application.

* *Stability* - _denotes when changes can be made to the interface that break compatibility_.
** *Stable*
*** Can evolve while retaining compatibility across minor release boundaries; can break compatibility only at a major release (i.e. at m.0).
** *Evolving*
*** Evolving, but can break compatibility at a minor release (i.e. m.x).
** *Unstable*
*** This usually makes sense for only private interfaces.
*** However one may call this out for a _supposedly_ public interface to highlight that it should not be used as an interface; for public interfaces, labeling it as *Not-an-interface* is probably more appropriate than "unstable".
**** Examples of publicly visible interfaces that are unstable (i.e. not-an-interface): GUI, CLIs whose output format will change
** *Deprecated* - should not be used, will be removed in the future.
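
As a rough illustration of the two attributes, the taxonomy above could be modeled in Java along these lines. This is only a sketch: the type and method names (Scope, Stability, ReleaseKind, mayBreakAt) are hypothetical and not part of this proposal. The rule for Unstable (may change at any release, including a point release) is an assumption, since the text above does not state one. Deprecated is left out because it describes removal rather than a compatibility rule.

```java
// Hypothetical model of the taxonomy; none of these names come from the issue.
enum Scope {
    // Declared in order of increasing/wider visibility, as listed above.
    PROJECT_PRIVATE, LIMITED_PRIVATE, COMPANY_PRIVATE, PUBLIC;

    /** True if this scope is visible to at least as wide an audience as other. */
    boolean isAtLeastAsWideAs(Scope other) {
        return compareTo(other) >= 0;
    }
}

/** Kind of release boundary: m.0 is MAJOR, m.x is MINOR; POINT is an assumed finer level. */
enum ReleaseKind { MAJOR, MINOR, POINT }

enum Stability {
    STABLE, EVOLVING, UNSTABLE;

    /** Whether an incompatible change is permitted at the given kind of release. */
    boolean mayBreakAt(ReleaseKind release) {
        switch (this) {
            case STABLE:
                // Can break compatibility only at a major release.
                return release == ReleaseKind.MAJOR;
            case EVOLVING:
                // Can also break compatibility at a minor release.
                return release == ReleaseKind.MAJOR || release == ReleaseKind.MINOR;
            default:
                // UNSTABLE: assumed here to be free to change at any release.
                return true;
        }
    }
}
```

For example, Stability.STABLE.mayBreakAt(ReleaseKind.MINOR) is false, while Stability.EVOLVING.mayBreakAt(ReleaseKind.MINOR) is true.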


h2. FAQ
# What is the harm in applications using a private interface that is stable? How is it different from a public stable interface?
   While a private interface marked as stable is targeted to change only at major releases, it may break at other times if the providers of that interface are willing to change the internal users of that interface. Further, a public stable interface is less likely to break even at major releases (even though it is allowed to break compatibility) because the impact of the change is larger. *If you use a private interface (regardless of its stability) you run the risk of incompatibility*.
# Why bother declaring the stability of a private interface?
** To communicate the intent to its internal users.
** To provide guidelines to developers of the interface.
** The stability may capture other internal properties of the system.
*** e.g. In HDFS, NN-DN protocol stability can help implement rolling upgrades.
*** e.g. In HDFS, FSImage stability can help provide more flexible roll backs.
# How will the classification be recorded for hadoop APIs?
** Each interface or class will have the scope and stability recorded using javadoc tags, annotations, or some other mechanism. Whatever mechanism we choose, the classification must be visible in the generated javadoc.
** APIs of private scope will not be part of the public javadoc generated by ant (i.e. by the _ant target_ "javadoc"); they will only be generated for the developer javadoc (generated by the _ant target_ "javadoc-dev").
** One can derive the scope of java classes and java interfaces by the scope of the package in which they are contained. Hence it is useful to declare the scope of each java package as public or private (along with the private scope variations).
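
One possible shape for the annotation option in the FAQ above is sketched here. The annotation names (Audience, StabilityLevel), the string values, the example types, and the helper class are all hypothetical, chosen only for this example. The lookup falls back from the class to its package and then to project-private, following the default this proposal states for interfaces that are not otherwise classified.

```java
import java.lang.annotation.Documented;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Hypothetical annotations. @Documented asks javadoc to show them on the
// annotated element, so the classification is visible in the generated docs.
@Documented
@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.TYPE, ElementType.PACKAGE})
@interface Audience {
    String value(); // e.g. "public", "limited-private", "project-private"
}

@Documented
@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.TYPE, ElementType.PACKAGE})
@interface StabilityLevel {
    String value(); // e.g. "stable", "evolving", "unstable"
}

// Example types for illustration only.
@Audience("public")
@StabilityLevel("stable")
interface ExampleFileSystemApi { }

class UnclassifiedHelper { }

final class Classification {
    private Classification() { }

    /** Scope declared on the class, else on its package, else project-private. */
    static String audienceOf(Class<?> type) {
        Audience onType = type.getAnnotation(Audience.class);
        if (onType != null) {
            return onType.value();
        }
        Package pkg = type.getPackage();
        Audience onPackage = (pkg == null) ? null : pkg.getAnnotation(Audience.class);
        if (onPackage != null) {
            return onPackage.value();
        }
        return "project-private";
    }

    /** Only public-scope APIs would go into the public javadoc. */
    static boolean inPublicJavadoc(Class<?> type) {
        return "public".equals(audienceOf(type));
    }
}
```

A package-level declaration would go in that package's package-info.java. A javadoc filter could then call something like Classification.inPublicJavadoc to decide which types belong in the public docs and which only in the developer docs.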


h2. Proposed Classification for Hadoop Interfaces

* Scope Public
** Stable
*** FileSystem, MapReduce, Config, CLI (including output), parts of Mapred.lib, Job Logs API, instrumentation metrics, audit logs
** Evolving
*** TFile, parts of Mapred.lib, some instrumentation metrics, jmx interface (till it becomes stable)
*** Job logs and job history (some tools, scripts and chukwa use this to analyze job processing)
** Not an interface
*** Web GUI
* Scope Private
** Limited-Private Evolving
*** RPC, Metrics (HDFS-MapReduce Private) - once stable, we can consider making these public-stable.
** Project-Private Stable
*** Intra-HDFS and MR protocols (facilitates rolling upgrades down the road)
*** FSImage
**** Note this will enable old versions of HDFS to read newer fsImage and hence enable more flexible roll backs.
**** Q. Should this be Project-Private Evolving instead?
**** Regardless of the stability of FSImage, new versions of HDFS have to be able to transparently convert older versions and provide roll-back.
** Project-Private Evolving
*** DFSClient (Q. Should this be "project-private unstable"?)
** Project-Private Unstable
*** System logs
*** All implementation classes and interfaces not otherwise classified are considered to be project-private stable.

This jira proposes an interface classification for hadoop interfaces.
The discussion was started in email alias core-dev@hadoop.apache.org in Nov 2008.

Jakob Homan made changes - 14/Aug/09 10:41 PM
Attachment 5073_demo.png [ 12416622 ]
Attachment HADOOP-5073.patch [ 12416621 ]
Tsz Wo (Nicholas), SZE made changes - 26/Aug/09 01:25 AM
Attachment c5073_20090825.png [ 12417689 ]
Attachment c5073_20090825.patch [ 12417688 ]
Jakob Homan made changes - 04/Sep/09 06:23 PM
Attachment Nested.png [ 12418652 ]
Jakob Homan made changes - 04/Sep/09 07:08 PM
Attachment Picture 1.png [ 12418656 ]
Jakob Homan made changes - 08/Sep/09 08:41 PM
Attachment COMMON-5073.patch [ 12418964 ]
Owen O'Malley made changes - 08/Sep/09 09:19 PM
Assignee Sanjay Radia [ sanjay.radia ] Jakob Homan [ jghoman ]
Jakob Homan made changes - 08/Sep/09 10:33 PM
Attachment COMMON-5073.patch [ 12418982 ]
Jakob Homan made changes - 08/Sep/09 11:27 PM
Status Open [ 1 ] Patch Available [ 10002 ]
Jakob Homan made changes - 09/Sep/09 12:33 AM
Attachment COMMON-5073.patch [ 12419001 ]
Suresh Srinivas made changes - 09/Sep/09 12:44 AM
Affects Version/s 0.21.0 [ 12313563 ]
Fix Version/s 0.21.0 [ 12313563 ]
Hadoop Flags [Reviewed]
Suresh Srinivas made changes - 09/Sep/09 12:45 AM
Resolution Fixed [ 1 ]
Status Patch Available [ 10002 ] Resolved [ 5 ]
Suresh Srinivas made changes - 09/Sep/09 12:46 AM
Release Note Adds annotation mechanism for interface classification.
Robert Chansler made changes - 09/Oct/09 03:57 AM
Release Note Adds annotation mechanism for interface classification. Annotation mechanism enables interface classification.
Suresh Srinivas made changes - 03/Nov/09 11:40 PM
Link This issue relates to HDFS-752 [ HDFS-752 ]
Suresh Srinivas made changes - 03/Nov/09 11:40 PM
Link This issue relates to HADOOP-6289 [ HADOOP-6289 ]