Hadoop Map/Reduce: MAPREDUCE-556

Refactor Hadoop package structure and source tree.

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

      Description

      This Jira proposes refactoring the Hadoop package structure and source tree.

      Goals
      1. A somewhat finer-grained package structure.

      • The current structure is rather flat.
      • Smaller files (the name node and data node sources are far too large).

      2. The client interfaces and data types sent across the wire should be clearly identifiable by the package they sit in. This will help preserve application compatibility, since it will be obvious when a change breaks the interface.
      3. Split DFS into separate client-side and server-side jars.
      4. Move map-reduce into a separate source tree (in the same SVN repository) with its own jar.
      5. The Javadoc for users of Hadoop should not contain the internal server-side interfaces and classes.
      6. Fix all compiler warnings.
      7. Fix or minimize findbugs warnings.
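      Goal 2 relies on package placement alone to separate wire-visible types from internals. As a rough illustration, the Java sketch below assumes a hypothetical org.apache.hadoop.dfs.protocol package for client interfaces and wire types (the proposal does not name one) and shows how a build or review step could flag changed classes that sit in that package. The class names in main are likewise illustrative, not taken from the proposal.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch: PROTOCOL_PREFIX is an assumed package name,
// not one defined by this Jira.
public class WireTypeCheck {
    // Assumed package under which client interfaces and wire types would live.
    static final String PROTOCOL_PREFIX = "org.apache.hadoop.dfs.protocol.";

    /** True if the fully qualified class name sits in the assumed wire package. */
    static boolean isWireType(String className) {
        return className.startsWith(PROTOCOL_PREFIX);
    }

    /** Splits changed classes into wire-visible (true) and internal (false). */
    static Map<Boolean, List<String>> partition(List<String> changedClasses) {
        return changedClasses.stream()
                .collect(Collectors.partitioningBy(WireTypeCheck::isWireType));
    }

    public static void main(String[] args) {
        // Illustrative class names only.
        List<String> changed = List.of(
                "org.apache.hadoop.dfs.protocol.ClientProtocol",
                "org.apache.hadoop.dfs.server.namenode.FSNamesystem");
        Map<Boolean, List<String>> split = partition(changed);
        System.out.println("Needs compatibility review: " + split.get(true));
        System.out.println("Internal only: " + split.get(false));
    }
}
```

      The point of the design is that no annotation or registry is needed: the package name itself tells a reviewer or tool whether a change can break clients.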

      The top level package structure remains unchanged:
      hadoop.fs
      hadoop.dfs
      hadoop.mapred
      Etc.

      We considered renaming hadoop.dfs to hadoop.hdfs, but the "h" adds little since "hadoop" is already part of the package name, so it did not seem worth the trouble of breaking compatibility.

      Changes will occur internally within the above packages.
      Sub-Jira HADOOP-2885 proposes restructuring hadoop.dfs.

      Other Jiras will be filed for restructuring other parts.

        Activity

        Doug Cutting added a comment -

        It would be more consistent to move the dfs package to org.apache.hadoop.fs.hdfs, and to rename the DistributedFileSystem class to be HDFS. There should be few compatibility issues with this, since applications should not refer directly to hdfs classes. If needed, we could possibly create a org.apache.hadoop.dfs.DistributedFileSystem subclass of org.apache.hadoop.fs.hdfs.HDFS for one release.
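        The one-release compatibility subclass suggested above can be sketched as follows. This is a single-file illustration under assumptions: in a real tree HDFS would live in org.apache.hadoop.fs.hdfs and the deprecated shim in org.apache.hadoop.dfs, but both are collapsed into nested classes here so the file compiles on its own, and the describe() method is a hypothetical stand-in for real file system behaviour.

```java
// Single-file sketch of a deprecated subclass kept for one release.
// Package locations are collapsed into nested classes for illustration.
public class CompatShimDemo {

    /** Stand-in for the renamed class (would be org.apache.hadoop.fs.hdfs.HDFS). */
    static class HDFS {
        // Hypothetical method standing in for real file system behaviour.
        String describe() {
            return "HDFS";
        }
    }

    /**
     * Stand-in for the old name (would be org.apache.hadoop.dfs.DistributedFileSystem),
     * kept for one release so existing code continues to work.
     *
     * @deprecated use {@link HDFS} instead.
     */
    @Deprecated
    static class DistributedFileSystem extends HDFS {
        // Adds nothing: all behaviour is inherited from HDFS.
    }

    public static void main(String[] args) {
        // Code written against the old name still works and gets the new behaviour.
        HDFS fs = new DistributedFileSystem();
        System.out.println(fs.describe());
    }
}
```

        Because the shim adds no behaviour of its own, it can be deleted after one release without touching the real implementation.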

        The src/java directory would be better split not in two, but in three: src/java/{core,mapred,hdfs}. Splitting HDFS into its own tree will help keep the many internal APIs made public by this restructuring from appearing in end-user javadocs, and will also better reflect the system's layering.

        Fixing compiler and findbugs warnings seems like mission creep. Wouldn't those be better handled as separate issues, not even sub-issues of this one?

        Edward J. Yoon added a comment -

        This is a really cool refactoring.


          People

          • Assignee: Sanjay Radia
            Reporter: Sanjay Radia
          • Votes: 0
            Watchers: 8
