Issue Details (XML | Word | Printable)

Key: HADOOP-3750
Type: Improvement Improvement
Status: Closed Closed
Resolution: Fixed
Priority: Major Major
Assignee: Sharad Agarwal
Reporter: Tom White
Votes: 0
Watchers: 7
Operations

If you were logged in you would be able to see more operations.
Hadoop Common

Fix and enforce module dependencies

Created: 11/Jul/08 12:02 PM   Updated: 08/Jul/09 04:52 PM
Return to search
Component/s: None
Affects Version/s: None
Fix Version/s: 0.20.0

Time Tracking:
Issue & Sub-Tasks
Issue Only
Not Specified

File Attachments:
  Size
Text File Licensed for inclusion in ASF works 3750_v1.patch 2008-11-18 03:38 PM Sharad Agarwal 5 kB
Text File Licensed for inclusion in ASF works 3750_v2.patch 2008-12-02 05:34 AM Sharad Agarwal 5 kB
Text File Licensed for inclusion in ASF works hadoop-3750.patch 2008-07-18 10:07 AM Tom White 8 kB
Issue Links:
Blocker
Reference
 

Hadoop Flags: Reviewed, Incompatible change
Release Note: Removed deprecated method parseArgs from org.apache.hadoop.fs.FileSystem.
Resolution Date: 04/Dec/08 10:23 AM

Sub-Tasks  All   Open   

 Description  « Hide
Since HADOOP-2916 Hadoop Core consists of modules in independent source trees: core, hdfs, mapred (and others). The dependencies between them should be enforced to avoid cyclic dependencies. At present they all have dependencies on each other.

 All   Comments   Work Log   Change History   Subversion Commits      Sort Order: Ascending order - Click to sort in descending order
Tom White added a comment - 11/Jul/08 12:07 PM
The simplest way to enforce this would be to compile core, hdfs and mapred independently (hdfs depends on core, mapred depends on core and hdfs).

Bill de hÓra has mapped the dependency cycles in the codebase using Structure101. We should consider using this for keeping an eye on further dependency problems.


Doug Cutting added a comment - 14/Jul/08 07:51 PM
> mapred depends on core and hdfs

No, mapred should only depend on core. Core should include the FileSystem API, which should be sufficient for mapreduce.


Tom White added a comment - 18/Jul/08 10:07 AM
Here's a patch which fixes the easier inter-module dependencies. There are a few left which require a bit more effort/discussion:

Core

  • FsShell depends on DistributedFileSystem to call setVerifyChecksum on it. Should we add setVerifyChecksum to FileSystem? If so we need to make ChecksumFileSystem honour it.
  • HarFileSystem uses LineRecordReader to read lines. Replace with own line-reading code.
  • ReflectionUtils depends on JobConf and JobConfigurable to set config. Not sure what to do here. Have a MapRed ReflectionUtils that does this extra configuration?

HDFS

  • DataNode, FSNamesystem, SecondaryNameNode all depends on StatusHttpServer which is in mapred. Split out common part and put in core.

MapRed

  • Task depends on DistributedFileSystem to update statistics for all known filesystems. Should we get this from the FileSystem cache?

Doug Cutting added a comment - 21/Jul/08 06:58 PM
> Should we add setVerifyChecksum to FileSystem?

+1

> Replace with own line-reading code.

Or move line-reading code to util?

> Have a MapRed ReflectionUtils that does this extra configuration?

+1

> Should we get this from the FileSystem cache?

It should use FileSystem#statisticsTable (which, incidentally, should be named STATISTICS_TABLE).

Perhaps we should add a FileSystem method like:

public static Map<String, Statistics> getStatistics();

That returns the stats indexed by URI scheme?


Tsz Wo (Nicholas), SZE added a comment - 23/Jul/08 07:14 PM
> DataNode, FSNamesystem, SecondaryNameNode all depends on StatusHttpServer which is in mapred. Split out common part and put in core.

+1

We should create a package in core for the base web server class and the general servlet classes (e.g. org.apache.hadoop.util.ServletUtil). Should we create a separated issue for this?


Tom White added a comment - 24/Jul/08 03:41 PM

Should we create a separated issue for this?

Yes. It's probably worth creating subtasks for the remaining fixes as they are all fairly large and unrelated to each other. When they've been fixed we can make the change to enforce the dependencies by compiling each module separately.


Tsz Wo (Nicholas), SZE added a comment - 24/Jul/08 06:04 PM
Created HADOOP-3824 as a sub-task for StatusHttpServer.

Tsz Wo (Nicholas), SZE added a comment - 26/Jul/08 02:48 AM
The latest patch in HADOOP-3824 is ready to be committed. See whether you would like to take a look.

Owen O'Malley added a comment - 21/Oct/08 06:32 AM
Actually, the dependency should just be:

hdfs depends on core
mapred depends on core


Sharad Agarwal added a comment - 18/Nov/08 03:38 PM
This patch:
  • FileSystem: removed the deprecated method
  • build.xml:
    split the compile-core-classes into separate compile-mapred-classes and compile-hdfs-classes to have proper dependency.
    (Ultimately when core, mapred, hdfs projects split, build.xml has to be split as well. Till then need to restrict the cyclic dependency code to creep in.)

Tom White added a comment - 25/Nov/08 07:57 PM
+1

Sharad Agarwal added a comment - 26/Nov/08 04:57 AM
I think this patch can go in and doesn't need to wait for HADOOP-4631. Checking this in early would restrict users to introduce any cyclic dependency in code.

Hadoop QA added a comment - 28/Nov/08 02:17 PM
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12394174/3750_v1.patch
against trunk revision 720930.

+1 @author. The patch does not contain any @author tags.

-1 tests included. The patch doesn't appear to include any new or modified tests.
Please justify why no tests are needed for this patch.

+1 javadoc. The javadoc tool did not generate any warning messages.

+1 javac. The applied patch does not increase the total number of javac compiler warnings.

+1 findbugs. The patch does not introduce any new Findbugs warnings.

+1 Eclipse classpath. The patch retains Eclipse classpath integrity.

-1 core tests. The patch failed core unit tests.

-1 contrib tests. The patch failed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3664/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3664/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3664/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3664/console

This message is automatically generated.


Sharad Agarwal added a comment - 02/Dec/08 05:34 AM
Correction in build.xml to include ${build.src} directory for compilation

Sharad Agarwal added a comment - 04/Dec/08 04:47 AM
all core and contrib tests passed on my local box.
output from test-patch:
[exec]     -1 overall.
     [exec]     +1 @author.  The patch does not contain any @author tags.
     [exec]     -1 tests included.  The patch doesn't appear to include any new or modified tests.
     [exec]                         Please justify why no tests are needed for this patch.
     [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
     [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
     [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
     [exec]     +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

Tom White added a comment - 04/Dec/08 10:23 AM
I've just committed this. Thanks Sharad!

Hudson added a comment - 06/Dec/08 02:02 PM