Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.20.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Incompatible change, Reviewed
    • Release Note:
      Removed deprecated method parseArgs from org.apache.hadoop.fs.FileSystem.

      Description

      Since HADOOP-2916 Hadoop Core consists of modules in independent source trees: core, hdfs, mapred (and others). The dependencies between them should be enforced to avoid cyclic dependencies. At present they all have dependencies on each other.

      1. 3750_v1.patch
        5 kB
        Sharad Agarwal
      2. 3750_v2.patch
        5 kB
        Sharad Agarwal
      3. hadoop-3750.patch
        8 kB
        Tom White

        Issue Links

          Activity

          Hide
          Tom White added a comment -

          The simplest way to enforce this would be to compile core, hdfs and mapred independently (hdfs depends on core, mapred depends on core and hdfs).

          Bill de hÓra has mapped the dependency cycles in the codebase using Structure101. We should consider using this for keeping an eye on further dependency problems.

          Show
          Tom White added a comment - The simplest way to enforce this would be to compile core, hdfs and mapred independently (hdfs depends on core, mapred depends on core and hdfs). Bill de hÓra has mapped the dependency cycles in the codebase using Structure101 . We should consider using this for keeping an eye on further dependency problems.
          Hide
          Doug Cutting added a comment -

          > mapred depends on core and hdfs

          No, mapred should only depend on core. Core should include the FileSystem API, which should be sufficient for mapreduce.

          Show
          Doug Cutting added a comment - > mapred depends on core and hdfs No, mapred should only depend on core. Core should include the FileSystem API, which should be sufficient for mapreduce.
          Hide
          Tom White added a comment -

          Here's a patch which fixes the easier inter-module dependencies. There are a few left which require a bit more effort/discussion:

          Core

          • FsShell depends on DistributedFileSystem to call setVerifyChecksum on it. Should we add setVerifyChecksum to FileSystem? If so we need to make ChecksumFileSystem honour it.
          • HarFileSystem uses LineRecordReader to read lines. Replace with own line-reading code.
          • ReflectionUtils depends on JobConf and JobConfigurable to set config. Not sure what to do here. Have a MapRed ReflectionUtils that does this extra configuration?

          HDFS

          • DataNode, FSNamesystem, SecondaryNameNode all depends on StatusHttpServer which is in mapred. Split out common part and put in core.

          MapRed

          • Task depends on DistributedFileSystem to update statistics for all known filesystems. Should we get this from the FileSystem cache?
          Show
          Tom White added a comment - Here's a patch which fixes the easier inter-module dependencies. There are a few left which require a bit more effort/discussion: Core FsShell depends on DistributedFileSystem to call setVerifyChecksum on it. Should we add setVerifyChecksum to FileSystem? If so we need to make ChecksumFileSystem honour it. HarFileSystem uses LineRecordReader to read lines. Replace with own line-reading code. ReflectionUtils depends on JobConf and JobConfigurable to set config. Not sure what to do here. Have a MapRed ReflectionUtils that does this extra configuration? HDFS DataNode, FSNamesystem, SecondaryNameNode all depends on StatusHttpServer which is in mapred. Split out common part and put in core. MapRed Task depends on DistributedFileSystem to update statistics for all known filesystems. Should we get this from the FileSystem cache?
          Hide
          Doug Cutting added a comment -

          > Should we add setVerifyChecksum to FileSystem?

          +1

          > Replace with own line-reading code.

          Or move line-reading code to util?

          > Have a MapRed ReflectionUtils that does this extra configuration?

          +1

          > Should we get this from the FileSystem cache?

          It should use FileSystem#statisticsTable (which, incidentally, should be named STATISTICS_TABLE).

          Perhaps we should add a FileSystem method like:

          public static Map<String, Statistics> getStatistics();

          That returns the stats indexed by URI scheme?

          Show
          Doug Cutting added a comment - > Should we add setVerifyChecksum to FileSystem? +1 > Replace with own line-reading code. Or move line-reading code to util? > Have a MapRed ReflectionUtils that does this extra configuration? +1 > Should we get this from the FileSystem cache? It should use FileSystem#statisticsTable (which, incidentally, should be named STATISTICS_TABLE). Perhaps we should add a FileSystem method like: public static Map<String, Statistics> getStatistics(); That returns the stats indexed by URI scheme?
          Hide
          Tsz Wo Nicholas Sze added a comment -

          > DataNode, FSNamesystem, SecondaryNameNode all depends on StatusHttpServer which is in mapred. Split out common part and put in core.

          +1

          We should create a package in core for the base web server class and the general servlet classes (e.g. org.apache.hadoop.util.ServletUtil). Should we create a separated issue for this?

          Show
          Tsz Wo Nicholas Sze added a comment - > DataNode, FSNamesystem, SecondaryNameNode all depends on StatusHttpServer which is in mapred. Split out common part and put in core. +1 We should create a package in core for the base web server class and the general servlet classes (e.g. org.apache.hadoop.util.ServletUtil). Should we create a separated issue for this?
          Hide
          Tom White added a comment -

          Should we create a separated issue for this?

          Yes. It's probably worth creating subtasks for the remaining fixes as they are all fairly large and unrelated to each other. When they've been fixed we can make the change to enforce the dependencies by compiling each module separately.

          Show
          Tom White added a comment - Should we create a separated issue for this? Yes. It's probably worth creating subtasks for the remaining fixes as they are all fairly large and unrelated to each other. When they've been fixed we can make the change to enforce the dependencies by compiling each module separately.
          Hide
          Tsz Wo Nicholas Sze added a comment -

          Created HADOOP-3824 as a sub-task for StatusHttpServer.

          Show
          Tsz Wo Nicholas Sze added a comment - Created HADOOP-3824 as a sub-task for StatusHttpServer.
          Hide
          Tsz Wo Nicholas Sze added a comment -

          The latest patch in HADOOP-3824 is ready to be committed. See whether you would like to take a look.

          Show
          Tsz Wo Nicholas Sze added a comment - The latest patch in HADOOP-3824 is ready to be committed. See whether you would like to take a look.
          Hide
          Owen O'Malley added a comment -

          Actually, the dependency should just be:

          hdfs depends on core
          mapred depends on core

          Show
          Owen O'Malley added a comment - Actually, the dependency should just be: hdfs depends on core mapred depends on core
          Hide
          Sharad Agarwal added a comment -

          This patch:

          • FileSystem: removed the deprecated method
          • build.xml:
            split the compile-core-classes into separate compile-mapred-classes and compile-hdfs-classes to have proper dependency.
            (Ultimately when core, mapred, hdfs projects split, build.xml has to be split as well. Till then need to restrict the cyclic dependency code to creep in.)
          Show
          Sharad Agarwal added a comment - This patch: FileSystem: removed the deprecated method build.xml: split the compile-core-classes into separate compile-mapred-classes and compile-hdfs-classes to have proper dependency. (Ultimately when core, mapred, hdfs projects split, build.xml has to be split as well. Till then need to restrict the cyclic dependency code to creep in.)
          Hide
          Tom White added a comment -

          +1

          Show
          Tom White added a comment - +1
          Hide
          Sharad Agarwal added a comment -

          I think this patch can go in and doesn't need to wait for HADOOP-4631. Checking this in early would restrict users to introduce any cyclic dependency in code.

          Show
          Sharad Agarwal added a comment - I think this patch can go in and doesn't need to wait for HADOOP-4631 . Checking this in early would restrict users to introduce any cyclic dependency in code.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12394174/3750_v1.patch
          against trunk revision 720930.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no tests are needed for this patch.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

          -1 core tests. The patch failed core unit tests.

          -1 contrib tests. The patch failed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3664/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3664/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3664/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3664/console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12394174/3750_v1.patch against trunk revision 720930. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no tests are needed for this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 Eclipse classpath. The patch retains Eclipse classpath integrity. -1 core tests. The patch failed core unit tests. -1 contrib tests. The patch failed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3664/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3664/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3664/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3664/console This message is automatically generated.
          Hide
          Sharad Agarwal added a comment -

          Correction in build.xml to include $

          {build.src}

          directory for compilation

          Show
          Sharad Agarwal added a comment - Correction in build.xml to include $ {build.src} directory for compilation
          Hide
          Sharad Agarwal added a comment -

          all core and contrib tests passed on my local box.
          output from test-patch:

               [exec]     -1 overall.
               [exec]     +1 @author.  The patch does not contain any @author tags.
               [exec]     -1 tests included.  The patch doesn't appear to include any new or modified tests.
               [exec]                         Please justify why no tests are needed for this patch.
               [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
               [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
               [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
               [exec]     +1 Eclipse classpath. The patch retains Eclipse classpath integrity.
          
          Show
          Sharad Agarwal added a comment - all core and contrib tests passed on my local box. output from test-patch: [exec] -1 overall. [exec] +1 @author. The patch does not contain any @author tags. [exec] -1 tests included. The patch doesn't appear to include any new or modified tests. [exec] Please justify why no tests are needed for this patch. [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings. [exec] +1 Eclipse classpath. The patch retains Eclipse classpath integrity.
          Hide
          Tom White added a comment -

          I've just committed this. Thanks Sharad!

          Show
          Tom White added a comment - I've just committed this. Thanks Sharad!
          Hide
          Hudson added a comment -
          Show
          Hudson added a comment - Integrated in Hadoop-trunk #680 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/680/ )

            People

            • Assignee:
              Sharad Agarwal
              Reporter:
              Tom White
            • Votes:
              0 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development