Details

    • Type: Sub-task Sub-task
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.20.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Incompatible change, Reviewed
    • Release Note:
      Removed Task's dependency on concrete file systems by taking list from FileSystem class. Added statistics table to FileSystem class. Deprecated FileSystem method getStatistics(Class<? extends FileSystem> cls).
    1. 4188_v2.patch
      12 kB
      Sharad Agarwal
    2. 4188_v1.patch
      10 kB
      Sharad Agarwal

      Activity

      Hide
      Robert Chansler added a comment -

      Edit release note for publication.

      Show
      Robert Chansler added a comment - Edit release note for publication.
      Hide
      Devaraj Das added a comment -

      I just committed this. Thanks, Sharad!

      Show
      Devaraj Das added a comment - I just committed this. Thanks, Sharad!
      Hide
      Sharad Agarwal added a comment -

      javadoc warning was due to HADOOP-4621

      Show
      Sharad Agarwal added a comment - javadoc warning was due to HADOOP-4621
      Hide
      Hadoop QA added a comment -

      -1 overall. Here are the results of testing the latest attachment
      http://issues.apache.org/jira/secure/attachment/12393617/4188_v2.patch
      against trunk revision 712615.

      +1 @author. The patch does not contain any @author tags.

      +1 tests included. The patch appears to include 9 new or modified tests.

      -1 javadoc. The javadoc tool appears to have generated 1 warning messages.

      +1 javac. The applied patch does not increase the total number of javac compiler warnings.

      +1 findbugs. The patch does not introduce any new Findbugs warnings.

      +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

      +1 core tests. The patch passed core unit tests.

      +1 contrib tests. The patch passed contrib unit tests.

      Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3568/testReport/
      Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3568/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
      Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3568/artifact/trunk/build/test/checkstyle-errors.html
      Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3568/console

      This message is automatically generated.

      Show
      Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12393617/4188_v2.patch against trunk revision 712615. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 9 new or modified tests. -1 javadoc. The javadoc tool appears to have generated 1 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 Eclipse classpath. The patch retains Eclipse classpath integrity. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3568/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3568/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3568/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3568/console This message is automatically generated.
      Hide
      Sharad Agarwal added a comment -

      changes from the last patch:

      • fixed statistics for LocalFileSystem. LocalFileSystem wraps RawLocalFileSystem. Statistics gets updated for only RawLocalFileSystem. LocalFileSystem#statistics should return the statistics object from RawLocalFileSystem. For this I have set the statistics in FilterFileSystem to the wrapped FileSystem's statistics.
      • deprecated FileSytem#getStatistics(Class<? extends FileSystem> cls)
      Show
      Sharad Agarwal added a comment - changes from the last patch: fixed statistics for LocalFileSystem. LocalFileSystem wraps RawLocalFileSystem. Statistics gets updated for only RawLocalFileSystem. LocalFileSystem#statistics should return the statistics object from RawLocalFileSystem. For this I have set the statistics in FilterFileSystem to the wrapped FileSystem's statistics. deprecated FileSytem#getStatistics(Class<? extends FileSystem> cls)
      Hide
      Sharad Agarwal added a comment -

      Do we need to support (and deprecate) the old counter names

      don't need to as these counters are not public

      Show
      Sharad Agarwal added a comment - Do we need to support (and deprecate) the old counter names don't need to as these counters are not public
      Hide
      Tom White added a comment -

      Looks good. This change renames the counters. Do we need to support (and deprecate) the old counter names (i.e. do they form a part of the public API) while introducing the new counter names, or is it acceptable to mark this as an incompatible change?

      Show
      Tom White added a comment - Looks good. This change renames the counters. Do we need to support (and deprecate) the old counter names (i.e. do they form a part of the public API) while introducing the new counter names, or is it acceptable to mark this as an incompatible change?
      Hide
      Sharad Agarwal added a comment -

      this patch (not tested yet) :

      • adds statistics table to FileSystem class indexed by URIScheme
      • removes the Task's dependency on concrete file systems by taking the list from FileSystem class
      • dynamically creates the File system counters in Task of the form <URIScheme>_BYTES_READ/<URIScheme>_BYTES_WRITE
      Show
      Sharad Agarwal added a comment - this patch (not tested yet) : adds statistics table to FileSystem class indexed by URIScheme removes the Task's dependency on concrete file systems by taking the list from FileSystem class dynamically creates the File system counters in Task of the form <URIScheme>_BYTES_READ/<URIScheme>_BYTES_WRITE
      Hide
      Sharad Agarwal added a comment -

      The Task.FileSystemCounter enum has the read/write counters for all concrete filesystems. So if we want Task to be completely agnostic to filesystems, the framework counters also somehow needs to be dynamically created. Currently there is a static mapping between these - Task_FileSystemCounter.properties
      We would then need to associate counter names based on filesystem URI schemes, which is not quite possible if using current FileSystem#statisticsTable.
      We can add Map<String, Statistics> statsByUriScheme to FileSystem as suggested by Doug and use that. This map can be populated in createFileSystem(URI uri, Configuration conf) call as :
      statsByUriScheme.put(uri.getScheme(), fs.statistics);

      Other very straightforward alternative which looks good to me:
      Just break the compile time dependency on the concrete file system and use conf.getClassByName; as anyway Task is aware of the concrete filesystems in Task.FileSystemCounter. Making it fully agnostic would require refactoring of Task.FileSystemCounter etc.

      Show
      Sharad Agarwal added a comment - The Task.FileSystemCounter enum has the read/write counters for all concrete filesystems. So if we want Task to be completely agnostic to filesystems, the framework counters also somehow needs to be dynamically created. Currently there is a static mapping between these - Task_FileSystemCounter.properties We would then need to associate counter names based on filesystem URI schemes, which is not quite possible if using current FileSystem#statisticsTable. We can add Map<String, Statistics> statsByUriScheme to FileSystem as suggested by Doug and use that. This map can be populated in createFileSystem(URI uri, Configuration conf) call as : statsByUriScheme.put(uri.getScheme(), fs.statistics); Other very straightforward alternative which looks good to me: Just break the compile time dependency on the concrete file system and use conf.getClassByName; as anyway Task is aware of the concrete filesystems in Task.FileSystemCounter. Making it fully agnostic would require refactoring of Task.FileSystemCounter etc.
      Hide
      Tom White added a comment -

      Task depends on HDFS's DistributedFileSystem for updating statistics. It also has a list of file systems that it updates statistics for, it should get this list from FileSystem. See https://issues.apache.org/jira/browse/HADOOP-3750?focusedCommentId=12615367#action_12615367

      Show
      Tom White added a comment - Task depends on HDFS's DistributedFileSystem for updating statistics. It also has a list of file systems that it updates statistics for, it should get this list from FileSystem. See https://issues.apache.org/jira/browse/HADOOP-3750?focusedCommentId=12615367#action_12615367
      Hide
      Owen O'Malley added a comment -

      Is it just local file system? Or is it something more specific. It is clearly ok for mapred to depend on core.

      Show
      Owen O'Malley added a comment - Is it just local file system? Or is it something more specific. It is clearly ok for mapred to depend on core.

        People

        • Assignee:
          Sharad Agarwal
          Reporter:
          Tom White
        • Votes:
          0 Vote for this issue
          Watchers:
          2 Start watching this issue

          Dates

          • Created:
            Updated:
            Resolved:

            Development