Issue Details (XML | Word | Printable)

Key: HADOOP-4188
Type: Sub-task Sub-task
Status: Closed Closed
Resolution: Fixed
Priority: Major Major
Assignee: Sharad Agarwal
Reporter: Tom White
Votes: 0
Watchers: 2
Operations

If you were logged in you would be able to see more operations.
Hadoop Common
HADOOP-3750

Remove Task's dependency on concrete file systems

Created: 16/Sep/08 01:00 PM   Updated: 08/Jul/09 04:53 PM
Return to search
Component/s: None
Affects Version/s: None
Fix Version/s: 0.20.0

Time Tracking:
Not Specified

File Attachments:
  Size
Text File Licensed for inclusion in ASF works 4188_v1.patch 2008-11-07 01:30 PM Sharad Agarwal 10 kB
Text File Licensed for inclusion in ASF works 4188_v2.patch 2008-11-10 07:49 AM Sharad Agarwal 12 kB

Hadoop Flags: Reviewed, Incompatible change
Release Note: Removed Task's dependency on concrete file systems by taking list from FileSystem class. Added statistics table to FileSystem class. Deprecated FileSystem method getStatistics(Class<? extends FileSystem> cls).
Resolution Date: 17/Nov/08 12:23 PM


 All   Comments   Work Log   Change History   Subversion Commits      Sort Order: Ascending order - Click to sort in descending order
Owen O'Malley added a comment - 23/Sep/08 05:01 PM
Is it just local file system? Or is it something more specific. It is clearly ok for mapred to depend on core.

Tom White added a comment - 30/Oct/08 12:30 PM
Task depends on HDFS's DistributedFileSystem for updating statistics. It also has a list of file systems that it updates statistics for, it should get this list from FileSystem. See https://issues.apache.org/jira/browse/HADOOP-3750?focusedCommentId=12615367#action_12615367

Sharad Agarwal added a comment - 06/Nov/08 12:49 PM
The Task.FileSystemCounter enum has the read/write counters for all concrete filesystems. So if we want Task to be completely agnostic to filesystems, the framework counters also somehow needs to be dynamically created. Currently there is a static mapping between these - Task_FileSystemCounter.properties
We would then need to associate counter names based on filesystem URI schemes, which is not quite possible if using current FileSystem#statisticsTable.
We can add Map<String, Statistics> statsByUriScheme to FileSystem as suggested by Doug and use that. This map can be populated in createFileSystem(URI uri, Configuration conf) call as :
statsByUriScheme.put(uri.getScheme(), fs.statistics);

Other very straightforward alternative which looks good to me:
Just break the compile time dependency on the concrete file system and use conf.getClassByName; as anyway Task is aware of the concrete filesystems in Task.FileSystemCounter. Making it fully agnostic would require refactoring of Task.FileSystemCounter etc.


Sharad Agarwal added a comment - 07/Nov/08 01:30 PM
this patch (not tested yet) :
  • adds statistics table to FileSystem class indexed by URIScheme
  • removes the Task's dependency on concrete file systems by taking the list from FileSystem class
  • dynamically creates the File system counters in Task of the form <URIScheme>_BYTES_READ/<URIScheme>_BYTES_WRITE

Tom White added a comment - 07/Nov/08 06:32 PM
Looks good. This change renames the counters. Do we need to support (and deprecate) the old counter names (i.e. do they form a part of the public API) while introducing the new counter names, or is it acceptable to mark this as an incompatible change?

Sharad Agarwal added a comment - 10/Nov/08 07:04 AM

Do we need to support (and deprecate) the old counter names

don't need to as these counters are not public


Sharad Agarwal added a comment - 10/Nov/08 07:49 AM
changes from the last patch:
  • fixed statistics for LocalFileSystem. LocalFileSystem wraps RawLocalFileSystem. Statistics gets updated for only RawLocalFileSystem. LocalFileSystem#statistics should return the statistics object from RawLocalFileSystem. For this I have set the statistics in FilterFileSystem to the wrapped FileSystem's statistics.
  • deprecated FileSytem#getStatistics(Class<? extends FileSystem> cls)

Hadoop QA added a comment - 10/Nov/08 01:38 PM
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12393617/4188_v2.patch
against trunk revision 712615.

+1 @author. The patch does not contain any @author tags.

+1 tests included. The patch appears to include 9 new or modified tests.

-1 javadoc. The javadoc tool appears to have generated 1 warning messages.

+1 javac. The applied patch does not increase the total number of javac compiler warnings.

+1 findbugs. The patch does not introduce any new Findbugs warnings.

+1 Eclipse classpath. The patch retains Eclipse classpath integrity.

+1 core tests. The patch passed core unit tests.

+1 contrib tests. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3568/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3568/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3568/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3568/console

This message is automatically generated.


Sharad Agarwal added a comment - 11/Nov/08 11:02 AM
javadoc warning was due to HADOOP-4621

Devaraj Das added a comment - 17/Nov/08 12:23 PM
I just committed this. Thanks, Sharad!

Robert Chansler added a comment - 03/Mar/09 01:46 AM
Edit release note for publication.