
Key: HADOOP-2027
Type: Bug
Status: Closed
Resolution: Fixed
Priority: Major
Assignee: Lohit Vijayarenu
Reporter: Owen O'Malley
Votes: 0
Watchers: 1
Hadoop Common

FileSystem should provide byte ranges for file locations

Created: 10/Oct/07 10:02 PM   Updated: 21/May/08 08:05 PM
Component/s: fs
Affects Version/s: None
Fix Version/s: 0.17.0

Time Tracking:
Not Specified

File Attachments:
  File                    Uploaded             Uploader          Size   Licensed for inclusion in ASF works
  HADOOP-2027-1.patch     2008-02-09 09:14 AM  Lohit Vijayarenu  22 kB  yes
  HADOOP-2027-10.patch    2008-02-29 07:28 AM  Lohit Vijayarenu  35 kB  yes
  HADOOP-2027-14.patch    2008-03-10 11:31 PM  Lohit Vijayarenu  38 kB  no
  HADOOP-2027-2.patch     2008-02-13 09:21 AM  Lohit Vijayarenu  33 kB  yes
  HADOOP-2027-3.patch     2008-02-13 09:50 AM  Lohit Vijayarenu  33 kB  yes
  HADOOP-2027-4.patch     2008-02-13 05:29 PM  Lohit Vijayarenu  29 kB  yes
  HADOOP-2027-5.patch     2008-02-13 07:57 PM  Lohit Vijayarenu  31 kB  yes
  HADOOP-2027-6.patch     2008-02-13 09:56 PM  Lohit Vijayarenu  35 kB  yes
  HADOOP-2027-7.patch     2008-02-27 06:11 PM  Lohit Vijayarenu  31 kB  yes
  HADOOP-2027-8.patch     2008-02-27 06:16 PM  Lohit Vijayarenu  35 kB  yes
  HADOOP-2027-9.patch     2008-02-28 09:00 PM  Lohit Vijayarenu  35 kB  yes
  HADOOP-2559-11.patch    2008-03-06 06:37 AM  Lohit Vijayarenu  37 kB  yes
  HADOOP-2559-12.patch    2008-03-06 08:46 AM  Lohit Vijayarenu  39 kB  yes
  HADOOP-2559-13.patch    2008-03-06 06:05 PM  Lohit Vijayarenu  38 kB  yes

Hadoop Flags: Incompatible change
Release Note: New FileSystem API getFileBlockLocations to return the number of bytes in each block in a file via a single rpc to the namenode to speed up job planning. Deprecates getFileCacheHints.
Resolution Date: 19/Mar/08 11:56 PM


Description
FileSystem's getFileCacheHints should be replaced with something more useful. I'd suggest replacing getFileCacheHints with a new method:
BlockLocation[] getFileLocations(Path file, long offset, long range) throws IOException;

and adding

class BlockLocation implements Writable {
  String[] getHosts();
  long getOffset();
  long getLength();
}
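
To make the intent of the proposal concrete, here is a minimal sketch of how a byte-range query over per-block locations could behave. It is only an illustration: BlockLocationSketch is a simplified stand-in for the proposed BlockLocation (it does not implement Writable), and RangeLookupSketch and blocksInRange are hypothetical names that do not appear in the patch.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in for the proposed BlockLocation (no Writable support).
final class BlockLocationSketch {
  private final String[] hosts;
  private final long offset;
  private final long length;

  BlockLocationSketch(String[] hosts, long offset, long length) {
    this.hosts = hosts.clone();
    this.offset = offset;
    this.length = length;
  }

  String[] getHosts() { return hosts.clone(); }
  long getOffset() { return offset; }
  long getLength() { return length; }
}

// Hypothetical helper: selects the blocks that overlap a byte range.
final class RangeLookupSketch {
  // Returns the blocks overlapping [start, start + len), in the order given.
  static List<BlockLocationSketch> blocksInRange(
      List<BlockLocationSketch> blocks, long start, long len) {
    List<BlockLocationSketch> result = new ArrayList<>();
    if (len <= 0) {
      return result; // an empty range overlaps nothing
    }
    long end = start + len;
    for (BlockLocationSketch b : blocks) {
      long blockEnd = b.getOffset() + b.getLength();
      if (b.getOffset() < end && blockEnd > start) {
        result.add(b);
      }
    }
    return result;
  }
}
```

The caller gets hosts, offset, and length together for every block in the range, which is the information the old getFileCacheHints did not expose.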


Comments
Owen O'Malley added a comment - 05/Feb/08 10:40 PM
Note that we also need Map/Reduce to use the new method so that it makes only one call per file to get block sizes. This requires that FileSplit have a new constructor that takes an array of locations rather than computing them on demand. The locations do NOT need to be serialized in the readFields/write methods. FileInputFormat should use a single call to getFileLocations rather than the current getSize, getBlockSize, and getFileCacheHints (down in FileSplit).
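
A rough sketch of the planning step described above, assuming the block layout of a file has already been fetched once: each split is cut from the file and tagged with the hosts of the block containing its first byte, so no further lookups are needed per split. SplitSketch, SplitPlannerSketch, plan, and the parallel-array parameters are all hypothetical and are not the FileInputFormat code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical split record: a byte range plus the hosts that hold it.
final class SplitSketch {
  final long start;
  final long length;
  final String[] hosts;

  SplitSketch(long start, long length, String[] hosts) {
    this.start = start;
    this.length = length;
    this.hosts = hosts;
  }
}

final class SplitPlannerSketch {
  // offsets/lengths/hosts are parallel arrays describing the file's blocks,
  // as if they came back from one location query for the whole file.
  static List<SplitSketch> plan(long[] offsets, long[] lengths,
                                String[][] hosts, long fileLength,
                                long splitSize) {
    if (splitSize <= 0) {
      throw new IllegalArgumentException("splitSize must be positive");
    }
    List<SplitSketch> splits = new ArrayList<>();
    for (long start = 0; start < fileLength; start += splitSize) {
      long len = Math.min(splitSize, fileLength - start);
      int block = blockIndex(offsets, lengths, start);
      splits.add(new SplitSketch(start, len, hosts[block]));
    }
    return splits;
  }

  // Finds the block that contains the given byte position.
  private static int blockIndex(long[] offsets, long[] lengths, long pos) {
    for (int i = 0; i < offsets.length; i++) {
      if (offsets[i] <= pos && pos < offsets[i] + lengths[i]) {
        return i;
      }
    }
    throw new IllegalArgumentException("position " + pos + " is outside the file");
  }
}
```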

Lohit Vijayarenu added a comment - 09/Feb/08 09:14 AM
Thanks Owen. The attached patch includes:
1. A new API, getFileBlockLocations, which invokes getBlockLocations to return BlockLocation[]
2. Changes to FileSplit to store host information and return it when getLocations() is invoked
3. A change to FileInputFormat to make one call to getFileBlockLocations and store host information in FileSplit using the new constructor

I ran the unit tests and did not see any failures. I will run the benchmark and report the timings.


Lohit Vijayarenu added a comment - 12/Feb/08 12:59 AM
I ran sort (twice) on 100 nodes on trunk+this patch. It took 28.4 and 27.3 minutes. Mukund mentioned it took 29.04 min on trunk.

Lohit Vijayarenu added a comment - 12/Feb/08 01:00 AM
Making this PA

Hadoop QA added a comment - 12/Feb/08 06:27 AM
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12375144/HADOOP-2027-1.patch
against trunk revision 619744.

@author +1. The patch does not contain any @author tags.

tests included +1. The patch appears to include 21 new or modified tests.

javadoc +1. The javadoc tool did not generate any warning messages.

javac -1. The applied patch generated 619 javac compiler warnings (more than the trunk's current 608 warnings).

release audit +1. The applied patch does not generate any new release audit warnings.

findbugs -1. The patch appears to introduce 3 new Findbugs warnings.

core tests +1. The patch passed core unit tests.

contrib tests +1. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1781/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1781/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1781/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1781/console

This message is automatically generated.


Owen O'Malley added a comment - 12/Feb/08 05:27 PM
You should deprecate the old FileSplit constructor and make it call the new one.

Set hosts to null in readFields.
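
The two review points could be sketched as follows. This is an illustration only: FileSplitSketch is a hypothetical stand-in that uses plain java.io streams and a String path rather than Hadoop's Writable and Path, and returning an empty array from getLocations() when no hosts are known is an assumption of this sketch.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

final class FileSplitSketch {
  private String path;
  private long start;
  private long length;
  private String[] hosts; // planning-time hint only, never serialized

  // New constructor: the caller supplies the hosts up front.
  FileSplitSketch(String path, long start, long length, String[] hosts) {
    this.path = path;
    this.start = start;
    this.length = length;
    this.hosts = hosts;
  }

  // Old constructor, kept for compatibility and delegating to the new one.
  @Deprecated
  FileSplitSketch(String path, long start, long length) {
    this(path, start, length, null);
  }

  long getStart() { return start; }

  // Assumption of this sketch: unknown hosts are reported as an empty array.
  String[] getLocations() {
    return hosts == null ? new String[0] : hosts.clone();
  }

  void write(DataOutput out) throws IOException {
    out.writeUTF(path);
    out.writeLong(start);
    out.writeLong(length);
  }

  void readFields(DataInput in) throws IOException {
    path = in.readUTF();
    start = in.readLong();
    length = in.readLong();
    hosts = null; // hosts are not part of the serialized form
  }

  // Serializes a split and reads it back, as a receiving task would see it.
  static FileSplitSketch roundTrip(FileSplitSketch split) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      split.write(new DataOutputStream(bytes));
      FileSplitSketch copy = new FileSplitSketch("", 0, 0, null);
      copy.readFields(new DataInputStream(
          new ByteArrayInputStream(bytes.toByteArray())));
      return copy;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

Resetting hosts in readFields matters when an instance is reused for deserialization: without it, a reused object could report stale hosts from a previous split.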


Owen O'Malley added a comment - 12/Feb/08 06:41 PM
You should:
1. Not use strings of '*' around your javadoc.
2. Fill in the javadoc of public methods in BlockLocation.
3. I'd prefer using String[] in BlockLocation, since the API uses String rather than Text.
4. FileSystem.getFileBlockLocations should just pass the desired values into the constructor rather than setting them all, same for DFSClient
5. The indentation in FileInputFormat should bring lines to the open of the paren
6. Fix the calls to the now deprecated methods.

Thanks! I'm looking forward to this patch.


Lohit Vijayarenu added a comment - 13/Feb/08 09:21 AM
Incorporated the changes suggested by Owen and removed the javac warnings that were due to deprecated calls. I could not get rid of 2 of them, which are deprecated calls to the Kosmos FileSystem. I am submitting this patch for a QA run. I will either try to get those fixed or open a new JIRA to fix them.

Lohit Vijayarenu added a comment - 13/Feb/08 09:50 AM
Resubmitting against latest trunk

Hadoop QA added a comment - 13/Feb/08 10:26 AM
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12375464/HADOOP-2027-2.patch
against trunk revision 619744.

@author +1. The patch does not contain any @author tags.

tests included +1. The patch appears to include 30 new or modified tests.

javadoc +1. The javadoc tool did not generate any warning messages.

javac -1. The applied patch generated 608 javac compiler warnings (more than the trunk's current 604 warnings).

release audit +1. The applied patch does not generate any new release audit warnings.

findbugs -1. The patch appears to introduce 3 new Findbugs warnings.

core tests +1. The patch passed core unit tests.

contrib tests +1. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1788/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1788/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1788/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1788/console

This message is automatically generated.


Lohit Vijayarenu added a comment - 13/Feb/08 05:29 PM
Another try fixing findbugs

Lohit Vijayarenu added a comment - 13/Feb/08 07:57 PM
I deprecated KFS getFileCacheHints and modified KFSEmulationImpl to call the local FileSystem's getFileBlockLocations.
There are a few javac warnings, which are due to other deprecated APIs such as listPaths and globPaths. I am attaching this patch against trunk.

Lohit Vijayarenu added a comment - 13/Feb/08 08:27 PM
Canceling and resubmitting patch

Hadoop QA added a comment - 13/Feb/08 09:05 PM
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12375522/HADOOP-2027-5.patch
against trunk revision 619744.

@author +1. The patch does not contain any @author tags.

tests included +1. The patch appears to include 33 new or modified tests.

javadoc +1. The javadoc tool did not generate any warning messages.

javac +1. The applied patch does not generate any new javac compiler warnings.

release audit +1. The applied patch does not generate any new release audit warnings.

findbugs -1. The patch appears to cause Findbugs to fail.

core tests -1. The patch failed core unit tests.

contrib tests -1. The patch failed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1790/testReport/
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1790/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1790/console

This message is automatically generated.


Lohit Vijayarenu added a comment - 13/Feb/08 09:56 PM
I missed BlockLocation.java (a new file), so the build failed. I will resubmit it.

Hadoop QA added a comment - 13/Feb/08 11:09 PM
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12375528/HADOOP-2027-6.patch
against trunk revision 619744.

@author +1. The patch does not contain any @author tags.

tests included +1. The patch appears to include 33 new or modified tests.

javadoc +1. The javadoc tool did not generate any warning messages.

javac -1. The applied patch generated 605 javac compiler warnings (more than the trunk's current 603 warnings).

release audit +1. The applied patch does not generate any new release audit warnings.

findbugs +1. The patch does not introduce any new Findbugs warnings.

core tests +1. The patch passed core unit tests.

contrib tests +1. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1791/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1791/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1791/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1791/console

This message is automatically generated.


Lohit Vijayarenu added a comment - 13/Feb/08 11:15 PM
The javac warnings were expected due to other deprecated APIs.

Lohit Vijayarenu added a comment - 14/Feb/08 12:14 AM
Found the 2 additional warnings; they were from PhasedFileSystem.java:
> [javac] /zonestorage/hudson/home/hudson/hudson/jobs/Hadoop-Patch/workspace/trunk/src/java/org/apache/hadoop/mapred/PhasedFileSystem.java:300: warning: [deprecation] getFileCacheHints(org.apache.hadoop.fs.Path,long,long) in org.apache.hadoop.fs.FilterFileSystem has been deprecated
> [javac] /zonestorage/hudson/home/hudson/hudson/jobs/Hadoop-Patch/workspace/trunk/src/java/org/apache/hadoop/mapred/PhasedFileSystem.java:300: warning: [deprecation] getFileCacheHints(org.apache.hadoop.fs.Path,long,long) in org.apache.hadoop.fs.FileSystem has been deprecated

Lohit Vijayarenu added a comment - 27/Feb/08 06:11 PM
Attaching same patch by regenerating against trunk.

Lohit Vijayarenu added a comment - 27/Feb/08 06:16 PM
Sorry, I had missed BlockLocation again.

Lohit Vijayarenu added a comment - 28/Feb/08 08:58 PM
Uploading a new one with comments from Dhruba.

Lohit Vijayarenu added a comment - 28/Feb/08 09:00 PM
Dhruba suggested it would be good to have the host:port information, which is already provided by the namenode call.
So I have added one more API, getNames(), which is similar to DatanodeID's getName and returns hostname:port; getHosts() returns hostnames as before. He suggests this is useful when we consider running 2 datanodes on the same node. Attached is the patch which addresses this.

Lohit Vijayarenu added a comment - 28/Feb/08 09:01 PM
making this PA

Hadoop QA added a comment - 28/Feb/08 10:57 PM
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12376764/HADOOP-2027-9.patch
against trunk revision 619744.

@author +1. The patch does not contain any @author tags.

tests included +1. The patch appears to include 33 new or modified tests.

patch -1. The patch command could not apply the patch.

Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1866/console

This message is automatically generated.


Lohit Vijayarenu added a comment - 29/Feb/08 07:28 AM
Regenerating against trunk. Tested this patch, applies clean on trunk.

Hadoop QA added a comment - 29/Feb/08 08:49 AM
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12376793/HADOOP-2027-10.patch
against trunk revision 619744.

@author +1. The patch does not contain any @author tags.

tests included +1. The patch appears to include 33 new or modified tests.

javadoc +1. The javadoc tool did not generate any warning messages.

javac -1. The applied patch generated 617 javac compiler warnings (more than the trunk's current 614 warnings).

release audit +1. The applied patch does not generate any new release audit warnings.

findbugs +1. The patch does not introduce any new Findbugs warnings.

core tests -1. The patch failed core unit tests.

contrib tests +1. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1874/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1874/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1874/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1874/console

This message is automatically generated.


Lohit Vijayarenu added a comment - 06/Mar/08 06:37 AM
TestRackAwareTaskPlacement was failing because of my latest changes. While adding the getNames() method, I tried to derive getHosts() from the string returned by getName(), but that string had ipaddress:port and we needed hostnames. So I created 2 separate arrays within BlockLocation. Now both getHosts() and getNames() return the expected output. Attaching a new patch.
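
The problem and the fix can be illustrated with a small sketch. HostNameSketch and stripPort are hypothetical names, and the hostname, address, and port values used with them are made up for illustration. Stripping the port from an address:port string yields an IP address, not a hostname, which is why the two values are kept in separate arrays.

```java
// Hypothetical stand-in showing hosts and names stored side by side.
final class HostNameSketch {
  private final String[] hosts; // hostnames, one per replica
  private final String[] names; // address:port strings, one per replica

  HostNameSketch(String[] hosts, String[] names) {
    this.hosts = hosts.clone();
    this.names = names.clone();
  }

  String[] getHosts() { return hosts.clone(); }
  String[] getNames() { return names.clone(); }

  // The abandoned approach: derive the host by cutting off the port.
  // If the name is "10.0.0.1:50010", this gives "10.0.0.1", not a hostname.
  static String stripPort(String name) {
    int colon = name.lastIndexOf(':');
    return colon < 0 ? name : name.substring(0, colon);
  }
}
```

Any code that matches split hosts against hostnames would presumably fail to match when handed IP addresses, which is consistent with the test failure reported above.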

Hadoop QA added a comment - 06/Mar/08 07:47 AM
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12377229/HADOOP-2559-11.patch
against trunk revision 619744.

@author +1. The patch does not contain any @author tags.

tests included +1. The patch appears to include 36 new or modified tests.

javadoc +1. The javadoc tool did not generate any warning messages.

javac -1. The applied patch generated 593 javac compiler warnings (more than the trunk's current 590 warnings).

release audit +1. The applied patch does not generate any new release audit warnings.

findbugs +1. The patch does not introduce any new Findbugs warnings.

core tests -1. The patch failed core unit tests.

contrib tests +1. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1902/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1902/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1902/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1902/console

This message is automatically generated.


Lohit Vijayarenu added a comment - 06/Mar/08 08:46 AM
Modified TestTextInputFormat. With the updated patch, a file of zero length is not added to the splits.

Hadoop QA added a comment - 06/Mar/08 10:03 AM
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12377232/HADOOP-2559-12.patch
against trunk revision 619744.

@author +1. The patch does not contain any @author tags.

tests included +1. The patch appears to include 42 new or modified tests.

javadoc +1. The javadoc tool did not generate any warning messages.

javac -1. The applied patch generated 592 javac compiler warnings (more than the trunk's current 590 warnings).

release audit +1. The applied patch does not generate any new release audit warnings.

findbugs +1. The patch does not introduce any new Findbugs warnings.

core tests +1. The patch passed core unit tests.

contrib tests +1. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1903/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1903/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1903/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1903/console

This message is automatically generated.


Lohit Vijayarenu added a comment - 06/Mar/08 10:14 AM
All tests passed. 2 additional javac warnings are due to PhasedFileSystem.java as expected.

Lohit Vijayarenu added a comment - 06/Mar/08 06:05 PM
For now, I am creating an empty hosts array when the input file is zero length. Owen opened HADOOP-2952 to address zero-length files. Attaching another patch.
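
A minimal sketch of this interim handling, assuming a zero-length file has no blocks to look up: the split still gets created, but with an empty hosts array instead of an index into a block list that does not exist. ZeroLengthSketch and hostsForSplit are hypothetical names, not code from the patch.

```java
final class ZeroLengthSketch {
  // Returns the hosts for a split starting in the given block, or an empty
  // array when the file is empty and therefore has no blocks to consult.
  static String[] hostsForSplit(long fileLength, String[][] blockHosts,
                                int blockIndex) {
    if (fileLength == 0 || blockHosts.length == 0) {
      return new String[0];
    }
    return blockHosts[blockIndex].clone();
  }
}
```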

Hadoop QA added a comment - 06/Mar/08 07:16 PM
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12377271/HADOOP-2559-13.patch
against trunk revision 619744.

@author +1. The patch does not contain any @author tags.

tests included +1. The patch appears to include 39 new or modified tests.

javadoc +1. The javadoc tool did not generate any warning messages.

javac -1. The applied patch generated 592 javac compiler warnings (more than the trunk's current 590 warnings).

release audit +1. The applied patch does not generate any new release audit warnings.

findbugs +1. The patch does not introduce any new Findbugs warnings.

core tests +1. The patch passed core unit tests.

contrib tests +1. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1905/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1905/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1905/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1905/console

This message is automatically generated.


Lohit Vijayarenu added a comment - 06/Mar/08 07:51 PM
All tests passed. 2 additional javac warnings are due to PhasedFileSystem.java as expected.

Lohit Vijayarenu added a comment - 10/Mar/08 11:31 PM
Attaching another patch after changing getFileCacheHints in FileSystem.java and KFS.

Hadoop QA added a comment - 11/Mar/08 01:07 PM
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12377568/HADOOP-2027-14.patch
against trunk revision 619744.

@author +1. The patch does not contain any @author tags.

tests included +1. The patch appears to include 39 new or modified tests.

javadoc +1. The javadoc tool did not generate any warning messages.

javac -1. The applied patch generated 599 javac compiler warnings (more than the trunk's current 598 warnings).

release audit +1. The applied patch does not generate any new release audit warnings.

findbugs +1. The patch does not introduce any new Findbugs warnings.

core tests +1. The patch passed core unit tests.

contrib tests +1. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1938/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1938/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1938/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1938/console

This message is automatically generated.


Lohit Vijayarenu added a comment - 11/Mar/08 04:06 PM
javac warning from PhasedFileSystem. All other tests passed.

Owen O'Malley added a comment - 19/Mar/08 11:56 PM
I just committed this. Thanks, Lohit!

Hudson added a comment - 20/Mar/08 01:13 PM