Details
Type: Improvement
Status: Closed
Priority: Major
Resolution: Duplicate
Description
The FileSystem interface provides only a very limited way to find where a file's data is located. The current method looks like:
String[][] getFileCacheHints(Path file, long start, long len) throws IOException
which returns an array of "block info" entries, where each entry is a list of host names. Because the hints don't include any information about where the block boundaries are, map/reduce has to call the name node once for each split. I'd propose that we fix the naming a bit and make it:
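To make that cost concrete, here is a minimal, self-contained Java sketch of the current pattern. The name node is replaced by a hypothetical in-memory table: `CurrentHintsDemo`, `BLOCK_HOSTS`, and the fixed `BLOCK_SIZE` are all assumptions, not Hadoop code. The stand-in `getFileCacheHints` only mimics the shape of the real method's result, one host list per overlapping block with no offsets, so the caller cannot tell where block boundaries fall and asks again for every split.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical in-memory stand-in for the name node; this is not Hadoop code.
public class CurrentHintsDemo {
    static final long BLOCK_SIZE = 64L * 1024 * 1024; // assumed fixed block size
    // Assumed data: hosts holding each block of one file, indexed by block number.
    static final String[][] BLOCK_HOSTS = {
        {"node1", "node2"}, {"node2", "node3"}, {"node3", "node1"}
    };
    static int nameNodeCalls = 0; // counts simulated round trips

    // Mimics the shape of getFileCacheHints: one host list per block that
    // overlaps [start, start + len), with no indication of where each block begins.
    static String[][] getFileCacheHints(long start, long len) {
        nameNodeCalls++;
        int first = (int) (start / BLOCK_SIZE);
        int last = (int) ((start + len - 1) / BLOCK_SIZE);
        List<String[]> result = new ArrayList<>();
        for (int b = first; b <= last && b < BLOCK_HOSTS.length; b++) {
            result.add(BLOCK_HOSTS[b]);
        }
        return result.toArray(new String[0][]);
    }

    public static void main(String[] args) {
        long fileLen = 3 * BLOCK_SIZE;
        long splitSize = BLOCK_SIZE / 2; // six splits for a three-block file
        // Without block offsets, each split needs its own query.
        for (long s = 0; s < fileLen; s += splitSize) {
            String[][] hints = getFileCacheHints(s, Math.min(splitSize, fileLen - s));
            System.out.println("split@" + s + " -> " + Arrays.toString(hints[0]));
        }
        System.out.println("name node calls: " + nameNodeCalls);
    }
}
```

With three blocks and half-block splits, the loop makes six calls for a single file.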
public class BlockInfo implements Writable {
  public long getStart();      // byte offset where the block begins
  public String[] getHosts();  // hosts that store a copy of the block
}
BlockInfo[] getFileHints(Path file, long start, long len) throws IOException;
That way map/reduce could query the entire file and get all of the block locations in a single call.
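Under the proposal, one call could return the whole file's layout, and the split-to-host mapping could then be computed locally. The Java sketch below assumes a simplified, dependency-free `BlockInfo` (it does not implement Hadoop's `Writable`), assumes the returned array is sorted by `getStart()`, and assumes each block runs until the next block's start; `BlockHintsDemo` and `hostsFor` are hypothetical names. It binary-searches the block starts to find the block containing each split's first byte.

```java
import java.util.Arrays;

public class BlockHintsDemo {
    // Simplified stand-in for the proposed BlockInfo (assumption: no Writable, no Hadoop deps).
    static final class BlockInfo {
        private final long start;
        private final String[] hosts;

        BlockInfo(long start, String... hosts) {
            this.start = start;
            this.hosts = hosts;
        }

        public long getStart() { return start; }
        public String[] getHosts() { return hosts; }
    }

    // Returns the hosts of the block containing offset, assuming blocks are sorted
    // by start and each block extends up to the next block's start.
    static String[] hostsFor(BlockInfo[] blocks, long offset) {
        int lo = 0, hi = blocks.length - 1, found = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (blocks[mid].getStart() <= offset) {
                found = mid;   // candidate; look for a later block that still starts <= offset
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return found < 0 ? new String[0] : blocks[found].getHosts();
    }

    public static void main(String[] args) {
        long blockSize = 64L * 1024 * 1024;
        // Result of one hypothetical getFileHints call covering the whole file.
        BlockInfo[] blocks = {
            new BlockInfo(0, "node1", "node2"),
            new BlockInfo(blockSize, "node2", "node3"),
            new BlockInfo(2 * blockSize, "node3", "node1"),
        };
        long fileLen = 3 * blockSize;
        long splitSize = blockSize / 2;
        // Every split is resolved locally; no further name node calls are needed.
        for (long s = 0; s < fileLen; s += splitSize) {
            System.out.println("split@" + s + " -> " + Arrays.toString(hostsFor(blocks, s)));
        }
    }
}
```

Picking the block that holds a split's first byte is a simplification; a scheduler could instead weigh every block the split overlaps, which the same sorted array also supports.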
Issue Links
is related to:
- HADOOP-2027 FileSystem should provide byte ranges for file locations (Closed)
- HADOOP-2187 FileSystem should return location information with byte ranges (Closed)