Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Description
DistributedFileSystem.getFileBlockLocations() may occasionally return numeric IP addresses where hostnames are expected. This seems to be a breach of the FileSystem.getFileBlockLocations() contract:
/**
 * Return an array containing hostnames, offset and size of
 * portions of the given file. For a nonexistent
 * file or regions, null will be returned.
 *
 * This call is most helpful with DFS, where it returns
 * hostnames of machines that contain the given file.
 *
 * The FileSystem will simply return an elt containing 'localhost'.
 */
public BlockLocation[] getFileBlockLocations(FileStatus file, long start, long len) throws IOException
One (maybe minor) consequence of this issue: when a job's split locations include such numeric IPs, the JobTracker is not able to assign the job's map tasks local to the file blocks.
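To illustrate the mismatch, here is a minimal, self-contained sketch. It does not use the Hadoop API; the class name SplitLocationCheck, its methods, and the example hostnames and addresses are all hypothetical, and it assumes locality is decided by exact string comparison against TaskTracker hostnames, which may not be how the JobTracker matches them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; not part of Hadoop.
final class SplitLocationCheck {

    // True if the string is a dotted-quad IPv4 literal such as "10.0.0.1".
    static boolean isNumericIpv4(String host) {
        String[] parts = host.split("\\.", -1);
        if (parts.length != 4) {
            return false;
        }
        for (String p : parts) {
            if (p.isEmpty() || p.length() > 3) {
                return false;
            }
            for (char c : p.toCharArray()) {
                if (c < '0' || c > '9') {
                    return false;
                }
            }
            if (Integer.parseInt(p) > 255) {
                return false;
            }
        }
        return true;
    }

    // Returns the split locations that exactly match a known tracker hostname.
    // Assumption: matching is by plain string equality on hostnames.
    static List<String> localCandidates(String[] splitHosts, Set<String> trackerHostnames) {
        List<String> matches = new ArrayList<>();
        for (String h : splitHosts) {
            if (trackerHostnames.contains(h)) {
                matches.add(h);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        Set<String> trackers = new HashSet<>(
                Arrays.asList("node1.example.com", "node2.example.com"));
        // One location came back as a numeric IP instead of a hostname.
        String[] splitHosts = {"10.0.0.1", "node2.example.com"};

        System.out.println("local candidates: " + localCandidates(splitHosts, trackers));
        for (String h : splitHosts) {
            if (isNumericIpv4(h)) {
                System.out.println("numeric location, no hostname match possible: " + h);
            }
        }
    }
}
```

In this sketch the block stored on the machine at 10.0.0.1 is never considered local, even if that machine is node1.example.com, because the two strings do not compare equal.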
We should either fix the implementation or change the contract. In the latter case, JobTracker needs to be fixed to maintain both the hostnames and the IPs of the TaskTrackers.
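A rough sketch of the second option, in which the scheduler side keeps both the hostname and the IP of each TaskTracker so that either form of a split location resolves to the same tracker. The class TrackerIndex and its methods are hypothetical names chosen for illustration and do not correspond to actual JobTracker code; the hostnames and addresses are made up.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not actual JobTracker code.
final class TrackerIndex {

    // Maps every known name of a tracker (hostname or IP) to its hostname.
    private final Map<String, String> byAddress = new HashMap<>();

    // Registers a tracker under both its hostname and its IP address.
    void register(String hostname, String ip) {
        byAddress.put(hostname, hostname);
        byAddress.put(ip, hostname);
    }

    // Returns the tracker hostname for a split location, or null if unknown.
    String lookup(String location) {
        return byAddress.get(location);
    }

    public static void main(String[] args) {
        TrackerIndex index = new TrackerIndex();
        index.register("node1.example.com", "10.0.0.1");

        // Both forms of the location resolve to the same tracker.
        System.out.println(index.lookup("node1.example.com"));
        System.out.println(index.lookup("10.0.0.1"));
    }
}
```

The first option, fixing the implementation, would instead make sure that only hostnames leave getFileBlockLocations(), so that callers need no such index.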