> I think we should change this to a Hadoop-specific class, e.g. FileName.
Why not URI? What required methods are missing from URI? Conversely, what URI methods do you think might cause problems?
Partially answering my own question: with URIs we'd have to check that the scheme, host and port match the fs when implementing each FS method. In other words, given that we need a FileSystem instance to do anything, the scheme, host and port fields of the URI are usually redundant and force us to perform error checking. However, these same fields would be useful when specifying MapReduce input and output directories, in command lines, etc., permitting one to easily specify non-default FileSystem implementations.
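To make the cost concrete, here is a rough sketch of the check each FS method would need, using only java.net.URI. The class and method names (UriCheck, matches) are hypothetical, and treating a URI with no scheme as belonging to the current fs is an assumption, not a settled rule.

```java
import java.net.URI;
import java.util.Objects;

// Hypothetical helper, not an existing Hadoop class.
class UriCheck {
    // Returns true if 'path' may be handled by the filesystem identified by 'fsUri'.
    // Assumption: a path with no scheme is taken to belong to this filesystem.
    static boolean matches(URI fsUri, URI path) {
        if (path.getScheme() == null) {
            return true;
        }
        return path.getScheme().equalsIgnoreCase(fsUri.getScheme())
            && Objects.equals(lower(path.getHost()), lower(fsUri.getHost()))
            && path.getPort() == fsUri.getPort(); // -1 on both sides when no port is given
    }

    // Host names are case-insensitive, so compare them in lower case.
    private static String lower(String s) {
        return s == null ? null : s.toLowerCase();
    }
}
```

With an fs at hdfs://namenode:8020/, a path such as hdfs://other:8020/user/data would be rejected, while /user/data would be accepted as-is. Every method taking a path would have to repeat this check or delegate to something like it.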
Note that I don't think URI buys us interoperability with other systems. So we should only use it if we think it will make writing Hadoop easier, i.e., if it consists of code that we'd mostly need to write anyway.
A side-benefit of URI is that it provides standards-defined filename syntax. We don't have to figure out how to, e.g., escape things, or how backslashes and colons should be treated, etc. We can simply point to a standard.
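As a small illustration of the escaping point, the multi-argument java.net.URI constructors quote illegal characters for us and getPath() decodes them again. The wrapper class and method names (UriEscape, toUri) and the host name in the usage note are made up for this sketch.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper, for illustration only.
class UriEscape {
    // Builds a URI from an unescaped file name; the constructor quotes
    // characters that are illegal in a path, such as spaces.
    static URI toUri(String scheme, String host, String path) {
        try {
            return new URI(scheme, host, path, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("bad file name: " + path, e);
        }
    }
}
```

For example, UriEscape.toUri("hdfs", "namenode", "/logs/my file.txt") has a raw path of /logs/my%20file.txt, while getPath() still returns /logs/my file.txt, so the escaping rules come from the standard rather than from code we write.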
> I also propose that this class should be versioned, and contain some File-like metadata - for now I'm thinking specifically about creation / modification time.
This works so long as files are write-once. But if they can be appended to or overwritten, then this metadata could become stale.