My thinking is that new Path(".") should throw an exception if there isn't enough information to convert it into an absolute path name.
> Since this fails to do what one would reasonably expect with "." as a path [ ... ]
Hmm. It does what I'd expect. "./foo" and "foo" name the same file, no? What's unexpected?
> new Path(".") should throw an exception [ ... ]
I don't see why. Having an unresolved Path that represents the connected directory seems reasonable to me. Paths can be relative and that is handy. Most applications want to make them fully qualified sooner rather than later, but I don't think an exception is the right answer.
> Hmm. It does what I'd expect. "./foo" and "foo" name the same file, no? What's unexpected?
Well, "." and fs.getWorkingDirectory() aren't the same thing, as in the above example. That was surprising to me, at least. Path can keep enough information after URI normalization to know that the original was a relative path when the string is "./foo", but not when it's simply ".". Path already throws when it gets an empty string; would it be reasonable to assume that a Path successfully constructed as the empty string refers to the working directory? I can't think of a situation where reporting its URI as Path.CUR_DIR would be an error. It would also work in new Path("foo/bar", "../.."), etc.
What problem is this causing?
> Well, "." and fs.getWorkingDirectory() aren't the same thing, as in the above example.
Can you describe what you'd expect the example to print? Perhaps the fix is to avoid normalizing URIs until they are dereferenced within a FileSystem implementation? That way "./foo" would print as "./foo" rather than just "foo".
I see now what you meant, and I retract my point: the existing behavior matches expectations, except as in the original example.
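The idea of deferring normalization until a path is dereferenced could look roughly like the sketch below. LazyPath is a hypothetical type, not part of Hadoop, and it uses plain java.net.URI to stand in for whatever a FileSystem implementation would do. It keeps the caller's string verbatim for printing and only normalizes when resolved against a working directory.

```java
import java.net.URI;

// Hypothetical type for illustration only; not part of Hadoop.
// It keeps the original string untouched and normalizes only on resolve().
final class LazyPath {
    private final String raw;

    LazyPath(String raw) {
        if (raw == null || raw.isEmpty()) {
            throw new IllegalArgumentException("path must be a non-empty string");
        }
        this.raw = raw;
    }

    // Resolve against a working directory. The base should end with '/'
    // so that java.net.URI treats its last segment as a directory.
    URI resolve(URI workingDir) {
        return workingDir.resolve(URI.create(raw)).normalize();
    }

    // Prints exactly what the caller supplied, e.g. "./foo" stays "./foo".
    @Override
    public String toString() {
        return raw;
    }
}
```

With a working directory of /home/user/, both "." and "./foo" stay distinguishable from their normalized forms until resolve() is called. Note that URI.create would still reject Windows file names, so this sketch does nothing for that concern.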
Coupled with HADOOP-1909, I like the idea of leaving Paths relative until dereferenced within a FileSystem. Would it make sense to go further and require all Paths to be dereferenced this way? There's a lot of string manipulation and special-casing in Path, particularly for Windows filesystems. Pushing that out to the FS seems like a reasonable abstraction. Introducing a new type would also let users employ POSIX semantics for Paths, but URI semantics for Hadoop Paths (as in
> There's a lot of string manipulation and special-casing in Path, particularly for Windows filesystems. Pushing that out to the FS seems like a reasonable abstraction.
One problem is that there's lots of code that passes things returned by File#getPath() to 'new Path(String)', and Windows file names are invalid URI paths. When we added Path.java to Hadoop we needed to do so backward-compatibly, since lots of user code manipulates file names and we didn't want to break it. To avoid processing Windows specifics in Path.java and stay compatible, we'd need to either avoid creating URIs in a Path at all, or we'd have to escape backslashes and colons in the URI's path, and have FileSystem implementations remove those escapes. Perhaps that would work, although it might be hard to make it backward-compatible with existing code. I've pulled a lot of my hair out in the process of getting Path to work on Windows and am personally reluctant to revisit this. But feel free to experiment and see if you can find a cleaner approach.
I am closing this bug as Won't Fix, since having "." return an empty path suffices for now.
    Configuration conf = new Configuration();
    Path cwd = new Path(".");
    Path kid1 = new Path(cwd, "blah");
    Path kid2 = new Path(FileSystem.get(conf).getWorkingDirectory(), "blah");
    // kid1: blah
    // kid2: /home/user/blah

is neither intuitive nor succinct. Paths are evaluated at construction, and segments matching dot are summarily excised as part of URI normalization. Since this fails to do what one would reasonably expect with "." as a path, would it make sense to throw in this case? Certainly, Path doesn't have enough information to do much else.
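The effect of URI normalization on dot segments can be reproduced without Hadoop. The sketch below uses only java.net.URI and assumes, per the description above, that Path applies comparable normalization at construction; the class and method names are illustrative.

```java
import java.net.URI;

// Illustrative only: shows how java.net.URI normalization drops "." segments.
class DotNormalizationDemo {
    // Parse the string as a (relative) URI and return its normalized form.
    static String normalize(String path) {
        return URI.create(path).normalize().toString();
    }

    public static void main(String[] args) {
        // Brackets make the empty result visible.
        System.out.println("[" + normalize("./foo") + "]"); // prints "[foo]"
        System.out.println("[" + normalize(".") + "]");     // prints "[]"
    }
}
```

After normalization "./foo" still carries a usable relative name, whereas "." collapses to the empty string, which is why the two cases behave differently.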