
Key: HADOOP-1967
Type: Bug
Status: Closed
Resolution: Fixed
Priority: Major
Assignee: Doug Cutting
Reporter: Lohit Vijayarenu
Votes: 0
Watchers: 0
Hadoop Common

hadoop dfs -ls, -get, -mv command's source/destination URI are inconsistent

Created: 28/Sep/07 09:31 PM   Updated: 08/Jul/09 04:42 PM
Component/s: None
Affects Version/s: 0.14.1
Fix Version/s: 0.17.0

Time Tracking:
Not Specified

File Attachments (all licensed for inclusion in ASF works):
  HADOOP-1967_1.patch   2008-02-25 09:49 PM   Mahadev konar   3 kB
  HADOOP-1967_2.patch   2008-02-26 08:26 PM   Doug Cutting    7 kB
  HADOOP-1967_3.patch   2008-02-26 09:09 PM   Doug Cutting    7 kB
  HADOOP-1967_4.patch   2008-02-28 06:01 PM   Doug Cutting    6 kB

Resolution Date: 29/Feb/08 06:36 PM


Description
When specifying the source/destination path for the hadoop dfs -ls, -get, -mv and -cp commands, we see some inconsistency related to the 'hdfs://' scheme.

In particular, some of the commands accept both formats:
[1] hdfs:///user/lohit/testfile
[2] hdfs://myhost:8020/user/lohit/testfile

while others accept only paths that include an authority (host:port):
[2] hdfs://myhost:8020/user/lohit/testfile
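To see why the two formats behave differently, it helps to parse them with java.net.URI. The sketch below (the class and method names are illustrative only) prints each URI's components: format [1] is a well-formed hierarchical URI whose authority is simply absent (host null, port -1), while format [2] carries an explicit host and port.

```java
import java.net.URI;

class UriFormats {
    // Summarize the components java.net.URI extracts from a URI string.
    static String describe(String s) {
        URI u = URI.create(s);  // throws IllegalArgumentException if malformed
        return "scheme=" + u.getScheme()
            + " authority=" + u.getAuthority()
            + " host=" + u.getHost()
            + " port=" + u.getPort()  // -1 when no port is given
            + " path=" + u.getPath();
    }

    public static void main(String[] args) {
        // Format [1]: parses fine, but authority and host are null, port is -1.
        System.out.println(describe("hdfs:///user/lohit/testfile"));
        // Format [2]: authority "myhost:8020", host "myhost", port 8020.
        System.out.println(describe("hdfs://myhost:8020/user/lohit/testfile"));
    }
}
```

Because format [1] parses without error, the failures in the examples that follow stem from the missing host and port, not from a malformed URI.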

Below are examples:
hadoop dfs -ls (works for both formats)

[lohit@krygw1000 ~]$ hadoop dfs -ls hdfs://kry-nn1:8020/user/lohit/ranges
Found 1 items
/user/lohit/ranges <r 3> 24 1970-01-01 00:00
[lohit@krygw1000 ~]$ hadoop dfs -ls hdfs:///user/lohit/ranges
Found 1 items

hadoop dfs -get (works only with format [2])

[lohit@krygw1000 ~]$ hadoop dfs -get hdfs:///user/lohit/ranges .
Exception in thread "main" java.lang.IllegalArgumentException: Wrong FS: hdfs:/user/lohit/ranges, expected: hdfs://kry-nn1:8020
    at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:204)
    at org.apache.hadoop.dfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:108)
    at org.apache.hadoop.dfs.DistributedFileSystem.getPath(DistributedFileSystem.java:104)
    at org.apache.hadoop.dfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:319)
    at org.apache.hadoop.fs.FileSystem.isDirectory(FileSystem.java:423)
    at org.apache.hadoop.fs.FsShell.copyToLocal(FsShell.java:177)
    at org.apache.hadoop.fs.FsShell.copyToLocal(FsShell.java:155)
    at org.apache.hadoop.fs.FsShell.run(FsShell.java:1233)
    at org.apache.hadoop.util.ToolBase.doMain(ToolBase.java:187)
    at org.apache.hadoop.fs.FsShell.main(FsShell.java:1342)
[lohit@krygw1000 ~]$ hadoop dfs -get hdfs://kry-nn1:8020/user/lohit/ranges .
[lohit@krygw1000 ~]$ ls ./ranges
./ranges
[lohit@krygw1000 ~]$

hadoop dfs -mv / -cp: the source path accepts both formats [1] and [2], while the destination accepts only [2].

[lohit@krygw1000 ~]$ hadoop dfs -cp hdfs://kry-nn1:8020/user/lohit/ranges.test2 hdfs:///user/lohit/ranges.test
Exception in thread "main" java.lang.IllegalArgumentException: Wrong FS: hdfs:/user/lohit/ranges.test, expected: hdfs://kry-nn1:8020
    at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:204)
    at org.apache.hadoop.dfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:108)
    at org.apache.hadoop.dfs.DistributedFileSystem.getPath(DistributedFileSystem.java:104)
    at org.apache.hadoop.dfs.DistributedFileSystem.exists(DistributedFileSystem.java:162)
    at org.apache.hadoop.fs.FileUtil.checkDest(FileUtil.java:269)
    at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:117)
    at org.apache.hadoop.fs.FsShell.copy(FsShell.java:691)
    at org.apache.hadoop.fs.FsShell.copy(FsShell.java:727)
    at org.apache.hadoop.fs.FsShell.run(FsShell.java:1260)
    at org.apache.hadoop.util.ToolBase.doMain(ToolBase.java:187)
    at org.apache.hadoop.fs.FsShell.main(FsShell.java:1342)
[lohit@krygw1000 ~]$ hadoop dfs -cp hdfs:///user/lohit/ranges.test2 hdfs://kry-nn1:8020/user/lohit/ranges.test
[lohit@krygw1000 ~]$

We should have a consistent URI naming convention across all commands.



Comments
Mahadev konar added a comment - 22/Feb/08 08:01 PM
Since hdfs:/// is an invalid URI, all the methods return an error saying that the port is out of range. Is that OK, or should we throw an invalid-URI error instead?

Lohit Vijayarenu added a comment - 22/Feb/08 08:10 PM
It would be good to throw a readable error message. Something like "invalid URI" or "missing host/port" looks good.

Doug Cutting added a comment - 22/Feb/08 08:22 PM
> since hdfs:/// is an invalid uri [...]

It's not invalid, it just doesn't specify a host and port. It should get its host and port from the default filesystem, if the default is hdfs, and otherwise throw an exception since the host & port are required in hdfs.

In general, all paths are resolved against the default FileSystem URI.

http://java.sun.com/j2se/1.4.2/docs/api/java/net/URI.html#resolve(java.net.URI)

Thus if the default filesystem is HDFS, "hdfs:///foo" will resolve to the default filesystem's host and port, and if the default filesystem is not HDFS, then it will resolve to "hdfs:///foo", which should generate an exception, since HDFS requires a host and port.
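The resolution step can be tried directly with java.net.URI.resolve. This sketch assumes a default filesystem URI of hdfs://kry-nn1:8020/ (matching the examples in the description), and the class name is hypothetical. A scheme-less path inherits the default scheme and authority, but resolve returns a URI that already has a scheme, such as hdfs:///user/lohit/ranges, unchanged, so plain URI resolution on its own does not supply the missing host and port.

```java
import java.net.URI;

class ResolveDemo {
    // Assumed default filesystem URI, taken from the examples in this issue.
    static final URI DEFAULT_FS = URI.create("hdfs://kry-nn1:8020/");

    // Resolve a path string against the default filesystem URI (RFC 2396 rules).
    static String resolve(String path) {
        return DEFAULT_FS.resolve(path).toString();
    }

    public static void main(String[] args) {
        // No scheme: picks up "hdfs" and "kry-nn1:8020" from the default.
        System.out.println(resolve("/user/lohit/ranges"));
        // Already has a scheme: returned as-is, still without an authority.
        System.out.println(resolve("hdfs:///user/lohit/ranges"));
    }
}
```

The first call prints hdfs://kry-nn1:8020/user/lohit/ranges; the second prints hdfs:///user/lohit/ranges, which is why a same-scheme URI without an authority needs an extra rule on top of standard resolution.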

We could probably generate a better exception for this, in DistributedFileSystem#initialize().


Mahadev konar added a comment - 25/Feb/08 09:49 PM
This patch allows hdfs:/// URIs and uses the default filesystem's host and port for them.

Doug Cutting added a comment - 26/Feb/08 08:26 PM
Here's a more general patch that fixes this for all filesystems, not just HDFS. Now, whenever a Path is used with the same scheme as the default FileSystem but with no authority specified, the default FileSystem's authority will be used, if any.
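The rule described in this comment can be sketched as follows. qualify is a hypothetical helper written for illustration (it is not the code in the attached patches), and it assumes hierarchical, absolute paths such as /user/lohit/ranges. A URI with no scheme is resolved against the default filesystem URI; a URI with the default's scheme but no authority borrows the default's authority, if the default has one; anything else is left alone.

```java
import java.net.URI;

class PathQualifier {
    // Hypothetical helper, not Hadoop's implementation: complete a URI that
    // lacks a scheme, or has the default's scheme but no authority.
    static URI qualify(URI defaultFs, URI path) {
        if (path.isOpaque()) {
            return path;  // e.g. "hdfs:foo" has no hierarchical path; leave it
        }
        String scheme = path.getScheme();
        if (scheme == null) {
            // No scheme: standard resolution supplies scheme and authority.
            return defaultFs.resolve(path);
        }
        boolean sameScheme = scheme.equalsIgnoreCase(defaultFs.getScheme());
        if (sameScheme && path.getAuthority() == null
                && defaultFs.getAuthority() != null) {
            // Same scheme, missing authority: resolve only the path part so
            // the default's authority is filled in.
            return defaultFs.resolve(path.getRawPath());
        }
        return path;  // different scheme, or authority already given
    }

    public static void main(String[] args) {
        URI defaultFs = URI.create("hdfs://kry-nn1:8020/");
        String[] inputs = {
            "/user/lohit/ranges",                      // no scheme
            "hdfs:///user/lohit/ranges",               // format [1]
            "hdfs://other-nn:8020/user/lohit/ranges",  // explicit authority
            "file:///tmp/ranges"                       // different scheme
        };
        for (String in : inputs) {
            System.out.println(in + " -> " + qualify(defaultFs, URI.create(in)));
        }
    }
}
```

With hdfs://kry-nn1:8020/ as the default, both /user/lohit/ranges and hdfs:///user/lohit/ranges become hdfs://kry-nn1:8020/user/lohit/ranges, while URIs with another scheme or an explicit authority pass through unchanged.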

Also:

  • added warnings for old-format FileSystem names, which have long been deprecated, so that we can eventually remove support for them.
  • added accessor methods so that applications need no longer directly set or get "fs.default.name".
  • added an explicit check for host & port in HDFS
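An explicit host and port check with a readable message, along the lines discussed earlier in this thread, might look like the sketch below. The class name, method name and message wording are hypothetical and not taken from Hadoop. The idea is to fail early with a message that names the missing part, rather than surfacing a low-level error like the "port out of range" failure reported above.

```java
import java.net.URI;

class HdfsUriCheck {
    // Hypothetical validation: reject an hdfs URI lacking a host or port.
    static void checkHostAndPort(URI uri) {
        if (uri.getHost() == null) {
            throw new IllegalArgumentException(
                "Invalid HDFS URI, missing host: " + uri);
        }
        if (uri.getPort() == -1) {  // -1 means no port was specified
            throw new IllegalArgumentException(
                "Invalid HDFS URI, missing port: " + uri);
        }
    }

    public static void main(String[] args) {
        String[] inputs = {
            "hdfs://kry-nn1:8020/user/lohit",  // complete
            "hdfs:///user/lohit",              // no host or port
            "hdfs://kry-nn1/user/lohit"        // host but no port
        };
        for (String s : inputs) {
            try {
                checkHostAndPort(URI.create(s));
                System.out.println("ok: " + s);
            } catch (IllegalArgumentException e) {
                System.out.println(e.getMessage());
            }
        }
    }
}
```

In practice such a check would run after the default authority has been applied, so that hdfs:/// paths are only rejected when the default filesystem cannot supply a host and port.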

Mahadev konar added a comment - 26/Feb/08 08:58 PM
+1 for a general patch.

Doug, the patch you uploaded has this line:

System.out.println(org.apache.hadoop.util.StringUtils.stringifyException(new Exception("foo")));

I guess it was for debugging purposes. Could you remove this line from the patch? Everything else is fine.


Doug Cutting added a comment - 26/Feb/08 09:09 PM
Updated version of patch with debug statement removed.

Hadoop QA added a comment - 27/Feb/08 05:05 PM
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12376549/HADOOP-1967_3.patch
against trunk revision 619744.

@author +1. The patch does not contain any @author tags.

tests included +1. The patch appears to include 3 new or modified tests.

javadoc -1. The javadoc tool appears to have generated 1 warning messages.

javac -1. The applied patch generated 620 javac compiler warnings (more than the trunk's current 619 warnings).

release audit +1. The applied patch does not generate any new release audit warnings.

findbugs +1. The patch does not introduce any new Findbugs warnings.

core tests +1. The patch passed core unit tests.

contrib tests +1. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1847/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1847/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1847/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1847/console

This message is automatically generated.


Doug Cutting added a comment - 28/Feb/08 06:01 PM
Fix javac and javadoc warnings.

Hadoop QA added a comment - 28/Feb/08 09:26 PM
+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12376748/HADOOP-1967_4.patch
against trunk revision 619744.

@author +1. The patch does not contain any @author tags.

tests included +1. The patch appears to include 3 new or modified tests.

javadoc +1. The javadoc tool did not generate any warning messages.

javac +1. The applied patch does not generate any new javac compiler warnings.

release audit +1. The applied patch does not generate any new release audit warnings.

findbugs +1. The patch does not introduce any new Findbugs warnings.

core tests +1. The patch passed core unit tests.

contrib tests +1. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1864/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1864/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1864/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1864/console

This message is automatically generated.


Doug Cutting added a comment - 29/Feb/08 06:36 PM
I committed this.

Hudson added a comment - 01/Mar/08 12:15 PM