Details
- New Feature
- Status: Closed
- Major
- Resolution: Fixed
New fs -find command
Description
Both sysadmins and users make frequent use of the Unix 'find' command, but Hadoop has no equivalent. Without one, users write scripts that lean heavily on hadoop dfs -lsr and implement one-off versions of find. I think hadoop dfs -lsr is somewhat taxing on the NameNode and a really slow experience on the client side. An in-NameNode find operation might be only a bit more taxing on the NameNode, but significantly faster from the client's point of view.
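To illustrate the kind of one-off script described above, here is a minimal sketch that filters recursive listing output by file name. It assumes the usual eight-column listing layout (permissions, replication, owner, group, size, date, time, path); the names Entry, parse_lsr and find_files_named, and the sample lines, are made up for illustration and are not part of Hadoop.

```python
import fnmatch
import posixpath
from typing import Iterable, Iterator, List, NamedTuple


class Entry(NamedTuple):
    """One parsed listing line (hypothetical record, not a Hadoop type)."""
    is_dir: bool
    owner: str
    group: str
    size: int
    path: str


def parse_lsr(lines: Iterable[str]) -> Iterator[Entry]:
    """Parse lines shaped like recursive listing output (assumed layout)."""
    for line in lines:
        # Assumed columns: perms, replication, owner, group, size, date, time, path
        parts = line.rstrip("\n").split(None, 7)
        if len(parts) != 8 or not parts[4].isdigit():
            continue  # skip summary lines such as "Found 3 items" and blanks
        perms, _repl, owner, group, size, _date, _time, path = parts
        yield Entry(perms.startswith("d"), owner, group, int(size), path)


def find_files_named(lines: Iterable[str], pattern: str) -> List[str]:
    """Client-side stand-in for `find -type f -name PATTERN`."""
    return [
        e.path
        for e in parse_lsr(lines)
        if not e.is_dir and fnmatch.fnmatchcase(posixpath.basename(e.path), pattern)
    ]


# Sample lines written by hand to match the assumed layout.
SAMPLE = [
    "Found 3 items",
    "drwxr-xr-x   - alice hadoop          0 2012-10-29 12:00 /data",
    "-rw-r--r--   3 alice hadoop       1024 2012-10-29 12:01 /data/part-00000",
    "-rw-r--r--   3 bob   hadoop         10 2012-10-29 12:02 /data/notes.txt",
]

if __name__ == "__main__":
    for path in find_files_named(SAMPLE, "part-*"):
        print(path)
```

Every such script has to list the whole subtree and ship it to the client before any filtering happens, which is the cost a built-in find would avoid.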
The minimum set of options I can think of which would make a Hadoop find command generally useful is (in priority order):
- -type (file or directory, for now)
- -atime/-ctime/-mtime (... and -creationtime?) (both + and - arguments)
- -print0 (for piping to xargs -0)
- -depth
- -owner/-group (and -nouser/-nogroup)
- -name (allowing for shell pattern, or even regex?)
- -perm
- -size
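As a rough sketch of how a few of these options could behave, each test can be modelled as a predicate over a per-path metadata record, with find keeping the paths that pass every test. The Status record and all function names below are hypothetical, not an existing Hadoop API. The +n/-n handling follows the Unix find convention of whole 24-hour periods, and print0 shows the NUL-terminated output that xargs -0 expects.

```python
import fnmatch
import posixpath
from dataclasses import dataclass
from typing import Callable, Iterable, List

DAY = 86400  # seconds in one 24-hour period


@dataclass(frozen=True)
class Status:
    """Hypothetical stand-in for the metadata kept per path."""
    path: str
    is_dir: bool
    owner: str
    size: int
    mtime: float  # modification time, seconds since the epoch


Predicate = Callable[[Status], bool]


def type_is(kind: str) -> Predicate:
    """-type: 'f' for files, 'd' for directories."""
    if kind not in ("f", "d"):
        raise ValueError("kind must be 'f' or 'd'")
    return lambda s: s.is_dir == (kind == "d")


def name(pattern: str) -> Predicate:
    """-name: shell-style pattern matched against the last path component."""
    return lambda s: fnmatch.fnmatchcase(posixpath.basename(s.path), pattern)


def owner(user: str) -> Predicate:
    """-owner: exact match on the owning user name."""
    return lambda s: s.owner == user


def mtime(arg: str, now: float) -> Predicate:
    """-mtime: '+n' is more than n days old, '-n' less than n, 'n' exactly n."""
    n = int(arg.lstrip("+-"))

    def test(s: Status) -> bool:
        age_days = int((now - s.mtime) // DAY)  # whole periods, fraction dropped
        if arg.startswith("+"):
            return age_days > n
        if arg.startswith("-"):
            return age_days < n
        return age_days == n

    return test


def find(entries: Iterable[Status], *preds: Predicate) -> List[str]:
    """Return the paths for which every predicate holds (implicit -and)."""
    return [s.path for s in entries if all(p(s) for p in preds)]


def print0(paths: Iterable[str]) -> str:
    """-print0: terminate each path with NUL instead of newline."""
    return "".join(p + "\0" for p in paths)


NOW = 100 * DAY  # fixed reference time so the example is deterministic
ENTRIES = [
    Status("/logs", True, "hdfs", 0, NOW - 1 * DAY),
    Status("/logs/old.log", False, "alice", 500, NOW - 10 * DAY),
    Status("/logs/new.log", False, "alice", 20, NOW - DAY // 2),
]

if __name__ == "__main__":
    old = find(ENTRIES, type_is("f"), name("*.log"), mtime("+7", NOW))
    print(repr(print0(old)))
```

The remaining options (-perm, -size, -group, -depth) would slot in as further predicates or traversal settings in the same way.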
One possible special case, which could be really cool if it ran from within the NameNode:
- -delete
The "hadoop dfs -lsr | hadoop dfs -rm" cycle is really, really slow.
At lower priority: some people do use operators, mostly to run -or searches such as:
- find / (-nouser -or -nogroup)
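A minimal sketch of how such operators could compose, assuming each test is a function from a metadata record to a boolean. The Status record, the known_users and known_groups sets, and the combinator names are all hypothetical. Here -nouser and -nogroup are modelled as "owner or group not in a known set", which is only one possible reading for HDFS.

```python
from typing import Callable, Iterable, List, NamedTuple, Set


class Status(NamedTuple):
    """Hypothetical per-path record with just the fields this example needs."""
    path: str
    owner: str
    group: str


Predicate = Callable[[Status], bool]


def nouser(known_users: Set[str]) -> Predicate:
    """-nouser: the owner does not correspond to any known user (assumed meaning)."""
    return lambda s: s.owner not in known_users


def nogroup(known_groups: Set[str]) -> Predicate:
    """-nogroup: the group does not correspond to any known group (assumed meaning)."""
    return lambda s: s.group not in known_groups


def or_(*preds: Predicate) -> Predicate:
    """-or: true if any operand is true; later operands are skipped once one passes."""
    return lambda s: any(p(s) for p in preds)


def and_(*preds: Predicate) -> Predicate:
    """-and: true only if every operand is true."""
    return lambda s: all(p(s) for p in preds)


def not_(pred: Predicate) -> Predicate:
    """Negation of a single operand."""
    return lambda s: not pred(s)


def find(entries: Iterable[Status], expr: Predicate) -> List[str]:
    """Return the paths matching the expression."""
    return [s.path for s in entries if expr(s)]


USERS = {"alice", "hdfs"}
GROUPS = {"hadoop"}
ENTRIES = [
    Status("/data/a", "alice", "hadoop"),
    Status("/data/b", "departed", "hadoop"),
    Status("/data/c", "alice", "oldteam"),
]

# Equivalent of: find / ( -nouser -or -nogroup )
ORPHANED = or_(nouser(USERS), nogroup(GROUPS))

if __name__ == "__main__":
    print(find(ENTRIES, ORPHANED))
```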
Finally, I thought I'd include a link to the POSIX spec for find.
Attachments
Issue Links
- is depended upon by
  - HADOOP-10578 Find command - add navigation and execution expressions to find command (Open)
  - HADOOP-10579 Find command - add match expressions to find command (In Progress)
  - HADOOP-10580 Find command - add documentation and CLI tests to find command (In Progress)
  - HADOOP-10544 Find command - add operator functions to find command (Patch Available)
- is duplicated by
  - HDFS-3124 have find command in FsShell (Resolved)
- is related to
  - HADOOP-9195 Generic Use Date Range PathFilter (Patch Available)