Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.7.0
    • Component/s: None
    • Labels:
      None
    • Release Note:
      New fs -find command

      Description

      Both sysadmins and users make frequent use of the Unix 'find' command, but Hadoop has no equivalent. Without one, users write scripts that lean heavily on hadoop dfs -lsr and implement one-off find substitutes. I think hadoop dfs -lsr is somewhat taxing on the NameNode and a really slow experience on the client side. Possibly an in-NameNode find operation would be only a bit more taxing on the NameNode, but significantly faster from the client's point of view?

      The minimum set of options I can think of which would make a Hadoop find command generally useful is (in priority order):

      • -type (file or directory, for now)
      • -atime/-ctime/-mtime (... and -creationtime?) (both + and - arguments)
      • -print0 (for piping to xargs -0)
      • -depth
      • -owner/-group (and -nouser/-nogroup)
      • -name (allowing for shell pattern, or even regex?)
      • -perm
      • -size
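
      To make the proposed filters concrete, here is a minimal client-side sketch. It is not the HADOOP-8989 implementation: it walks the local filesystem with Python's os.walk as a stand-in for HDFS, and the function names find and print0 and the parameters name, type_ and min_size are hypothetical, chosen only to mirror -name, -type, -size and -print0 from the list above.

```python
import fnmatch
import os
import stat


def find(root, name=None, type_=None, min_size=None):
    """Yield paths below root that pass every filter that is not None.

    Assumptions of this sketch: the local filesystem stands in for HDFS,
    the root itself is not reported, and anything that is not a directory
    counts as a file (matching "file or directory, for now").
    """
    for dirpath, dirnames, filenames in os.walk(root):
        for entry in dirnames + filenames:
            path = os.path.join(dirpath, entry)
            st = os.lstat(path)
            is_dir = stat.S_ISDIR(st.st_mode)
            # -type: "d" keeps directories, "f" keeps everything else
            if type_ == "d" and not is_dir:
                continue
            if type_ == "f" and is_dir:
                continue
            # -name: shell-style pattern matched against the last path component
            if name is not None and not fnmatch.fnmatchcase(entry, name):
                continue
            # -size: simplified here to "at least min_size bytes"
            if min_size is not None and st.st_size < min_size:
                continue
            yield path


def print0(paths):
    """Terminate each path with NUL, the format that xargs -0 consumes."""
    return "".join(p + "\0" for p in paths)
```

      A call such as print0(find("/tmp", name="*.log", type_="f")) would produce NUL-terminated output suitable for piping to xargs -0. Note that this walk issues one listing per directory from the client, which is the cost the description hopes an in-NameNode operation could avoid.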

      One possible special case, which could be really cool if it ran from within the NameNode:

      • -delete
        The "hadoop dfs -lsr | hadoop dfs -rm" cycle is really, really slow.

      Lower priority: some people do use operators, mostly to run -or searches such as:

      • find / (-nouser -or -nogroup)
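
      One way to picture operator support is as composable predicates. The sketch below is illustrative only and does not reflect how the attached patches implement expressions: Entry, nouser, nogroup, or_ and and_ are hypothetical names, and a missing owner or group is modeled as None rather than as an unresolvable numeric ID.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass(frozen=True)
class Entry:
    """Hypothetical stand-in for a file status record."""
    path: str
    owner: Optional[str]  # None models "no known user" (assumption)
    group: Optional[str]  # None models "no known group" (assumption)


Predicate = Callable[[Entry], bool]


def nouser(e: Entry) -> bool:
    return e.owner is None


def nogroup(e: Entry) -> bool:
    return e.group is None


def or_(*preds: Predicate) -> Predicate:
    """True if any operand matches; any() short-circuits like -or."""
    return lambda e: any(p(e) for p in preds)


def and_(*preds: Predicate) -> Predicate:
    """True only if every operand matches; all() short-circuits like -and."""
    return lambda e: all(p(e) for p in preds)


def find(entries: Iterable[Entry], pred: Predicate) -> List[str]:
    """Return the paths of the entries accepted by the predicate."""
    return [e.path for e in entries if pred(e)]
```

      With this shape, the search above corresponds to find(entries, or_(nouser, nogroup)), and more complex expressions are built by nesting or_ and and_.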

      Finally, I thought I'd include a link to the POSIX spec for find.

      1. HADOOP-8989.patch
        98 kB
        Jonathan Allen
      2. HADOOP-8989.patch
        140 kB
        Jonathan Allen
      3. HADOOP-8989.patch
        230 kB
        Jonathan Allen
      4. HADOOP-8989.patch
        252 kB
        Jonathan Allen
      5. HADOOP-8989.patch
        297 kB
        Jonathan Allen
      6. HADOOP-8989.patch
        331 kB
        Jonathan Allen
      7. HADOOP-8989.patch
        338 kB
        Jonathan Allen
      8. HADOOP-8989.patch
        361 kB
        Jonathan Allen
      9. HADOOP-8989.patch
        400 kB
        Jonathan Allen
      10. HADOOP-8989.patch
        400 kB
        Jonathan Allen
      11. HADOOP-8989.patch
        400 kB
        Jonathan Allen
      12. HADOOP-8989.patch
        400 kB
        Jonathan Allen
      13. HADOOP-8989.patch
        126 kB
        Jonathan Allen
      14. HADOOP-8989.patch
        126 kB
        Jonathan Allen
      15. HADOOP-8989.patch
        131 kB
        Jonathan Allen
      16. HADOOP-8989.patch
        130 kB
        Jonathan Allen
      17. HADOOP-8989.patch
        141 kB
        Jonathan Allen
      18. HADOOP-8989.patch
        146 kB
        Jonathan Allen
      19. HADOOP-8989.patch
        140 kB
        Jonathan Allen
      20. HADOOP-8989.patch
        140 kB
        Jonathan Allen
      21. HADOOP-8989.patch
        140 kB
        Jonathan Allen
      22. HADOOP-8989.patch
        144 kB
        Jonathan Allen

            People

            • Assignee:
              Jonathan Allen
              Reporter:
              Marco Nicosia
            • Votes:
              4
              Watchers:
              41
