  1. Hadoop Common
  2. HADOOP-8989

hadoop fs -find feature

Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.7.0
    • Component/s: None
    • Labels: None
    • Release Note: New fs -find command

    Description

      Both sysadmins and users make frequent use of the Unix 'find' command, but Hadoop has no equivalent. Without one, users write scripts that lean heavily on hadoop dfs -lsr and implement one-off find substitutes. I think hadoop dfs -lsr is somewhat taxing on the NameNode and a really slow experience on the client side. Possibly an in-NameNode find operation would be only a bit more taxing on the NameNode, but significantly faster from the client's point of view?

      The minimum set of options I can think of which would make a Hadoop find command generally useful is (in priority order):

      • -type (file or directory, for now)
      • -atime/-ctime/-mtime (... and -creationtime?) (both + and - arguments)
      • -print0 (for piping to xargs -0)
      • -depth
      • -owner/-group (and -nouser/-nogroup)
      • -name (allowing for shell pattern, or even regex?)
      • -perm
      • -size
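
To make the requested semantics concrete, here is a minimal sketch of how a few of these primaries could be evaluated client-side over a recursive listing. It is only an illustration, not the approach taken in the attached patches: the `Entry` record, the `find` helper, and the sample listing are all hypothetical, and ages are simplified to whole days.

```python
import fnmatch
from dataclasses import dataclass


@dataclass
class Entry:
    """Hypothetical stand-in for one row of a recursive listing."""
    path: str
    is_dir: bool
    size: int        # bytes
    mtime_days: int  # whole days since last modification (simplified)


def is_type(kind):
    # -type f matches files, -type d matches directories
    return lambda e: e.is_dir == (kind == "d")


def name(pattern):
    # -name: shell pattern matched against the final path component
    return lambda e: fnmatch.fnmatchcase(e.path.rsplit("/", 1)[-1], pattern)


def mtime(arg):
    # -mtime +N: more than N days; -N: fewer than N days; N: exactly N days
    n = int(arg.lstrip("+-"))
    if arg.startswith("+"):
        return lambda e: e.mtime_days > n
    if arg.startswith("-"):
        return lambda e: e.mtime_days < n
    return lambda e: e.mtime_days == n


def find(entries, *tests):
    # Primaries listed side by side are implicitly ANDed, as in Unix find
    return [e.path for e in entries if all(t(e) for t in tests)]


def print0(paths):
    # -print0: terminate each path with NUL so odd file names survive xargs -0
    return "".join(p + "\0" for p in paths)


# Made-up listing used only for this example
LISTING = [
    Entry("/logs", True, 0, 40),
    Entry("/logs/app.log", False, 2048, 40),
    Entry("/logs/app.log.1", False, 4096, 3),
    Entry("/tmp/new line.txt", False, 10, 1),
]

old_logs = find(LISTING, is_type("f"), name("*.log"), mtime("+30"))
recent = print0(find(LISTING, mtime("-7")))
```

Here `old_logs` holds the files matching `*.log` that are more than 30 days old, and `recent` is a NUL-delimited string suitable for piping to `xargs -0`.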

      One possible special case, which could be really cool if it ran from within the NameNode:

      • -delete
        The "hadoop dfs -lsr | hadoop dfs -rm" cycle is really, really slow.

      Lower priority: some people do use operators, mostly to run -or searches such as:

      • find / (-nouser -or -nogroup)
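
As a rough illustration of what operator support involves, the sketch below composes predicates with OR, AND, and NOT. Everything in it is an assumption made for the example: the `Entry` tuple, the `KNOWN_USERS` and `KNOWN_GROUPS` sets (standing in for whatever user and group lookup a real implementation would use), and the helper names.

```python
from collections import namedtuple

# Hypothetical record: path plus the owner and group recorded on it
Entry = namedtuple("Entry", "path owner group")

# Assumed lookup tables; a real implementation would query the system
KNOWN_USERS = {"hdfs", "alice"}
KNOWN_GROUPS = {"hadoop"}


def nouser(e):
    # -nouser: the owner does not correspond to any known user
    return e.owner not in KNOWN_USERS


def nogroup(e):
    # -nogroup: the group does not correspond to any known group
    return e.group not in KNOWN_GROUPS


def or_(*tests):
    # -or: true if any operand matches
    return lambda e: any(t(e) for t in tests)


def and_(*tests):
    # -and: true only if every operand matches
    return lambda e: all(t(e) for t in tests)


def not_(test):
    # negation of a single operand
    return lambda e: not test(e)


# Made-up entries used only for this example
ENTRIES = [
    Entry("/user/alice/data", "alice", "hadoop"),
    Entry("/user/carol/old", "carol", "hadoop"),  # owner no longer known
    Entry("/tmp/scratch", "alice", "interns"),    # group no longer known
]

# Equivalent of: find / ( -nouser -or -nogroup )
orphaned = or_(nouser, nogroup)
orphans = [e.path for e in ENTRIES if orphaned(e)]
```

Because each operator returns an ordinary predicate, expressions nest freely, for example `and_(nouser, not_(nogroup))`.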

      Finally, I thought I'd include a link to the POSIX spec for find.

      Attachments

        1. HADOOP-8989.patch (98 kB, Jonathan Allen)
        2. HADOOP-8989.patch (140 kB, Jonathan Allen)
        3. HADOOP-8989.patch (230 kB, Jonathan Allen)
        4. HADOOP-8989.patch (252 kB, Jonathan Allen)
        5. HADOOP-8989.patch (297 kB, Jonathan Allen)
        6. HADOOP-8989.patch (331 kB, Jonathan Allen)
        7. HADOOP-8989.patch (338 kB, Jonathan Allen)
        8. HADOOP-8989.patch (361 kB, Jonathan Allen)
        9. HADOOP-8989.patch (400 kB, Jonathan Allen)
        10. HADOOP-8989.patch (400 kB, Jonathan Allen)
        11. HADOOP-8989.patch (400 kB, Jonathan Allen)
        12. HADOOP-8989.patch (400 kB, Jonathan Allen)
        13. HADOOP-8989.patch (126 kB, Jonathan Allen)
        14. HADOOP-8989.patch (126 kB, Jonathan Allen)
        15. HADOOP-8989.patch (131 kB, Jonathan Allen)
        16. HADOOP-8989.patch (130 kB, Jonathan Allen)
        17. HADOOP-8989.patch (141 kB, Jonathan Allen)
        18. HADOOP-8989.patch (146 kB, Jonathan Allen)
        19. HADOOP-8989.patch (140 kB, Jonathan Allen)
        20. HADOOP-8989.patch (140 kB, Jonathan Allen)
        21. HADOOP-8989.patch (140 kB, Jonathan Allen)
        22. HADOOP-8989.patch (144 kB, Jonathan Allen)

            People

              Assignee: Jonathan Allen (jonallen)
              Reporter: Marco Nicosia (menicosia)
              Votes: 4
              Watchers: 42
