Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.7.0
    • Component/s: None
    • Labels:
      None
    • Release Note:
      New fs -find command

      Description

      Both sysadmins and users make frequent use of the Unix 'find' command, but Hadoop has no equivalent. Without one, users write scripts that make heavy use of hadoop dfs -lsr and implement find one-offs. I think hadoop dfs -lsr is somewhat taxing on the NameNode and a really slow experience on the client side. Might an in-NameNode find operation be only a bit more taxing on the NameNode, but significantly faster from the client's point of view?
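
To make the "find one-off" pattern concrete, here is a minimal sketch of the kind of client-side script described above: it filters recursive listing output after the fact. The column layout of SAMPLE_LISTING (permissions, replication, owner, group, size, date, time, path) is an assumption about the listing format, the entries are made up, and the helper names are hypothetical.

```python
# Sketch of a client-side "find one-off" built on recursive listing output.
# SAMPLE_LISTING imitates an assumed column layout:
# permissions, replication, owner, group, size, date, time, path.
# The entries are invented for illustration.
SAMPLE_LISTING = """\
Found 3 items
drwxr-xr-x   - alice staff          0 2012-10-01 09:15 /data/logs
-rw-r--r--   3 alice staff       2048 2012-10-01 09:20 /data/logs/app.log
-rw-r--r--   3 bob   staff 1073741824 2012-10-02 11:00 /data/logs/big.seq
"""


def parse_listing(text):
    """Yield one dict per listing line, splitting on whitespace into 8 columns."""
    for line in text.splitlines():
        fields = line.split(None, 7)  # the path comes last and may contain spaces
        if len(fields) != 8:
            continue  # skip summary lines that do not describe an entry
        perms, _replication, owner, group, size, date, time, path = fields
        yield {
            "is_dir": perms.startswith("d"),
            "owner": owner,
            "group": group,
            "size": int(size),
            "mtime": f"{date} {time}",
            "path": path,
        }


def find_files_owned_by(text, owner):
    """Return paths of regular files whose owner column equals `owner`."""
    return [
        entry["path"]
        for entry in parse_listing(text)
        if not entry["is_dir"] and entry["owner"] == owner
    ]


print(find_files_owned_by(SAMPLE_LISTING, "alice"))
```

Every such script has to fetch and parse the whole tree before it can discard anything, which is the client-side cost the description complains about.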

      The minimum set of options I can think of which would make a Hadoop find command generally useful is (in priority order):

      • -type (file or directory, for now)
      • -atime/-ctime/-mtime (... and -creationtime?) (both + and - arguments)
      • -print0 (for piping to xargs -0)
      • -depth
      • -owner/-group (and -nouser/-nogroup)
      • -name (allowing for shell pattern, or even regex?)
      • -perm
      • -size
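
The options above behave like independent predicates that are implicitly ANDed together. The following is a rough sketch of that idea only, not the implementation attached to this issue: Entry is a hypothetical stand-in for a file status record, and every function name here is invented for illustration.

```python
import fnmatch
from dataclasses import dataclass


@dataclass
class Entry:
    """Hypothetical stand-in for one file status record."""
    path: str
    is_dir: bool
    owner: str
    size: int


def type_is(kind):
    """-type: 'd' matches directories, anything else matches files."""
    return lambda e: e.is_dir == (kind == "d")


def name_matches(pattern):
    """-name: shell-style pattern applied to the last path component."""
    return lambda e: fnmatch.fnmatchcase(e.path.rsplit("/", 1)[-1], pattern)


def owner_is(user):
    """-owner: exact match on the owner name."""
    return lambda e: e.owner == user


def size_over(n_bytes):
    """-size with a '+' argument: strictly larger than n_bytes."""
    return lambda e: e.size > n_bytes


def find(entries, *predicates):
    """Return paths of entries that satisfy every predicate (implicit AND)."""
    return [e.path for e in entries if all(p(e) for p in predicates)]


def print0(paths):
    """-print0: terminate each path with NUL so xargs -0 can split safely."""
    return "".join(path + "\0" for path in paths)


ENTRIES = [
    Entry("/data/logs", True, "alice", 0),
    Entry("/data/logs/app.log", False, "alice", 2048),
    Entry("/data/logs/big.seq", False, "bob", 1073741824),
]

log_files = find(ENTRIES, type_is("f"), name_matches("*.log"))
print(log_files)
print(repr(print0(log_files)))
```

Keeping each option as a small closure means new tests such as -perm or -mtime could be added without touching the traversal loop.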

      One possible special case, which could be really cool if it ran from within the NameNode:

      • -delete
        The "hadoop dfs -lsr | hadoop dfs -rm" cycle is really, really slow.

      Lower priority: some people do use operators, mostly to run -or searches such as:

      • find / (-nouser -or -nogroup)
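
As an illustration of how an -or operator could combine tests like the ones in that example, here is a small sketch using predicate combinators. KNOWN_USERS and KNOWN_GROUPS are hypothetical lookups standing in for whatever account database would be consulted, and the entries are invented.

```python
# Hypothetical account databases; a real check would consult the cluster's
# user and group mappings instead.
KNOWN_USERS = {"alice", "bob"}
KNOWN_GROUPS = {"staff"}


def nouser(entry):
    """-nouser: the owner does not correspond to any known user."""
    return entry["owner"] not in KNOWN_USERS


def nogroup(entry):
    """-nogroup: the group does not correspond to any known group."""
    return entry["group"] not in KNOWN_GROUPS


def any_of(*predicates):
    """-or: true when at least one operand is true."""
    return lambda entry: any(p(entry) for p in predicates)


def all_of(*predicates):
    """Implicit -and: true when every operand is true."""
    return lambda entry: all(p(entry) for p in predicates)


ENTRIES = [
    {"path": "/data/a", "owner": "alice", "group": "staff"},
    {"path": "/data/b", "owner": "carol", "group": "staff"},
    {"path": "/data/c", "owner": "bob", "group": "retired"},
    {"path": "/data/d", "owner": "dave", "group": "ghosts"},
]

orphaned = any_of(nouser, nogroup)
print([e["path"] for e in ENTRIES if orphaned(e)])

fully_orphaned = all_of(nouser, nogroup)
print([e["path"] for e in ENTRIES if fully_orphaned(e)])
```

Because any() and all() stop at the first deciding operand, the combinators short-circuit in the same way the Unix find operators do.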

      Finally, I thought I'd include a link to the POSIX spec for find.

      1. HADOOP-8989.patch
        144 kB
        Jonathan Allen
      2. HADOOP-8989.patch
        140 kB
        Jonathan Allen
      3. HADOOP-8989.patch
        140 kB
        Jonathan Allen
      4. HADOOP-8989.patch
        140 kB
        Jonathan Allen
      5. HADOOP-8989.patch
        146 kB
        Jonathan Allen
      6. HADOOP-8989.patch
        141 kB
        Jonathan Allen
      7. HADOOP-8989.patch
        130 kB
        Jonathan Allen
      8. HADOOP-8989.patch
        131 kB
        Jonathan Allen
      9. HADOOP-8989.patch
        126 kB
        Jonathan Allen
      10. HADOOP-8989.patch
        126 kB
        Jonathan Allen
      11. HADOOP-8989.patch
        400 kB
        Jonathan Allen
      12. HADOOP-8989.patch
        400 kB
        Jonathan Allen
      13. HADOOP-8989.patch
        400 kB
        Jonathan Allen
      14. HADOOP-8989.patch
        400 kB
        Jonathan Allen
      15. HADOOP-8989.patch
        361 kB
        Jonathan Allen
      16. HADOOP-8989.patch
        338 kB
        Jonathan Allen
      17. HADOOP-8989.patch
        331 kB
        Jonathan Allen
      18. HADOOP-8989.patch
        297 kB
        Jonathan Allen
      19. HADOOP-8989.patch
        252 kB
        Jonathan Allen
      20. HADOOP-8989.patch
        230 kB
        Jonathan Allen
      21. HADOOP-8989.patch
        140 kB
        Jonathan Allen
      22. HADOOP-8989.patch
        98 kB
        Jonathan Allen

          Activity

          Marco Nicosia created issue -
          Owen O'Malley made changes -
          Field Original Value New Value
          Project Hadoop Common [ 12310240 ] HDFS [ 12310942 ]
          Key HADOOP-4412 HDFS-227
          Component/s dfs [ 12310710 ]
          Harsh J made changes -
          Link This issue is duplicated by HDFS-3124 [ HDFS-3124 ]
          Jonathan Allen made changes -
          Assignee Jonathan Allen [ jonallen ]
          Jonathan Allen made changes -
          Project Hadoop HDFS [ 12310942 ] Hadoop Common [ 12310240 ]
          Key HDFS-227 HADOOP-8989
          Jonathan Allen made changes -
          Status Open [ 1 ] In Progress [ 3 ]
          Jonathan Allen made changes -
          Attachment HADOOP-8989.patch [ 12551114 ]
          Jonathan Allen made changes -
          Attachment HADOOP-8989.patch [ 12552876 ]
          Jonathan Allen made changes -
          Attachment HADOOP-8989.patch [ 12553901 ]
          Jonathan Allen made changes -
          Attachment HADOOP-8989.patch [ 12555635 ]
          Jonathan Allen made changes -
          Attachment HADOOP-8989.patch [ 12561115 ]
          Jonathan Allen made changes -
          Status In Progress [ 3 ] Patch Available [ 10002 ]
          Release Note New fs -find command
          Jonathan Allen made changes -
          Attachment HADOOP-8989.patch [ 12561136 ]
          Jonathan Allen made changes -
          Attachment HADOOP-8989.patch [ 12561197 ]
          Eli Collins made changes -
          Link This issue is related to HADOOP-9195 [ HADOOP-9195 ]
          Jonathan Allen made changes -
          Attachment HADOOP-8989.patch [ 12569735 ]
          Jonathan Allen made changes -
          Attachment HADOOP-8989.patch [ 12572993 ]
          Jonathan Allen made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Jonathan Allen made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Jonathan Allen made changes -
          Attachment HADOOP-8989.patch [ 12573783 ]
          Jonathan Allen made changes -
          Attachment HADOOP-8989.patch [ 12624208 ]
          Jonathan Allen made changes -
          Attachment HADOOP-8989.patch [ 12639169 ]
          Jonathan Allen made changes -
          Attachment HADOOP-8989.patch [ 12642146 ]
          Jonathan Allen made changes -
          Attachment HADOOP-8989.patch [ 12642147 ]
          Jonathan Allen made changes -
          Link This issue is depended upon by HADOOP-10544 [ HADOOP-10544 ]
          Jonathan Allen made changes -
          Link This issue is depended upon by HADOOP-10578 [ HADOOP-10578 ]
          Jonathan Allen made changes -
          Link This issue is depended upon by HADOOP-10579 [ HADOOP-10579 ]
          Jonathan Allen made changes -
          Link This issue is depended upon by HADOOP-10580 [ HADOOP-10580 ]
          Jonathan Allen made changes -
          Attachment HADOOP-8989.patch [ 12643862 ]
          Jonathan Allen made changes -
          Attachment HADOOP-8989.patch [ 12650144 ]
          Jonathan Allen made changes -
          Attachment HADOOP-8989.patch [ 12651267 ]
          Jonathan Allen made changes -
          Attachment HADOOP-8989.patch [ 12651882 ]
          Jonathan Allen made changes -
          Attachment HADOOP-8989.patch [ 12651889 ]
          Jonathan Allen made changes -
          Attachment HADOOP-8989.patch [ 12652301 ]
          Jonathan Allen made changes -
          Status Patch Available [ 10002 ] In Progress [ 3 ]
          Jonathan Allen made changes -
          Status In Progress [ 3 ] Patch Available [ 10002 ]
          Jonathan Allen made changes -
          Attachment HADOOP-8989.patch [ 12653073 ]
          Jonathan Allen made changes -
          Attachment HADOOP-8989.patch [ 12667105 ]
          Allen Wittenauer made changes -
          Summary hadoop dfs -find feature hadoop fs -find feature
          Allen Wittenauer made changes -
          Status Patch Available [ 10002 ] Resolved [ 5 ]
          Fix Version/s 2.7.0 [ 12327583 ]
          Resolution Fixed [ 1 ]
          Vinod Kumar Vavilapalli made changes -
          Status Resolved [ 5 ] Closed [ 6 ]

            People

            • Assignee:
              Jonathan Allen
              Reporter:
              Marco Nicosia
            • Votes:
              4
              Watchers:
              42

              Dates

              • Created:
                Updated:
                Resolved:
