Hadoop HDFS / HDFS-8234

DistributedFileSystem and Globber should apply PathFilter early

Details

    • Type: Improvement
    • Status: Patch Available
    • Priority: Major
    • Resolution: Unresolved

    Description

      HDFS-985 added partial listing to listStatus so that the entries of a large directory are not fetched in a single call. However, when listStatus(Path p, PathFilter f) is called, the filter is applied only after all the entries have been fetched, so a large list is still constructed on the client side. It would be more efficient if DistributedFileSystem.listStatusInternal() applied the PathFilter as the entries are fetched. DistributedFileSystem should therefore override listStatus(Path f, PathFilter filter) and apply the PathFilter early.
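
The difference between filtering late and filtering early can be sketched without any Hadoop classes. In the minimal example below, every name (EarlyFilterSketch, fetchBatch, listThenFilter, filterWhileListing) is hypothetical, a Predicate<String> stands in for PathFilter, and an in-memory list stands in for the directory on the server. It is an illustration of the idea under those assumptions, not the code from the attached patches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical sketch; none of these names exist in Hadoop. */
final class EarlyFilterSketch {

    /** Stand-in for one partial listing (one batch) returned by the server. */
    static List<String> fetchBatch(List<String> directory, int start, int batchSize) {
        int end = Math.min(start + batchSize, directory.size());
        return directory.subList(start, end);
    }

    /** Late filtering: every entry is accumulated before the filter runs. */
    static List<String> listThenFilter(List<String> directory, int batchSize,
                                       Predicate<String> filter) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<String> all = new ArrayList<>();
        for (int start = 0; start < directory.size(); start += batchSize) {
            all.addAll(fetchBatch(directory, start, batchSize));
        }
        List<String> accepted = new ArrayList<>();
        for (String name : all) {
            if (filter.test(name)) {
                accepted.add(name);
            }
        }
        return accepted;
    }

    /** Early filtering: each batch is filtered as it arrives, so only
     *  accepted entries are ever retained on the client. */
    static List<String> filterWhileListing(List<String> directory, int batchSize,
                                           Predicate<String> filter) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<String> accepted = new ArrayList<>();
        for (int start = 0; start < directory.size(); start += batchSize) {
            for (String name : fetchBatch(directory, start, batchSize)) {
                if (filter.test(name)) {
                    accepted.add(name);
                }
            }
        }
        return accepted;
    }
}
```

Both methods return the same entries; the early variant simply never holds more than one batch plus the accepted entries in memory at a time.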

      Globber.java also applies its filter only after calling listStatus. It should instead call listStatus with the PathFilter. The current code:

      FileStatus[] children = listStatus(candidate.getPath());
      .........
      for (FileStatus child : children) {
        // Set the child path based on the parent path.
        child.setPath(new Path(candidate.getPath(),
            child.getPath().getName()));
        if (globFilter.accept(child.getPath())) {
          newCandidates.add(child);
        }
      }
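
For comparison, the caller side could look roughly like the sketch below once the filter is passed into the listing call. This is an assumption-laden illustration rather than the actual Globber change: GlobExpandSketch and FilteringLister are hypothetical names, paths are plain strings, and the filter is assumed to match on the child name only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical sketch of the caller side; not the actual Globber code. */
final class GlobExpandSketch {

    /** Hypothetical stand-in for a file system whose listing call accepts a filter. */
    interface FilteringLister {
        List<String> list(String parent, Predicate<String> nameFilter);
    }

    /** Expands one glob component. The lister applies the filter itself,
     *  so the loop no longer needs its own accept() check. */
    static List<String> expand(FilteringLister fs, List<String> candidates,
                               Predicate<String> globFilter) {
        List<String> newCandidates = new ArrayList<>();
        for (String candidate : candidates) {
            for (String childName : fs.list(candidate, globFilter)) {
                // Rebuild the child path from the parent path, as the original loop does.
                newCandidates.add(candidate + "/" + childName);
            }
        }
        return newCandidates;
    }
}
```

With this shape, the file system implementation decides where the filter runs, which is what allows it to drop rejected entries before they accumulate on the client.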
      

      Attachments

        1. HDFS-8234.1.patch
          10 kB
          J.Andreina
        2. HDFS-8234.2.patch
          10 kB
          J.Andreina
        3. HDFS-8234.3.patch
          11 kB
          J.Andreina

          People

            J.Andreina (andreina)
            Rohini Palaniswamy (rohini)
