  1. Hadoop HDFS
  2. HDFS-1658

A less expensive way to figure out directory size


Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved

    Description

      Currently, to find a directory's size (the number of its children), we have to list the directory by calling the getListing RPC and count the entries returned. This is expensive, especially when a directory has many children, because the listing may require multiple RPCs.
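
      To make the cost concrete, here is a rough sketch of how the number of listing round trips grows with the number of children. It assumes the listing is returned in fixed-size pages; the page size of 1000 and the names PagedListingCost and listingRpcs are illustrative assumptions, not values or APIs taken from this issue.

```java
public class PagedListingCost {
    /** Assumed page size: the maximum number of children returned per listing RPC. */
    static final int PAGE_SIZE = 1000;

    /** Number of listing RPCs needed to enumerate a directory with the given child count. */
    static int listingRpcs(int childCount) {
        // Even an empty directory costs one round trip.
        if (childCount == 0) {
            return 1;
        }
        // Ceiling division: a partially filled last page still needs its own RPC.
        return (childCount + PAGE_SIZE - 1) / PAGE_SIZE;
    }

    public static void main(String[] args) {
        for (int n : new int[] {0, 10, 1000, 1001, 250000}) {
            System.out.println(n + " children -> " + listingRpcs(n) + " listing RPC(s)");
        }
    }
}
```

      Under these assumptions a directory with 250000 children needs 250 round trips just to learn its size.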

      On the other hand, when fetching the status of a path (i.e., calling the getFileInfo RPC), the length field of FileStatus is set to 0 if the path is a directory.

      I propose changing this field (FileStatus#length) to hold the directory size when the path is a directory, so that we can call getFileInfo to get the directory size. That call is a single RPC, which is much cheaper and simpler than getListing.
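
      The difference between the two approaches can be sketched with a toy in-memory model. This is not Hadoop code: ToyNameNode, Status, the page size of 1000, and the helper methods are all hypothetical stand-ins that only mimic the shape of getListing and getFileInfo so the call counts can be compared.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DirSizeSketch {
    /** Toy stand-in for FileStatus: only the fields this sketch needs. */
    static class Status {
        final boolean isDir;
        final long length;
        Status(boolean isDir, long length) { this.isDir = isDir; this.length = length; }
    }

    /** Toy stand-in for a NameNode that counts the calls it serves. */
    static class ToyNameNode {
        static final int PAGE_SIZE = 1000;  // assumed listing page size
        final Map<String, List<String>> dirs = new HashMap<>();  // directory -> sorted child names
        int rpcCount = 0;

        /** Returns one page of children sorting after startAfter (models getListing). */
        List<String> getListing(String dir, String startAfter) {
            rpcCount++;
            List<String> page = new ArrayList<>();
            for (String child : dirs.get(dir)) {
                if (child.compareTo(startAfter) > 0) {
                    page.add(child);
                    if (page.size() == PAGE_SIZE) break;
                }
            }
            return page;
        }

        /** Proposed behavior: for a directory, length carries the child count (models getFileInfo). */
        Status getFileInfo(String dir) {
            rpcCount++;
            return new Status(true, dirs.get(dir).size());
        }
    }

    /** Builds a toy NameNode holding one directory with n zero-padded, sorted child names. */
    static ToyNameNode withDir(String dir, int n) {
        ToyNameNode nn = new ToyNameNode();
        List<String> children = new ArrayList<>();
        for (int i = 0; i < n; i++) children.add(String.format("f%05d", i));
        nn.dirs.put(dir, children);
        return nn;
    }

    /** Current approach: page through the listing and count the entries. */
    static long sizeByListing(ToyNameNode nn, String dir) {
        long count = 0;
        String last = "";
        while (true) {
            List<String> page = nn.getListing(dir, last);
            count += page.size();
            if (page.size() < ToyNameNode.PAGE_SIZE) return count;  // short page: listing is done
            last = page.get(page.size() - 1);
        }
    }

    /** Proposed approach: a single status call. */
    static long sizeByStatus(ToyNameNode nn, String dir) {
        return nn.getFileInfo(dir).length;
    }

    public static void main(String[] args) {
        ToyNameNode nn = withDir("/data", 2500);
        long viaListing = sizeByListing(nn, "/data");
        System.out.println("listing: size=" + viaListing + ", calls=" + nn.rpcCount);
        nn.rpcCount = 0;
        long viaStatus = sizeByStatus(nn, "/data");
        System.out.println("status:  size=" + viaStatus + ", calls=" + nn.rpcCount);
    }
}
```

      In this toy model a directory with 2500 children takes three listing calls but only one status call, and the gap widens as the directory grows.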


            People

              weiyan Weiyan Wang
              hairong Hairong Kuang
              Votes: 0
              Watchers: 10

              Dates

                Created:
                Updated: