Hadoop Common / HADOOP-713

dfs list operation is too expensive


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.8.0
    • Fix Version/s: 0.15.1
    • Component/s: None
    • Labels: None

    Description

      A list request to dfs returns an array of DFSFileInfo. The DFSFileInfo of a directory contains a field called contentsLen, giving the total size of the directory's contents. The namenode computes this value by recursively walking the directory's subtree, and it holds the lock on the whole dfs directory tree while doing so.

      DFSClient uses the list operation heavily: to list a directory, to get a file's size and number of replicas, and to get the total size of dfs. Only the last of these operations needs contentsLen to be computed.

      To reduce the cost, we can add a flag to the list request: contentsLen is computed only if the flag is set, and by default the flag is false. A sketch of the idea follows.
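
      Roughly, the change amounts to the sketch below. This is a minimal,
      self-contained illustration of the proposed flag, not the actual Hadoop
      code: Node, FileInfo, listDir, and computeContentsLen are hypothetical
      names standing in for the namenode's namespace tree, DFSFileInfo, and
      the list request.

      import java.util.ArrayList;
      import java.util.List;

      public class ListingSketch {

          // Simplified stand-in for a node in the namenode's namespace.
          static class Node {
              final String name;
              final boolean isDir;
              final long fileLen;   // meaningful for files only
              final List<Node> children = new ArrayList<>();

              Node(String name, boolean isDir, long fileLen) {
                  this.name = name;
                  this.isDir = isDir;
                  this.fileLen = fileLen;
              }
          }

          // Simplified stand-in for DFSFileInfo; contentsLen is -1 when
          // it was not computed.
          static class FileInfo {
              final String name;
              final long contentsLen;

              FileInfo(String name, long contentsLen) {
                  this.name = name;
                  this.contentsLen = contentsLen;
              }
          }

          // List a directory. The recursive subtree walk runs only when
          // the caller sets computeContentsLen, so a plain listing stays
          // cheap.
          static FileInfo[] listDir(Node dir, boolean computeContentsLen) {
              List<FileInfo> result = new ArrayList<>();
              for (Node child : dir.children) {
                  long len;
                  if (child.isDir) {
                      len = computeContentsLen ? subtreeSize(child) : -1;
                  } else {
                      len = child.fileLen;
                  }
                  result.add(new FileInfo(child.name, len));
              }
              return result.toArray(new FileInfo[0]);
          }

          // The expensive part: visits every node under n. In the real
          // namenode this walk happens while the global namespace lock
          // is held.
          static long subtreeSize(Node n) {
              long total = n.fileLen;
              for (Node c : n.children) {
                  total += subtreeSize(c);
              }
              return total;
          }

          public static void main(String[] args) {
              Node root = new Node("/", true, 0);
              Node logs = new Node("logs", true, 0);
              logs.children.add(new Node("part-0", false, 1024));
              root.children.add(logs);
              root.children.add(new Node("data.txt", false, 2048));

              // Cheap listing: contentsLen of "logs" stays -1.
              for (FileInfo f : listDir(root, false)) {
                  System.out.println(f.name + " contentsLen=" + f.contentsLen);
              }
              // Expensive listing: "logs" reports 1024.
              for (FileInfo f : listDir(root, true)) {
                  System.out.println(f.name + " contentsLen=" + f.contentsLen);
              }
          }
      }

      In the real protocol the flag would travel with the list request, and
      the namenode would skip both the subtree walk and the long lock hold
      whenever it is false.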

      Attachments

        Issue Links

        Activity


          People

            Assignee: Dhruba Borthakur (dhruba)
            Reporter: Hairong Kuang (hairong)
            Votes: 0
            Watchers: 0

            Dates

              Created:
              Updated:
              Resolved:
