Hadoop Common / HADOOP-713

dfs list operation is too expensive


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.8.0
    • Fix Version/s: 0.15.1
    • Component/s: None
    • Labels: None

    Description

      A list request to dfs returns an array of DFSFileInfo. The DFSFileInfo of a directory contains a field called contentsLen, indicating its size, which is computed on the namenode side by recursively traversing its subdirectories. While this computation runs, the whole dfs directory tree is locked.

      The list operation is used heavily by DFSClient: for listing a directory, getting a file's size and number of replicas, and getting the total size of dfs. Only the last operation needs the contentsLen field to be computed.

      To reduce its cost, we can add a flag to the list request: contentsLen is computed only if the flag is set. By default, the flag is false.
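      The idea above can be sketched as follows. This is a minimal illustration with hypothetical names (Node, contentsLen, list), not the actual patch, which touches the namenode's DFSFileInfo handling; it shows how a boolean flag lets the common listing path skip the recursive subtree walk.

      ```java
      import java.util.ArrayList;
      import java.util.List;

      public class ListingSketch {
          // Hypothetical stand-in for a node in the namenode's namespace tree.
          static class Node {
              final String name;
              final long length;              // file length; 0 for directories
              final List<Node> children = new ArrayList<>();
              Node(String name, long length) { this.name = name; this.length = length; }
          }

          // The expensive recursive subtree size: this is what the flag avoids.
          static long contentsLen(Node dir) {
              long total = dir.length;
              for (Node child : dir.children) total += contentsLen(child);
              return total;
          }

          // List a directory's entry sizes; walk subtrees only when asked to.
          static long[] list(Node dir, boolean computeContentsLen) {
              long[] sizes = new long[dir.children.size()];
              for (int i = 0; i < sizes.length; i++) {
                  Node c = dir.children.get(i);
                  sizes[i] = computeContentsLen ? contentsLen(c) : c.length;
              }
              return sizes;
          }

          public static void main(String[] args) {
              Node root = new Node("/", 0);
              Node sub = new Node("sub", 0);
              sub.children.add(new Node("a.txt", 100));
              sub.children.add(new Node("b.txt", 200));
              root.children.add(sub);

              // Default cheap listing: the directory is reported without a subtree walk.
              System.out.println(list(root, false)[0]);  // 0
              // Flagged listing: full recursive subtree size, as needed for dfs size.
              System.out.println(list(root, true)[0]);   // 300
          }
      }
      ```

      With the flag defaulting to false, the frequent DFSClient callers (directory listing, file size, replica count) never pay for the recursion or the long tree lock; only the rare "size of dfs" request does.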

      Attachments

        1. optimizeComputeContentLen.patch
          2 kB
          Dhruba Borthakur
        2. optimizeComputeContentLen2.patch
          6 kB
          Dhruba Borthakur
        3. optimizeComputeContentLen3.patch
          10 kB
          Dhruba Borthakur

        Issue Links

        Activity


          People

            Assignee: Dhruba Borthakur
            Reporter: Hairong Kuang
            Votes: 0
            Watchers: 0

            Dates

              Created:
              Updated:
              Resolved:
