Hadoop Common / HADOOP-713

dfs list operation is too expensive


    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.8.0
    • Fix Version/s: 0.15.1
    • Component/s: None
    • Labels: None

    Description

    A list request to dfs returns an array of DFSFileInfo. The DFSFileInfo of a directory contains a field called contentsLen, indicating the directory's total size, which is computed on the namenode side by recursively walking its subdirectories. While this computation runs, the whole dfs directory tree is locked.

    The list operation is used heavily by DFSClient: for listing a directory, for getting a file's size and number of replicas, and for getting the total size of dfs. Only the last of these actually needs contentsLen to be computed.

    To reduce its cost, we can add a flag to the list request: contentsLen is computed only if the flag is set. By default, the flag is false.
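    The proposed change can be sketched as follows. This is a hypothetical simplification in Java, not the actual NameNode code from the attached patches; the class, field, and method names (ListingSketch, FileInfo, getListing, du) are illustrative only.

    ```java
    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    // Sketch of the proposed optimization: the list call takes a flag, and
    // the expensive recursive size computation runs only when the caller
    // actually needs contentsLen (e.g. a "du"-style query).
    public class ListingSketch {
        // Hypothetical stand-in for DFSFileInfo.
        static class FileInfo {
            final String path;
            final boolean isDir;
            final long len;          // file length; 0 for directories
            long contentsLen = -1;   // -1 means "not computed"
            FileInfo(String path, boolean isDir, long len) {
                this.path = path; this.isDir = isDir; this.len = len;
            }
        }

        // Directory tree: children keyed by parent path.
        final Map<String, List<FileInfo>> tree = new HashMap<>();

        // List a directory; fill in contentsLen for subdirectories only
        // when the caller sets the flag (the proposal in this issue).
        List<FileInfo> getListing(String dir, boolean computeContentsLen) {
            List<FileInfo> entries = tree.getOrDefault(dir, new ArrayList<>());
            if (computeContentsLen) {
                for (FileInfo e : entries) {
                    if (e.isDir) e.contentsLen = du(e.path); // costly recursion
                }
            }
            return entries;
        }

        // Recursive size walk; this is the cost the flag avoids by default.
        long du(String dir) {
            long total = 0;
            for (FileInfo e : tree.getOrDefault(dir, new ArrayList<>())) {
                total += e.isDir ? du(e.path) : e.len;
            }
            return total;
        }
    }
    ```

    With the flag defaulting to false, ordinary directory listings and file-size lookups skip the recursive walk entirely and never take the long-held lock on the directory tree.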

    Attachments

    1. optimizeComputeContentLen.patch (2 kB), Dhruba Borthakur
    2. optimizeComputeContentLen2.patch (6 kB), Dhruba Borthakur
    3. optimizeComputeContentLen3.patch (10 kB), Dhruba Borthakur

    People

    • Assignee: Dhruba Borthakur (dhruba)
    • Reporter: Hairong Kuang (hairong)
    • Votes: 0
    • Watchers: 0
