Hadoop Common / HADOOP-713

dfs list operation is too expensive


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.8.0
    • Fix Version/s: 0.15.1
    • Component/s: None
    • Labels: None

    Description

      A list request to dfs returns an array of DFSFileInfo. The DFSFileInfo of a directory contains a field called contentsLen, indicating its size, which the namenode computes by recursively walking the directory's subdirectories. While this computation runs, the whole dfs directory tree is locked.

      The list operation is used heavily by DFSClient: for listing a directory, for getting a file's size and number of replicas, and for getting the total size of dfs. Only the last of these needs the contentsLen field to be computed.

      To reduce its cost, we can add a flag to the list request: contentsLen is computed only if the flag is set. By default, the flag is false.
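      The idea above can be sketched in a few lines. This is a minimal, self-contained illustration, not Hadoop's actual namenode code: the Node, FileInfo, and list() names are hypothetical stand-ins for the real INode/DFSFileInfo machinery, and the point is only that the recursive size walk is skipped unless the caller asks for it.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of making the contentsLen computation opt-in
 *  via a flag on the list request (names are illustrative only). */
public class ListFlagSketch {

    /** A namenode-side tree node: a file with a length, or a directory. */
    static class Node {
        final String name;
        final boolean isDir;
        final long fileLen;                       // 0 for directories
        final List<Node> children = new ArrayList<>();

        Node(String name, boolean isDir, long fileLen) {
            this.name = name; this.isDir = isDir; this.fileLen = fileLen;
        }

        Node add(Node child) { children.add(child); return this; }

        /** Recursive walk over all subdirectories -- the costly step,
         *  since in the real namenode it runs under the namespace lock. */
        long contentsLen() {
            if (!isDir) return fileLen;
            long sum = 0;
            for (Node c : children) sum += c.contentsLen();
            return sum;
        }
    }

    /** Stand-in for DFSFileInfo: just a name and a length. */
    static class FileInfo {
        final String name;
        final long len;
        FileInfo(String name, long len) { this.name = name; this.len = len; }
    }

    /** List a directory; compute contentsLen for subdirectories only when asked. */
    static List<FileInfo> list(Node dir, boolean computeContentsLen) {
        List<FileInfo> out = new ArrayList<>();
        for (Node c : dir.children) {
            long len = c.isDir
                    ? (computeContentsLen ? c.contentsLen() : 0)  // skip the walk by default
                    : c.fileLen;
            out.add(new FileInfo(c.name, len));
        }
        return out;
    }
}
```

      With the flag false, a directory listing never touches the subtree, so the common DFSClient calls stay cheap; only a caller that explicitly wants the total size pays for the recursive walk.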

      Attachments

        1. optimizeComputeContentLen.patch
          2 kB
          Dhruba Borthakur
        2. optimizeComputeContentLen2.patch
          6 kB
          Dhruba Borthakur
        3. optimizeComputeContentLen3.patch
          10 kB
          Dhruba Borthakur


            People

              Assignee: Dhruba Borthakur (dhruba)
              Reporter: Hairong Kuang (hairong)
              Votes: 0
              Watchers: 0
