HADOOP-79: listFiles optimization

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Fix Version/s: 0.1.0

    Description

      In FSDirectory.getListing(), consider the line:
      listing[i] = new DFSFileInfo(curName, cur.computeFileLength(), cur.computeContentsLength(), isDir(curName));

      1. computeContentsLength() itself calls computeFileLength(), so the file
      length is computed twice.
      2. isDir() searches for the INode (starting from rootDir) that was already
      obtained just two lines above; note that the tree is locked by that time.

      I propose a simple optimization for this; see the attachment. A simplified
      sketch of the idea appears after point 3 below.

      3. A related question: why does DFSFileInfo need two separate fields, len
      for the file length and contentsLen for the directory contents size? These
      fields look mutually exclusive, so we could use just one, interpreting it
      one way or the other depending on the value of isDir.
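
      Below is a minimal, self-contained sketch of the shape of this change. The
      INode and DFSFileInfo classes here are simplified stand-ins rather than the
      real Hadoop classes, and the actual change is in the attached
      DFSFileInfo.patch, which may differ. The sketch computes the length once
      per child, asks isDir() of the INode already in hand, and uses a single
      length field for both cases:

      import java.util.List;

      class ListingSketch {
          // Simplified stand-in for INode; children is null for a plain file.
          static class INode {
              String name;
              List<INode> children;
              long blocksLength;             // sum of block sizes for a file

              boolean isDir() { return children != null; }

              // File length for a file; total contents size for a directory.
              long computeLength() {
                  if (!isDir()) return blocksLength;
                  long total = 0;
                  for (INode child : children) total += child.computeLength();
                  return total;
              }
          }

          // Stand-in DFSFileInfo with one length field, interpreted as file
          // length or directory contents size depending on isDir (point 3).
          static class DFSFileInfo {
              final String name;
              final long length;
              final boolean isDir;
              DFSFileInfo(String name, long length, boolean isDir) {
                  this.name = name; this.length = length; this.isDir = isDir;
              }
          }

          // Optimized listing loop: the length is computed once per child, and
          // isDir() is answered by the node we already hold instead of
          // re-resolving the name from rootDir (points 1 and 2).
          static DFSFileInfo[] getListing(INode dir) {
              DFSFileInfo[] listing = new DFSFileInfo[dir.children.size()];
              int i = 0;
              for (INode cur : dir.children) {
                  listing[i++] = new DFSFileInfo(cur.name, cur.computeLength(), cur.isDir());
              }
              return listing;
          }
      }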

      Attachments

        1. DFSFileInfo.patch (3 kB, Konstantin Shvachko)

          People

            Assignee: Konstantin Shvachko (shv)
            Reporter: Konstantin Shvachko (shv)
