Spark / SPARK-40765

Optimize redundant fs operations in `CommandUtils#calculateSingleLocationSize#getPathSize` method

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 3.4.0
    • Fix Version/s: 3.4.0
    • Component/s: SQL
    • Labels: None

    Description

      def getPathSize(fs: FileSystem, path: Path): Long = {
        val fileStatus = fs.getFileStatus(path)
        val size = if (fileStatus.isDirectory) {
          fs.listStatus(path)
            .map { status =>
              if (isDataPath(status.getPath, stagingDir)) {
                getPathSize(fs, status.getPath)
              } else {
                0L
              }
            }.sum
        } else {
          fileStatus.getLen
        }
        size
      }

      Change the input parameter from `Path` to `FileStatus`. `fs.listStatus` already returns a `FileStatus` for each child, so there is no need to call `fs.getFileStatus(path)` again in every recursive call.
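
      A minimal sketch of what the proposed signature change could look like, assuming `hadoop-common` is on the classpath. This is an illustration, not necessarily the code that was merged. The `isDataPath` function parameter and the `PathSizeSketch` object are hypothetical stand-ins for the `isDataPath(..., stagingDir)` check that the snippet above takes from its enclosing scope.

```scala
import org.apache.hadoop.fs.{FileStatus, FileSystem, Path}

object PathSizeSketch {
  // `isDataPath` is a hypothetical stand-in for the original
  // `isDataPath(path, stagingDir)` check from the enclosing scope.
  def getPathSize(
      fs: FileSystem,
      fileStatus: FileStatus,
      isDataPath: Path => Boolean): Long = {
    if (fileStatus.isDirectory) {
      // listStatus returns one FileStatus per child, so each child's status
      // is passed straight into the recursive call instead of being fetched
      // again with fs.getFileStatus.
      fs.listStatus(fileStatus.getPath).map { status =>
        if (isDataPath(status.getPath)) getPathSize(fs, status, isDataPath) else 0L
      }.sum
    } else {
      fileStatus.getLen
    }
  }
}
```

      With this shape the caller performs a single `fs.getFileStatus(root)` for the top-level location, for example `PathSizeSketch.getPathSize(fs, fs.getFileStatus(root), p => !p.getName.startsWith("."))`, where `root` and the filter are placeholders chosen for the example.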

          People

            Assignee: LuciferYang Yang Jie
            Reporter: LuciferYang Yang Jie