Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-19748

refresh for InMemoryFileIndex with FileStatusCache does not work correctly

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • 2.2.0
    • 2.1.1, 2.2.0
    • SQL
    • None

    Description

      If we refresh a InMemoryFileIndex with a FileStatusCache, it will first use the FileStatusCache to generate the cachedLeafFiles etc, then call FileStatusCache.invalidateAll. the order to do these two actions is wrong, this lead to the refresh action does not take effect.

        override def refresh(): Unit = {
          refresh0()
          fileStatusCache.invalidateAll()
        }
      
        private def refresh0(): Unit = {
          val files = listLeafFiles(rootPaths)
          cachedLeafFiles =
            new mutable.LinkedHashMap[Path, FileStatus]() ++= files.map(f => f.getPath -> f)
          cachedLeafDirToChildrenFiles = files.toArray.groupBy(_.getPath.getParent)
          cachedPartitionSpec = null
        }
      

      Attachments

        Activity

          People

            windpiger Song Jun
            windpiger Song Jun
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: