Spark / SPARK-19748

refresh for InMemoryFileIndex with FileStatusCache does not work correctly

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.2.0
    • Fix Version/s: 2.1.1, 2.2.0
    • Component/s: SQL
    • Labels:
      None

      Description

      If we refresh an InMemoryFileIndex that has a FileStatusCache, refresh() first uses the FileStatusCache to regenerate cachedLeafFiles etc., and only then calls FileStatusCache.invalidateAll. The order of these two actions is wrong: the listing is rebuilt from the cache entries that are about to be invalidated, so the refresh does not take effect.

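        override def refresh(): Unit = {
          // Rebuilds the cached listing through fileStatusCache, which still holds the old entries.
          refresh0()
          // Invalidation happens only after the stale listing has already been stored.
          fileStatusCache.invalidateAll()
        }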
      
        private def refresh0(): Unit = {
          val files = listLeafFiles(rootPaths)
          cachedLeafFiles =
            new mutable.LinkedHashMap[Path, FileStatus]() ++= files.map(f => f.getPath -> f)
          cachedLeafDirToChildrenFiles = files.toArray.groupBy(_.getPath.getParent)
          cachedPartitionSpec = null
        }
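
      The ordering problem described above can be reproduced with a small self-contained model. This is only a sketch under stated assumptions: ToyStatusCache, ToyFileIndex, RefreshOrderDemo and the invalidateFirst flag are hypothetical stand-ins, not Spark classes, and the "file system" is just a mutable map. It suggests that invalidating the cache before re-listing is what lets a refresh observe new files.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a file status cache (NOT Spark's FileStatusCache).
final class ToyStatusCache {
  private val entries = mutable.Map.empty[String, Seq[String]]
  def get(dir: String): Option[Seq[String]] = entries.get(dir)
  def put(dir: String, files: Seq[String]): Unit = entries.update(dir, files)
  def invalidateAll(): Unit = entries.clear()
}

// Hypothetical stand-in for a file index (NOT Spark's InMemoryFileIndex).
// `fs` models the real file system as a map from directory to file names.
final class ToyFileIndex(
    fs: mutable.Map[String, Seq[String]],
    root: String,
    cache: ToyStatusCache,
    invalidateFirst: Boolean) {

  private var cachedLeafFiles: Seq[String] = Seq.empty
  refresh0() // initial listing, which also populates the cache

  def leafFiles: Seq[String] = cachedLeafFiles

  def refresh(): Unit = {
    if (invalidateFirst) {
      // Assumed corrected order: drop stale entries, then re-list.
      cache.invalidateAll()
      refresh0()
    } else {
      // Order shown in the report: re-list from the cache, then invalidate.
      refresh0()
      cache.invalidateAll()
    }
  }

  // Consults the cache first and only falls back to the "file system" on a miss.
  private def listLeafFiles(): Seq[String] =
    cache.get(root).getOrElse {
      val listed = fs.getOrElse(root, Seq.empty)
      cache.put(root, listed)
      listed
    }

  private def refresh0(): Unit = {
    cachedLeafFiles = listLeafFiles()
  }
}

object RefreshOrderDemo {
  // Builds an index, adds a file behind its back, refreshes, and returns what the index sees.
  def run(invalidateFirst: Boolean): Seq[String] = {
    val fs = mutable.Map("/t" -> Seq("a.parquet"))
    val index = new ToyFileIndex(fs, "/t", new ToyStatusCache, invalidateFirst)
    fs("/t") = Seq("a.parquet", "b.parquet")
    index.refresh()
    index.leafFiles
  }
}
```

      In this model, RefreshOrderDemo.run(invalidateFirst = false) still returns only a.parquet after the refresh, while run(invalidateFirst = true) returns both files.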
      

            People

            • Assignee: windpiger Song Jun
            • Reporter: windpiger Song Jun
            • Votes: 0
            • Watchers: 3
