Details
Type: New Feature
Status: Resolved
Priority: Major
Resolution: Incomplete
Affects Version/s: 2.3.0
Fix Version/s: None
Description
In the current code (https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InsertIntoHadoopFsRelationCommand.scala#L172), after data is inserted, Spark always refreshes the file index and updates the cache. If the target table has a very large number of files, the job suffers long refresh times and can hit OOM. Could we add a config to control whether InMemoryFileIndex should update the cache on refresh?
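To illustrate the proposal, here is a minimal sketch of the idea in Scala. This is not Spark's actual API: the flag name, the `InMemoryFileIndexModel` class, and its methods are all hypothetical, modeling only the distinction between "rebuild the cache eagerly on refresh" (current behavior) and "just invalidate and defer listing" (the requested, config-gated behavior).

```scala
object RefreshSketch {
  // Hypothetical config flag; in Spark this would be a SQLConf entry
  // (illustrative only, no such config exists in the linked code).
  var updateCacheOnRefresh: Boolean = true

  final case class FileStatusModel(path: String, length: Long)

  // Toy stand-in for InMemoryFileIndex, to show the gating logic only.
  class InMemoryFileIndexModel {
    private var cachedFiles: Seq[FileStatusModel] = Seq.empty

    // Stands in for the expensive file listing that hurts tables
    // with a huge number of files.
    private def listFiles(): Seq[FileStatusModel] =
      Seq(FileStatusModel("part-00000", 128L), FileStatusModel("part-00001", 256L))

    def refresh(): Unit = {
      if (updateCacheOnRefresh) {
        // Current behavior: always re-list and rebuild the cache,
        // which is slow and memory-hungry for very large tables.
        cachedFiles = listFiles()
      } else {
        // Proposed behavior: drop the cache and defer listing
        // until the index is actually queried again.
        cachedFiles = Seq.empty
      }
    }

    def cachedFileCount: Int = cachedFiles.length
  }
}
```

Under this sketch, jobs that insert into huge tables could disable eager cache rebuilding and avoid the post-insert listing cost entirely.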