[SPARK-24240] Add a config to control whether InMemoryFileIndex should update cache when refresh. - ASF JIRA

Details

Type: New Feature
Status: Resolved
Priority: Major
Resolution: Incomplete
Affects Version/s: 2.3.0
Fix Version/s: None
Component/s: SQL
Labels:
- bulk-closed

Description

In the current code (https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InsertIntoHadoopFsRelationCommand.scala#L172), after data is inserted, Spark always refreshes the file index and updates the cache. If the target table has a very large number of files, the job can suffer from long refresh times and OOM errors. Could we add a config to control whether InMemoryFileIndex updates its cache on refresh?
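
To make the request concrete, here is a minimal toy model of the behaviour being asked for. It is not Spark code: `ToyFileIndex`, the `eager_refresh` flag, and the lazy fallback (drop the cache and re-list on the next read) are all hypothetical names and assumptions chosen for illustration, and the change proposed in the linked pull request may differ.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class ToyFileIndex:
    """Toy stand-in for a file index that caches a file listing (not Spark's class)."""

    list_files: Callable[[], List[str]]   # the expensive listing call (assumed)
    eager_refresh: bool = True            # hypothetical config flag
    cached: Optional[List[str]] = None
    listing_calls: int = 0                # counts how often we paid for a listing

    def _load(self) -> List[str]:
        self.listing_calls += 1
        self.cached = list(self.list_files())
        return self.cached

    def refresh(self) -> None:
        if self.eager_refresh:
            # Behaviour described in the issue: re-list immediately on refresh.
            self._load()
        else:
            # Assumed alternative: only invalidate; re-list when next queried.
            self.cached = None

    def all_files(self) -> List[str]:
        return self.cached if self.cached is not None else self._load()


files = ["part-0000"]
eager = ToyFileIndex(lambda: files, eager_refresh=True)
lazy = ToyFileIndex(lambda: files, eager_refresh=False)

files.append("part-0001")   # simulate an insert into the table
eager.refresh()             # pays for a full listing right away
lazy.refresh()              # pays nothing until the table is read again
```

In this sketch the writer that sets `eager_refresh=False` skips the listing cost at insert time, while a later reader still sees the new file because the stale cache was dropped.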

Issue Links

links to

[Github] Pull Request #21289 (jinxing64)

People

Assignee: Unassigned

Reporter: Jin Xing

Votes: 0

Watchers: 2

Dates

Created: 10/May/18 07:59

Updated: 08/Oct/19 05:42

Resolved: 08/Oct/19 05:42