Spark / SPARK-24240

Add a config to control whether InMemoryFileIndex should update its cache on refresh


Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Incomplete
    • Affects Version/s: 2.3.0
    • Fix Version/s: None
    • Component/s: SQL

    Description

      In the current code (https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InsertIntoHadoopFsRelationCommand.scala#L172), Spark always refreshes the file index and updates the cache after data is inserted. If the target table has a very large number of files, the job can slow down significantly and run out of memory (OOM). Could we add a config to control whether InMemoryFileIndex updates its cache on refresh?
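
The request can be illustrated with a small toy model. This is only a sketch under stated assumptions: `FileIndexModel` and the `update_cache_on_refresh` flag are hypothetical names invented for illustration, not Spark classes or Spark configuration keys, and the model does not reproduce how InMemoryFileIndex is actually implemented. It only shows the trade-off being asked for: with the flag on, every refresh pays for a full file listing immediately; with the flag off, refresh just drops the cache and the listing happens only if the files are read later.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class FileIndexModel:
    """Toy stand-in for a cached file index (NOT Spark's InMemoryFileIndex)."""

    list_files: Callable[[], List[str]]   # expensive listing call
    update_cache_on_refresh: bool = True  # hypothetical config flag
    listings: int = 0                     # how many times we listed files
    _cache: Optional[List[str]] = None

    def _list(self) -> List[str]:
        self.listings += 1
        return list(self.list_files())

    def refresh(self) -> None:
        # Always invalidate; re-list eagerly only when the flag is on.
        self._cache = None
        if self.update_cache_on_refresh:
            self._cache = self._list()

    def all_files(self) -> List[str]:
        # Lazy path: list on first read after an invalidation.
        if self._cache is None:
            self._cache = self._list()
        return self._cache


table = ["part-0", "part-1"]
eager = FileIndexModel(lambda: table, update_cache_on_refresh=True)
lazy = FileIndexModel(lambda: table, update_cache_on_refresh=False)

table.append("part-2")  # simulate an insert into the table
eager.refresh()
lazy.refresh()
print(eager.listings, lazy.listings)  # 1 0
print(lazy.all_files())               # ['part-0', 'part-1', 'part-2']
```

In this model an insert-only job with the flag off never pays the listing cost, while any later reader still sees the newly written files because the stale cache was dropped.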

          People

            Assignee: Unassigned
            Reporter: Jin Xing (jinxing6042@126.com)
            Votes: 0
            Watchers: 2
