Apache Hudi / HUDI-1688

Hudi write should uncache RDDs when the write operation is finished


Details

    Description

      Currently, Hudi improves write performance by caching the RDDs it needs. However, when the write operation finishes, those cached RDDs are not uncached (unpersisted), which wastes a lot of memory.

      https://github.com/apache/hudi/blob/master/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/commit/BaseSparkCommitActionExecutor.java#L115

      https://github.com/apache/hudi/blob/master/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/commit/BaseSparkCommitActionExecutor.java#L214

      In our environment:

      Step 1: insert 100 GB of data into a Hudi table with Spark (OK)

      Step 2: insert another 100 GB of data into the same Hudi table with Spark (OOM)
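
The behaviour requested in this issue could look roughly like the sketch below: cache the input, run the write, and release the cache in a `finally` block so it is freed even when the write fails. This is only an illustration and not the code in BaseSparkCommitActionExecutor. `Cacheable`, `FakeRdd`, and `writeAndRelease` are hypothetical names introduced here; `Cacheable` stands in for the `persist` / `unpersist` pair that Spark's `JavaRDD` provides, so the example runs without a Spark cluster.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical stand-in for the persist/unpersist part of Spark's JavaRDD API.
// It is NOT a Hudi or Spark type; it only models "cache" and "uncache".
interface Cacheable {
    void persist();
    void unpersist();
}

// Hypothetical fake RDD. The static list plays the role of executor cache memory.
final class FakeRdd implements Cacheable {
    static final List<FakeRdd> CACHED = new ArrayList<>();
    final String name;

    FakeRdd(String name) {
        this.name = name;
    }

    @Override
    public void persist() {
        if (!CACHED.contains(this)) {
            CACHED.add(this);
        }
    }

    @Override
    public void unpersist() {
        CACHED.remove(this);
    }
}

final class UncacheSketch {
    // Caches the input, runs the write action, and always uncaches afterwards,
    // whether the action returns normally or throws.
    static <T extends Cacheable, R> R writeAndRelease(T input, Function<T, R> writeAction) {
        input.persist();
        try {
            return writeAction.apply(input);
        } finally {
            input.unpersist();
        }
    }

    public static void main(String[] args) {
        FakeRdd records = new FakeRdd("inputRecords");
        String status = writeAndRelease(records, rdd -> "committed " + rdd.name);
        // After the write, nothing should remain in the fake cache.
        System.out.println(status + ", still cached: " + FakeRdd.CACHED.size());
    }
}
```

With real Spark types the same shape would presumably call `JavaRDD.persist(StorageLevel)` before the write and `JavaRDD.unpersist()` in the `finally` block. Where exactly that belongs in the Hudi commit executor is left open here.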

            People

              Unassigned
              xiaotaotao tao meng
              Votes: 0
              Watchers: 2

              Dates

                Created:
                Updated:
                Resolved: