Apache Hudi / HUDI-1688

Hudi write should uncache RDDs when the write operation is finished


    Details

      Description

      Currently, Hudi improves write performance by caching the RDDs it needs during a write. However, when the write operation is finished, those cached RDDs are never uncached, which wastes a lot of memory.

      https://github.com/apache/hudi/blob/master/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/commit/BaseSparkCommitActionExecutor.java#L115

      https://github.com/apache/hudi/blob/master/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/commit/BaseSparkCommitActionExecutor.java#L214

      In our environment:

      Step 1: insert 100 GB of data into a Hudi table with Spark (OK)

      Step 2: insert another 100 GB of data into the same Hudi table with Spark (fails with OOM)
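The requested fix comes down to a cache-lifecycle rule: every RDD persisted during a write should be unpersisted once that write finishes, including when it fails. Below is a minimal sketch of that rule, written in Java because the linked BaseSparkCommitActionExecutor is Java. It deliberately avoids a Spark dependency. CachedDataset, WriteCacheTracker and UncacheAfterWrite are hypothetical stand-ins, not Hudi or Spark classes. In real Spark code, the corresponding calls would be JavaRDD.persist(StorageLevel) and JavaRDD.unpersist(). This sketch does not describe how Hudi actually resolved the issue.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a Spark RDD; persist()/unpersist() mirror the
// JavaRDD methods of the same names but only flip a flag here.
class CachedDataset {
    private final String name;
    private boolean cached;

    CachedDataset(String name) { this.name = name; }

    void persist() { cached = true; }
    void unpersist() { cached = false; }
    boolean isCached() { return cached; }
    String name() { return name; }
}

// Hypothetical helper: remembers every dataset cached during one write
// and releases all of them when the write ends.
class WriteCacheTracker implements AutoCloseable {
    private final List<CachedDataset> tracked = new ArrayList<>();

    CachedDataset cache(CachedDataset ds) {
        ds.persist();
        tracked.add(ds);
        return ds;
    }

    @Override
    public void close() {
        // Runs on success and on failure, so nothing stays cached after the write.
        for (CachedDataset ds : tracked) {
            ds.unpersist();
        }
        tracked.clear();
    }
}

public class UncacheAfterWrite {
    // Hypothetical write: caches the input and an intermediate result;
    // every dataset created is appended to `seen` so callers can inspect it.
    static void write(List<CachedDataset> seen, boolean fail) {
        try (WriteCacheTracker tracker = new WriteCacheTracker()) {
            seen.add(tracker.cache(new CachedDataset("input")));
            seen.add(tracker.cache(new CachedDataset("writeStatuses")));
            if (fail) {
                throw new IllegalStateException("simulated write failure");
            }
        }
    }

    public static void main(String[] args) {
        List<CachedDataset> seen = new ArrayList<>();
        write(seen, false);
        for (CachedDataset ds : seen) {
            System.out.println(ds.name() + " cached=" + ds.isCached());
        }
    }
}
```

Try-with-resources ties the release to the end of each write, so a failed or completed commit does not leave its RDDs cached into the next batch. That carried-over cache is the accumulation the reporter suspects behind the OOM in step 2.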


              People

              • Assignee:
                Unassigned
                Reporter:
                xiaotaotao tao meng
              • Votes:
                0
                Watchers:
                2

                Dates

                • Created:
                  Updated:
                  Resolved: