Flink / FLINK-21437

Memory leak when using filesystem state backend on Alibaba Cloud OSS

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Not A Bug
    • Affects Version/s: None
    • Fix Version/s: None
    • Labels: None

      Description

      When the filesystem state backend is used and checkpoints are stored on Alibaba Cloud OSS, with the following flink-conf.yaml:

      state.backend: filesystem
      state.checkpoints.dir: oss://yourBucket/checkpoints
      fs.oss.endpoint: xxxxx
      fs.oss.accessKeyId: xxxxx
      fs.oss.accessKeySecret: xxxxx

      A memory leak occurs in both the jobmanager and the taskmanager after the job has been running for a while; the heap dump shows retained objects such as:

      The class "java.io.DeleteOnExitHook", loaded by "<system class loader>", occupies 1,018,323,960 (96.47%) bytes. The memory is accumulated in one instance of "java.util.LinkedHashMap", loaded by "<system class loader>", which occupies 1,018,323,832 (96.47%) bytes.
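The growth can be reproduced in isolation: each File.deleteOnExit() call appends the file's absolute path to the set held by java.io.DeleteOnExitHook, and entries are only consumed at JVM shutdown, even if the file itself is deleted earlier. A minimal sketch (the loop count and file prefix are illustrative, not taken from the Flink code):

```java
import java.io.File;
import java.io.IOException;

public class DeleteOnExitLeakDemo {
    public static void main(String[] args) throws IOException {
        for (int i = 0; i < 3; i++) {
            // Mirrors what each checkpoint upload does: create a local
            // staging file and register it for deletion at JVM exit.
            File tmp = File.createTempFile("output-", null);
            tmp.deleteOnExit(); // path String retained in DeleteOnExitHook
            // ... write checkpoint data, upload to OSS ...
            tmp.delete();       // removes the file from disk, but the
                                // DeleteOnExitHook entry is never reclaimed
        }
        // In a long-running jobmanager/taskmanager this set grows by one
        // String per staging file until it dominates the heap, exactly as
        // the heap-dump excerpt above shows.
    }
}
```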

      The root cause appears to be that when flink-oss-fs-hadoop uploads a file to OSS, AliyunOSSFileSystem creates a local temporary file and registers it with deleteOnExit(), so the LinkedHashSet<String> of file paths held by DeleteOnExitHook grows without bound:

      org.apache.hadoop.fs.aliyun.oss.AliyunOSSFileSystem::create
        -> org.apache.hadoop.fs.aliyun.oss.AliyunOSSOutputStream::new
        -> dirAlloc.createTmpFileForWrite("output-", -1L, conf)
        -> org.apache.hadoop.fs.LocalDirAllocator::createTmpFileForWrite
        -> result.deleteOnExit()
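A long-running writer can avoid this pattern by tying cleanup to the stream's lifetime instead of the JVM's. The class below is a hypothetical sketch, not the actual Hadoop code: it deletes its staging file in close() and falls back to deleteOnExit() only if that delete fails.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical sketch: a staging output stream that cleans up its temp
// file in close() rather than always calling deleteOnExit(), so nothing
// accumulates in DeleteOnExitHook during normal operation.
class StagingOutputStream extends OutputStream {
    private final File staging;
    private final OutputStream delegate;

    StagingOutputStream(String prefix) throws IOException {
        this.staging = File.createTempFile(prefix, null);
        this.delegate = new FileOutputStream(staging);
    }

    @Override
    public void write(int b) throws IOException {
        delegate.write(b);
    }

    @Override
    public void close() throws IOException {
        try {
            delegate.close();
            // ... upload the staging file to OSS here ...
        } finally {
            // Explicit cleanup bounded to the stream's lifetime; no
            // JVM-exit hook, so no unbounded LinkedHashSet growth.
            if (!staging.delete()) {
                staging.deleteOnExit(); // fallback only if delete fails
            }
        }
    }

    File stagingFile() {
        return staging; // exposed for illustration/testing only
    }
}
```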

    People

    • Assignee: Unassigned
    • Reporter: qccash (Qian Chao)
    • Votes: 0
    • Watchers: 4
