  Flink / FLINK-9693

Possible memory leak in jobmanager retaining archived checkpoints


    Details

      Description

      Some context about the job:

      • Flink 1.4.1
      • stand-alone deployment mode
      • embarrassingly parallel: all operators are chained together
      • parallelism is over 1,000
      • stateless except for the Kafka source operators; checkpoint size is 8.4 MB
      • "state.backend.fs.memory-threshold" is set so that only the JobManager writes checkpoints to S3
      • internal checkpoints, with 10 checkpoints retained in history
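As a rough sanity check, the settings above put a ceiling on how much checkpoint data the history should account for. The sketch below makes a deliberately pessimistic, hypothetical assumption: that every checkpoint in the 10-entry history keeps its full 8.4 MB of state on the JobManager heap. Even then the total is about 84 MB, orders of magnitude below the 9+ GB reported. The class and method names are illustrative only.

```java
// Back-of-envelope ceiling on checkpoint data held by the JobManager.
// Hypothetical assumption: each retained checkpoint keeps its full state on the heap.
public final class CheckpointHistoryBound {
    static final double CHECKPOINT_SIZE_MB = 8.4; // reported checkpoint size
    static final int RETAINED_CHECKPOINTS = 10;   // reported history length

    // Upper bound in MB if every retained checkpoint held its full state in memory.
    static double upperBoundMb(double checkpointSizeMb, int retained) {
        return checkpointSizeMb * retained;
    }

    public static void main(String[] args) {
        double bound = upperBoundMb(CHECKPOINT_SIZE_MB, RETAINED_CHECKPOINTS);
        System.out.printf("upper bound: %.1f MB%n", bound); // prints "upper bound: 84.0 MB"
    }
}
```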


      Summary of the observations

      • 41,567 ExecutionVertex objects retained 9+ GB of memory
      • Expanding one ExecutionVertex shows that it seems to be storing the Kafka offsets for the source operator
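The reported numbers can be put in proportion with a little arithmetic. The sketch below treats "9+ GB" as exactly 9 GiB (so the per-vertex figure is a lower bound) and divides it across the 41,567 ExecutionVertex objects, giving roughly 227 KiB each. It also assumes, hypothetically, that because all operators are chained into a single job vertex, each execution graph holds one ExecutionVertex per parallel subtask; at parallelism 1,000 that would correspond to about 41.6 graphs' worth of vertices, and fewer at higher parallelism. This only sizes the problem; it does not identify the cause of the leak. Names are illustrative.

```java
// Back-of-envelope sizing of the heap dump observations (illustrative only).
public final class RetainedVertexEstimate {
    static final long VERTEX_COUNT = 41_567L;                    // reported ExecutionVertex count
    static final long RETAINED_BYTES = 9L * 1024 * 1024 * 1024;  // "9+ GB" taken as 9 GiB (a floor)

    // Average bytes retained per ExecutionVertex; a lower bound since 9 GiB is a floor.
    static long bytesPerVertex(long retainedBytes, long vertexCount) {
        return retainedBytes / vertexCount;
    }

    // Hypothetical: with one fully chained job vertex at parallelism p, each execution
    // graph has p ExecutionVertex objects, so this estimates how many graphs' worth are retained.
    static double graphCopies(long vertexCount, int parallelism) {
        return (double) vertexCount / parallelism;
    }

    public static void main(String[] args) {
        // Integer division: 232,484 bytes / 1024 -> prints "227 KiB per vertex"
        System.out.println(bytesPerVertex(RETAINED_BYTES, VERTEX_COUNT) / 1024 + " KiB per vertex");
        // Parallelism is "over 1,000", so 1,000 gives an upper bound on the copy count
        System.out.printf("%.1f graph copies at parallelism 1000%n", graphCopies(VERTEX_COUNT, 1000));
    }
}
```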


            People

            • Assignee: Till Rohrmann (trohrmann)
            • Reporter: Steven Zhen Wu (stevenz3wu)
