Flink / FLINK-25318 Improvement of scheduler and execution for Flink OLAP / FLINK-32746

Using ZGC in JDK 17 to avoid long stop-the-world (STW) pauses caused by class unloading


Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Done
    • Component/s: Table SQL / Runtime

    Description

      In an OLAP session cluster, a TaskManager (TM) needs to frequently create new classloaders and generate new classes. These classes accumulate in metaspace. When metaspace usage reaches a threshold, a Full GC with a long stop-the-world (STW) pause is triggered. Currently, SerialGC, ParallelGC, and G1GC all perform class unloading during a stop-the-world pause. ZGC, by contrast, supports concurrent class unloading; see https://bugs.openjdk.org/browse/JDK-8218905 for details.
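
      To see the accumulation described above on a running JVM, the standard `java.lang.management` MXBeans can report metaspace usage and class loading counters. The following is a minimal sketch, not part of this issue: the class name `MetaspaceProbe` is hypothetical, and the pool name "Metaspace" is an assumption that holds on HotSpot-based JVMs.

```java
import java.lang.management.ClassLoadingMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

// Hypothetical helper for illustration; not part of Flink.
public class MetaspaceProbe {

    /**
     * Returns the used bytes of the memory pool named "Metaspace"
     * (assumption: HotSpot naming), or -1 if no such pool is found.
     */
    static long metaspaceUsedBytes() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if ("Metaspace".equals(pool.getName())) {
                MemoryUsage usage = pool.getUsage();
                return usage == null ? -1L : usage.getUsed();
            }
        }
        return -1L;
    }

    /** Builds a one-line summary of metaspace usage and class counters. */
    static String report() {
        ClassLoadingMXBean classes = ManagementFactory.getClassLoadingMXBean();
        return String.format(
                "metaspaceUsed=%d loaded=%d totalLoaded=%d unloaded=%d",
                metaspaceUsedBytes(),
                classes.getLoadedClassCount(),
                classes.getTotalLoadedClassCount(),
                classes.getUnloadedClassCount());
    }

    public static void main(String[] args) {
        System.out.println(report());
    }
}
```

      Sampling `report()` periodically inside a TM would show `loaded` and `metaspaceUsed` growing between class unloading events, and `unloaded` increasing only when the collector actually unloads classes.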

       

      In our scenario, class unloading for a 2 GB metaspace holding 5 million classes stops the application for more than 40 seconds. After switching to ZGC, the maximum STW pause of the application dropped to less than 10 ms.
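
      ZGC is selected with the HotSpot flag `-XX:+UseZGC`. As a quick check that a JVM really runs with ZGC after such a switch, the registered garbage collector MXBeans can be listed. This is a sketch under stated assumptions: the class name `GcProbe` is hypothetical, and the check relies on ZGC's collector bean names containing the string "ZGC", which is an assumption about HotSpot naming rather than a documented contract.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of Flink.
public class GcProbe {

    /** Returns the names of all garbage collector MXBeans in this JVM. */
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    /**
     * Heuristic: treats the JVM as running ZGC if any collector name
     * contains "ZGC" (assumption about HotSpot bean naming).
     */
    static boolean looksLikeZgc(List<String> names) {
        for (String name : names) {
            if (name.contains("ZGC")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> names = collectorNames();
        System.out.println("collectors=" + names + " zgc=" + looksLikeZgc(names));
    }
}
```

      Running this once with `-XX:+UseZGC` and once without shows whether the flag took effect. It does not measure pause times; for that, GC logging (for example `-Xlog:gc*`) is the more direct source.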
       


          People

            Assignee: Unassigned
            Reporter: xiangyu feng (xiangyu0xf)