Flink / FLINK-29985

TaskManager might not close SlotTable on SIGTERM

Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 1.16.0, 1.17.0, 1.15.3
    • None
    • None

    Description

      When a TaskManager (TM) is stopped by the ResourceManager (RM), its slot table is closed, which releases all of its slots.
      However, when the TM is stopped by SIGTERM (i.e., by an external resource manager), its slot table is NOT closed.
       

      When a slot is released, its associated resources are released as well, in particular the MemoryManager.
      The MemoryManager might hold not only memory but also arbitrary shared resources (currently PythonSharedResources and RocksDBSharedResources).
      As of now, RocksDBSharedResources contains only ephemeral resources. I am not sure about PythonSharedResources, but it is likely associated with a separate process.
      This means that in standalone clusters, some resources might not be released.
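
To illustrate the concern, here is a minimal, self-contained sketch of how a JVM process could close a slot-table-like object when it receives SIGTERM. It is not Flink code and not a proposed patch: `SlotTableShutdownSketch`, `SketchSlotTable`, `SketchSharedResource`, and `closeOnShutdown` are hypothetical names used only for illustration. The only real API it relies on is `Runtime.addShutdownHook`, which the JVM runs on normal exit and on SIGTERM (but not on SIGKILL).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-ins for illustration; none of these are Flink classes. */
final class SlotTableShutdownSketch {

    /** Stand-in for a shared resource held on behalf of a slot (e.g. an external process). */
    static final class SketchSharedResource implements AutoCloseable {
        private final String name;
        private boolean released;

        SketchSharedResource(String name) {
            this.name = name;
        }

        @Override
        public synchronized void close() {
            released = true;
        }

        synchronized boolean isReleased() {
            return released;
        }

        String name() {
            return name;
        }
    }

    /** Stand-in for a slot table: closing it releases every resource it handed out. */
    static final class SketchSlotTable implements AutoCloseable {
        private final List<SketchSharedResource> resources = new ArrayList<>();
        private boolean closed;

        synchronized SketchSharedResource allocate(String name) {
            if (closed) {
                throw new IllegalStateException("slot table is already closed");
            }
            SketchSharedResource resource = new SketchSharedResource(name);
            resources.add(resource);
            return resource;
        }

        /** Idempotent: a second call finds the table already closed and does nothing. */
        @Override
        public synchronized void close() {
            if (closed) {
                return;
            }
            closed = true;
            for (SketchSharedResource resource : resources) {
                resource.close();
            }
        }

        synchronized boolean isClosed() {
            return closed;
        }
    }

    /** Registers a JVM shutdown hook that closes the table, and returns the hook thread. */
    static Thread closeOnShutdown(SketchSlotTable table) {
        Thread hook = new Thread(table::close, "sketch-slot-table-shutdown");
        Runtime.getRuntime().addShutdownHook(hook);
        return hook;
    }

    public static void main(String[] args) {
        SketchSlotTable table = new SketchSlotTable();
        table.allocate("python-process");
        // Without this registration, a SIGTERM would end the JVM while the table is still open.
        closeOnShutdown(table);
    }
}
```

Making `close()` idempotent matters in this sketch because the same table could be closed both by the regular stop path and by the shutdown hook.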

          People

            Assignee: Unassigned
            Reporter: Roman Khachatryan
            Votes: 0
            Watchers: 5
