Details
- Bug
- Status: Closed
- Major
- Resolution: Fixed
- 1.17.0
Description
While working with PyFlink, we found that in Process Mode the Python UDF processes may leak after a job failover. The number of processes, and of their threads, on the host machine keeps growing, which eventually makes it impossible to create new threads.
You can try to reproduce it with the attached test task `streamin_word_count.py`.
(Note that the job will keep failing over, and you can watch the processes leak with `ps -ef` on the Taskmanager.)
Our test environment:
- K8S Application Mode
- 4 Taskmanagers with 12 slots/TM
- Job's parallelism was set to 48
The number of UDF processes (`pyflink.fn_execution.beam.beam_boot`) should be consistent with the number of slots per TM (12), but we found 180 such processes on one Taskmanager after several failovers.
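To make the check above repeatable, here is a minimal sketch that counts the UDF processes in `ps -ef` output and compares the count with the slots per TM. The helper names (`count_udf_processes`, `leaked_processes`, `snapshot`) are hypothetical and not part of PyFlink, and the sketch assumes each UDF worker shows up as one `ps -ef` line containing `pyflink.fn_execution.beam.beam_boot`.

```python
import subprocess

# Command-line marker of the Python UDF worker, as named in this report.
UDF_MARKER = "pyflink.fn_execution.beam.beam_boot"


def count_udf_processes(ps_output: str, marker: str = UDF_MARKER) -> int:
    """Count the lines of `ps -ef` output whose command contains the marker."""
    return sum(1 for line in ps_output.splitlines() if marker in line)


def leaked_processes(observed: int, slots_per_tm: int) -> int:
    """Processes beyond the expected one-per-slot count (never negative)."""
    return max(0, observed - slots_per_tm)


def snapshot() -> int:
    """Run `ps -ef` on this host and return the current UDF process count."""
    out = subprocess.run(
        ["ps", "-ef"], capture_output=True, text=True, check=True
    ).stdout
    return count_udf_processes(out)


# With the numbers from this report: 180 observed, 12 slots per TM.
print(leaked_processes(180, 12))  # prints 168
```

Calling `snapshot()` on a Taskmanager before and after each failover should show whether the count keeps growing past the slot count.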
Attachments