Details
-
Bug
-
Status: Closed
-
Critical
-
Resolution: Fixed
-
1.14.0, 1.12.5, 1.13.3
Description
If a job is submitted using the REST API's /jars/:jarid/run endpoint, user code has to be executed on the JobManager and it is doing this in a couple of (pooled) dispatcher threads like Flink-DispatcherRestEndpoint-thread-*.
If the user code is using thread locals (and not cleaning them up), they may remain in the thread with references to the ChildFirstClassloader of the job and thus leaking that.
We saw this for the jsoniter scala library at the JM which creates ThreadLocal instances but doesn't remove them, but it can actually happen with any user code or (worse) library used in user code.
There are a few workarounds a user can use, e.g. putting the library in Flink's lib/ folder or submitting via the Flink CLI, but these may actually not be possible to use, depending on the circumstances.
A proper fix should happen in Flink by guarding against any of these things in the dispatcher threads. We could, for example, spawn a separate thread for executing the user's main() method and once the job is submitted exit that thread and destroy all thread locals along with it.
Attachments
Issue Links
- relates to
-
FLINK-24888 Run jobgraph deserialization in separate thread
- Closed
- links to