Flink / FLINK-25022

ClassLoader leak with ThreadLocals on the JM when submitting a job through the REST API


Details

    Description

      If a job is submitted through the REST API's /jars/:jarid/run endpoint, the user code has to be executed on the JobManager (JM), which runs it in one of a few pooled dispatcher threads such as Flink-DispatcherRestEndpoint-thread-*.

      If the user code uses thread locals and does not clean them up, the values may remain attached to the pooled thread after the submission finishes. They then keep references to the job's ChildFirstClassLoader, which leaks it.
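
      The mechanism can be reproduced outside Flink with a small sketch. The class and method names below (ThreadLocalLeakDemo, CACHE, valueSurvivesInPooledThread) are hypothetical, and a single-thread executor stands in for the pooled dispatcher threads. In a real job the stored value would be an instance of a class loaded by the job's class loader, so the chain thread -> thread-local value -> class -> class loader keeps the loader reachable.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class ThreadLocalLeakDemo {
    // Stands in for a ThreadLocal created by user code or by a library it uses.
    static final ThreadLocal<Object> CACHE = new ThreadLocal<>();

    /**
     * Returns true if a value set by one task is still visible to the next
     * task that runs on the same pooled thread.
     */
    static boolean valueSurvivesInPooledThread() throws Exception {
        // One worker thread, so both tasks are guaranteed to share it.
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            // "Job submission": the task sets a thread local and never removes it.
            pool.submit(() -> CACHE.set(new Object())).get();
            // A later, unrelated task on the same thread still sees the value.
            return pool.submit(() -> CACHE.get() != null).get();
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(valueSurvivesInPooledThread());
    }
}
```

      Calling CACHE.remove() at the end of the first task would clear the entry, but that requires the user code or library to cooperate, which is exactly what cannot be relied on here.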

      We saw this on the JM with the jsoniter-scala library, which creates ThreadLocal instances but does not remove them. The same can happen with any user code or (worse) with any library used by user code.

      There are a few workarounds for users, e.g. putting the library into Flink's lib/ folder or submitting the job via the Flink CLI, but depending on the circumstances these may not be available.

      A proper fix should happen in Flink itself, by guarding the dispatcher threads against this kind of leak. We could, for example, spawn a separate thread to execute the user's main() method and, once the job is submitted, let that thread exit, destroying all of its thread locals along with it.
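
      A minimal sketch of that idea, not Flink's actual implementation: IsolatedRunner, runInFreshThread and the thread name "user-main-runner" are hypothetical names chosen for illustration. The user code runs in a dedicated thread, so any thread locals it sets belong to that thread and become unreachable once the thread has terminated.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.FutureTask;

final class IsolatedRunner {
    /**
     * Runs userCode in a freshly created thread and returns its result.
     * Thread locals set by userCode live only in that thread, never in the caller.
     */
    static <T> T runInFreshThread(Callable<T> userCode, ClassLoader userClassLoader)
            throws Exception {
        FutureTask<T> task = new FutureTask<>(userCode);
        Thread runner = new Thread(task, "user-main-runner");
        // Let the user code resolve classes through its own class loader.
        runner.setContextClassLoader(userClassLoader);
        runner.start();
        try {
            // Rethrows failures of the user code wrapped in an ExecutionException.
            return task.get();
        } finally {
            // Wait until the thread has fully terminated before returning.
            runner.join();
        }
    }
}
```

      This sketch blocks the calling thread until the user code finishes. Whether that is acceptable for a dispatcher thread, or whether the result should be handed back asynchronously instead, is a design decision left open here.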

            People

              chesnay Chesnay Schepler
              nkruber Nico Kruber