Uploaded image for project: 'Beam'
  1. Beam
  2. BEAM-5110

Reconile Flink JVM singleton management with deployment

    XMLWordPrintableJSON

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: Not applicable
    • Component/s: runner-flink
    • Labels:
      None

      Description

      Ankur Goenka noticed through debugging that multiple instances of BatchFlinkExecutableStageContext.BatchFactory are loaded for a given job when executing in standalone cluster mode. This context factory is responsible for maintaining singleton state across a TaskManager (JVM) in order to share SDK Environments across workers in a given job. The multiple-loading breaks singleton semantics and results in an indeterminate number of Environments being created.

      It turns out that the Flink classloading mechanism is determined by deployment mode. Note that "user code" as referenced by this link is actually the Flink job server jar. Actual end-user code lives inside of the SDK Environment and uploaded artifacts.

      In order to maintain singletons without resorting to IPC (for example, using file locks and/or additional gRPC servers), we need to force non-dynamic classloading. For example, this happens when jobs are submitted to YARN for one-off deployments via `flink run`. However, connecting to an existing (Flink standalone) deployment results in dynamic classloading.

      We should investigate this behavior and either document (and attempt to enforce) deployment modes that are consistent with our requirements, or (if possible) create a custom classloader that enforces singleton loading.

        Attachments

          Issue Links

            Activity

              People

              • Assignee:
                bsidhom Ben Sidhom
                Reporter:
                bsidhom Ben Sidhom
              • Votes:
                0 Vote for this issue
                Watchers:
                4 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved:

                  Time Tracking

                  Estimated:
                  Original Estimate - Not Specified
                  Not Specified
                  Remaining:
                  Remaining Estimate - 0h
                  0h
                  Logged:
                  Time Spent - 3h
                  3h