Flink / FLINK-28114

The path of the Python client interpreter could not point to an archive file in distributed file system

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.15.1, 1.16.0
    • Component/s: API / Python
    • Labels: None

    Description

      See https://github.com/apache/flink/blob/master/flink-python/src/main/java/org/apache/flink/client/python/PythonEnvUtils.java#L178 for more details about this limitation.

      Users can execute PyFlink jobs in YARN application mode as follows:

      ./bin/flink run-application -t yarn-application \
            -Djobmanager.memory.process.size=1024m \
            -Dtaskmanager.memory.process.size=1024m \
            -Dyarn.application.name=<ApplicationName> \
            -Dyarn.ship-files=/path/to/shipfiles \
            -pyarch shipfiles/venv.zip \
            -pyclientexec venv.zip/venv/bin/python3 \
            -pyexec venv.zip/venv/bin/python3 \
            -py shipfiles/word_count.py
      

      In the above case, venv.zip is distributed to the TMs via the Flink blob server. However, the blob server doesn't support files larger than 2GB. See https://github.com/apache/flink/blob/ea52732dc48a4f1c5be0925890cd8aa1ea2a11ed/flink-runtime/src/main/java/org/apache/flink/runtime/blob/BlobServerConnection.java#L223 for more details. This is a very serious problem, as Python users tend to install a lot of Python libraries inside venv.zip and some Python libraries are very large.
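
      To make the size constraint concrete, here is a minimal sketch of a pre-submission check that flags an archive likely to hit the blob server limit. The helper names and the exact threshold (taken here as 2 GiB) are assumptions for illustration only; they are not part of Flink or PyFlink.

```python
import os

# Assumption: the limit is treated as exactly 2 GiB. The precise threshold
# enforced by the Flink blob server may differ slightly.
ASSUMED_BLOB_LIMIT_BYTES = 2 * 1024 ** 3


def exceeds_blob_limit(size_bytes, limit=ASSUMED_BLOB_LIMIT_BYTES):
    """Return True if a file of size_bytes is larger than the assumed limit."""
    return size_bytes > limit


def check_archive(path, limit=ASSUMED_BLOB_LIMIT_BYTES):
    """Hypothetical helper: return (too_large, size_in_bytes) for a local archive."""
    size = os.path.getsize(path)
    return exceeds_blob_limit(size, limit), size
```

      For example, calling check_archive("shipfiles/venv.zip") before running the command above would show whether the archive is likely too large to be shipped via the blob server.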

          People

            Assignee: Dian Fu (dianfu)
            Reporter: Dian Fu (dianfu)