Details
-
Bug
-
Status: Resolved
-
Major
-
Resolution: Fixed
-
1.15.4, 1.16.2, 1.18.0, 1.17.1
Description
We have encountered an issue when Flink fails to install python dependencies from requirements file if python environment contains setuptools dependency version 67.5.0 or above.
Flink job fails with following error:
py4j.protocol.Py4JJavaError: An error occurred while calling o118.await. : java.util.concurrent.ExecutionException: org.apache.flink.table.api.TableException: Failed to wait job finish at java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395) at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1999) at org.apache.flink.table.api.internal.TableResultImpl.awaitInternal(TableResultImpl.java:118) at org.apache.flink.table.api.internal.TableResultImpl.await(TableResultImpl.java:81) ... Caused by: java.util.concurrent.ExecutionException: org.apache.flink.client.program.ProgramInvocationException: Job failed (JobID: 2ca4026944022ac4537c503464d4c47f) at java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395) at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1999) at org.apache.flink.table.api.internal.InsertResultProvider.hasNext(InsertResultProvider.java:83) ... 6 more Caused by: org.apache.flink.client.program.ProgramInvocationException: Job failed (JobID: 2ca4026944022ac4537c503464d4c47f) at org.apache.flink.client.deployment.ClusterClientJobClientAdapter.lambda$null$6(ClusterClientJobClientAdapter.java:130) at java.base/java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:642) at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506) at java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2073) ... Caused by: java.io.IOException: java.io.IOException: Failed to execute the command: /Users/z3d1k/.pyenv/versions/3.9.17/bin/python -m pip install --ignore-installed -r /var/folders/rb/q_3h54_94b57gz_qkbp593vw0000gn/T/tm_localhost:63904-4178d3/blobStorage/job_2ca4026944022ac4537c503464d4c47f/blob_p-feb04bed919e628d98b1ef085111482f05bf43a1-1f5c3fda7cbdd6842e950e04d9d807c5 --install-option --prefix=/var/folders/rb/q_3h54_94b57gz_qkbp593vw0000gn/T/python-dist-c17912db-5aba-439f-9e39-69cb8c957bec/python-requirements output: Usage: /Users/z3d1k/.pyenv/versions/3.9.17/bin/python -m pip install [options] <requirement specifier> [package-index-options] ... /Users/z3d1k/.pyenv/versions/3.9.17/bin/python -m pip install [options] -r <requirements file> [package-index-options] ... /Users/z3d1k/.pyenv/versions/3.9.17/bin/python -m pip install [options] [-e] <vcs project url> ... /Users/z3d1k/.pyenv/versions/3.9.17/bin/python -m pip install [options] [-e] <local project path> ... /Users/z3d1k/.pyenv/versions/3.9.17/bin/python -m pip install [options] <archive url/path> ... no such option: --install-option at org.apache.flink.python.util.PythonEnvironmentManagerUtils.pipInstallRequirements(PythonEnvironmentManagerUtils.java:144) at org.apache.flink.python.env.AbstractPythonEnvironmentManager.installRequirements(AbstractPythonEnvironmentManager.java:215) at org.apache.flink.python.env.AbstractPythonEnvironmentManager.lambda$open$0(AbstractPythonEnvironmentManager.java:126) at org.apache.flink.python.env.AbstractPythonEnvironmentManager$PythonEnvResources.createResource(AbstractPythonEnvironmentManager.java:435) at org.apache.flink.python.env.AbstractPythonEnvironmentManager$PythonEnvResources.getOrAllocateSharedResource(AbstractPythonEnvironmentManager.java:402) at org.apache.flink.python.env.AbstractPythonEnvironmentManager.open(AbstractPythonEnvironmentManager.java:114) at org.apache.flink.streaming.api.runners.python.beam.BeamPythonFunctionRunner.open(BeamPythonFunctionRunner.java:238) at org.apache.flink.streaming.api.operators.python.process.AbstractExternalPythonFunctionOperator.open(AbstractExternalPythonFunctionOperator.java:57) at org.apache.flink.table.runtime.operators.python.AbstractStatelessFunctionOperator.open(AbstractStatelessFunctionOperator.java:92) at org.apache.flink.table.runtime.operators.python.scalar.AbstractPythonScalarFunctionOperator.open(AbstractPythonScalarFunctionOperator.java:101) at org.apache.flink.table.runtime.operators.python.scalar.PythonScalarFunctionOperator.open(PythonScalarFunctionOperator.java:71) at org.apache.flink.streaming.runtime.tasks.RegularOperatorChain.initializeStateAndOpenOperators(RegularOperatorChain.java:107) at org.apache.flink.streaming.runtime.tasks.StreamTask.restoreGates(StreamTask.java:734) ... Caused by: java.io.IOException: Failed to execute the command: /Users/z3d1k/.pyenv/versions/3.9.17/bin/python -m pip install --ignore-installed -r /var/folders/rb/q_3h54_94b57gz_qkbp593vw0000gn/T/tm_localhost:63904-4178d3/blobStorage/job_2ca4026944022ac4537c503464d4c47f/blob_p-feb04bed919e628d98b1ef085111482f05bf43a1-1f5c3fda7cbdd6842e950e04d9d807c5 --install-option --prefix=/var/folders/rb/q_3h54_94b57gz_qkbp593vw0000gn/T/python-dist-c17912db-5aba-439f-9e39-69cb8c957bec/python-requirements output: Usage: /Users/z3d1k/.pyenv/versions/3.9.17/bin/python -m pip install [options] <requirement specifier> [package-index-options] ... /Users/z3d1k/.pyenv/versions/3.9.17/bin/python -m pip install [options] -r <requirements file> [package-index-options] ... /Users/z3d1k/.pyenv/versions/3.9.17/bin/python -m pip install [options] [-e] <vcs project url> ... /Users/z3d1k/.pyenv/versions/3.9.17/bin/python -m pip install [options] [-e] <local project path> ... /Users/z3d1k/.pyenv/versions/3.9.17/bin/python -m pip install [options] <archive url/path> ... no such option: --install-option at org.apache.flink.python.util.PythonEnvironmentManagerUtils.execute(PythonEnvironmentManagerUtils.java:220) at org.apache.flink.python.util.PythonEnvironmentManagerUtils.pipInstallRequirements(PythonEnvironmentManagerUtils.java:130) ... 20 more
Before installing dependencies provided in the requirements.txt file, Flink checks version of pip, and based on result constructs cli command to execute [1]. Version check is performed by running following command [2]:
python -c "import sys;from pkg_resources import get_distribution, parse_version;pip_version = get_distribution('pip').version;print(parse_version(pip_version) >= parse_version(sys.args[1]))"
Starting from this version, python will print deprecation warning when package pkg_resources is used [3]:
<string>:1: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
True
This warning breaks output parsing, causing Flink to assume that python environment has pip with version 8.0.0 or below, adding "--install-option" argument. This argument had been deprecated and removed from recent pip versions.
Proposed solution:
Remove pip version check since pip of version `8.0.0` does not support recent python versions:
Documentation states that Flink supports python 3.7 and above for Flink 1.17 [4] and earliest version to support python 3.7 is pip 18.1 [5].
References:
[1]: https://github.com/apache/flink/blob/c1fba732738aaaa0d576f80881eb464a954e12c5/flink-python/src/main/java/org/apache/flink/python/util/PythonEnvironmentManagerUtils.java#L61-L65
[2]: https://github.com/apache/flink/blob/c1fba732738aaaa0d576f80881eb464a954e12c5/flink-python/src/main/java/org/apache/flink/python/util/PythonEnvironmentManagerUtils.java#L109-L117
[3]: https://github.com/pypa/setuptools/pull/3843
[4]: https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/dev/python/installation/
[5]: https://pip.pypa.io/en/stable/news/#v18-1