Details
Type: Bug
Status: Open
Priority: P3
Resolution: Unresolved
Description
I am running with --requirements_file requirements.txt, which contains:
google-cloud-datastore
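For context, the job is launched roughly like this (a sketch, not my exact command; the script, project, and bucket names are placeholders, and the flags are the Beam Python SDK's standard pipeline options):

```shell
# Hypothetical invocation; --requirements_file points at the file above.
python my_pipeline.py \
  --runner DataflowRunner \
  --project my-gcp-project \
  --staging_location gs://my-bucket/staging \
  --temp_location gs://my-bucket/tmp \
  --requirements_file requirements.txt
```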
Unfortunately, when attempting to run this on Cloud Dataflow, I get the following error while it builds the requirements cache:
Collecting setuptools (from protobuf>=3.0.0->google-cloud-core<0.24dev,>=0.23.1->google-cloud-datastore->-r requirements.txt (line 3))
  File was already downloaded /var/folders/94/wngs1jw91_n2_jjjrfljtqrc0000gn/T/dataflow-requirements-cache/setuptools-34.3.2.zip
  Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "setuptools/__init__.py", line 12, in <module>
        import setuptools.version
      File "setuptools/version.py", line 1, in <module>
        import pkg_resources
      File "pkg_resources/__init__.py", line 70, in <module>
        import packaging.version
    ImportError: No module named packaging.version
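If it helps with triage: as far as I can tell, the SDK populates that cache with source distributions only, so the failure should reproduce locally with something like the following (a sketch; the cache directory is a placeholder):

```shell
# Approximates Beam's requirements staging: download sdists (no wheels)
# for everything in requirements.txt, including transitive dependencies
# like setuptools; building setuptools from its sdist then fails as above.
pip download --dest /tmp/dataflow-requirements-cache \
    --no-binary :all: -r requirements.txt
```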
Judging by https://github.com/pypa/setuptools/issues/937, this appears to be because "pip asking setuptools to build itself (from source dist)" is "no longer supported."
I'm not sure what the correct fix is here, since protobuf depends on setuptools, and a lot of Google libraries depend on protobuf. There seems to be no way to list protobuf/setuptools as being "provided" by the Beam runtime (i.e. https://github.com/pypa/pip/issues/3090).
I'm going to try using my own setup.py next to see if I can work around the issue, but this definitely seems like a bug: Beam's requirements packager is asking pip for more than it should, namely source distributions of packages like setuptools that can no longer build themselves from source.
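Concretely, the workaround I plan to try is passing --setup_file with a minimal setup.py (a sketch; the package name and version are placeholders), so the dependency is installed on the workers by pip, which can use wheels, rather than staged as source distributions:

```python
# setup.py -- hypothetical minimal sketch of the workaround
import setuptools

setuptools.setup(
    name="my-dataflow-job",        # placeholder package name
    version="0.0.1",
    install_requires=[
        "google-cloud-datastore",  # the dependency from requirements.txt
    ],
    packages=setuptools.find_packages(),
)
```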
In the case of GCE, I build my dependencies into a Docker image that extends the base GCE images (which lets me use binary installs); I'm not sure whether something like that would work here.