Flink / FLINK-32758

PyFlink bounds are overly restrictive and outdated


    Description

      Hi! I am part of a team building the Flink backend for Ibis (https://github.com/ibis-project/ibis). We would like to leverage PyFlink under the hood for execution; however, PyFlink's requirements are incompatible with several other Ibis requirements. Beyond Ibis, PyFlink's outdated and restrictive requirements prevent it from being used alongside most recent releases of Python data libraries.

      Some of the major libraries we (and likely others in the Python community interested in using PyFlink alongside other libraries) need compatibility with:

      • PyArrow (at least >=10.0.0, but there's no reason not to also be compatible with the latest release)
      • pandas (should be compatible with 2.x series, but also probably with 1.4.x, released January 2022, and 1.5.x)
      • numpy (1.22 was released in December 2021)
      • Newer releases of Apache Beam
      • Newer releases of cython

      Furthermore, leaving dependencies uncapped may be preferable more generally, since it avoids the need for a new PyFlink release each time a dependency publishes a newer version. A common (and great) argument for not upper-bounding dependencies, especially in libraries, is made in https://iscinumpy.dev/post/bound-version-constraints/
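
To illustrate the point about caps, here is a minimal sketch of how an upper bound excludes releases that a lower-bound-only constraint would accept. The helper names (`parse_version`, `allowed`) and the bounds are hypothetical and chosen only for illustration; they are not PyFlink's actual constraints, and the parser handles only plain dotted release numbers.

```python
from typing import Optional, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a plain dotted release string such as '11.0.0' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def allowed(version: str, lower: str, upper: Optional[str] = None) -> bool:
    """Return True if lower <= version < upper; skip the cap when upper is None."""
    parsed = parse_version(version)
    if parsed < parse_version(lower):
        return False
    return upper is None or parsed < parse_version(upper)


# Hypothetical bounds, for illustration only.
capped = {"lower": "5.0.0", "upper": "9.0.0"}
uncapped = {"lower": "5.0.0", "upper": None}

for release in ["8.0.0", "10.0.0", "11.0.0"]:
    # Each row: release, accepted with a cap?, accepted without a cap?
    print(release, allowed(release, **capped), allowed(release, **uncapped))
```

With the cap in place, every release at or above the upper bound is rejected until the library ships a new version that raises it; without the cap, newer releases resolve immediately.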

      I am testing the removal of upper bounds in https://github.com/apache/flink/pull/23141; so far, builds pass without issue in b65c072, and I'm waiting on c8eb15c to see if I can get PyArrow to resolve >=10.0.0. Solving the proposed dependencies results in:

      #
      # This file is autogenerated by pip-compile with Python 3.8
      # by the following command:
      #
      #    pip-compile --config=pyproject.toml --output-file=dev/compiled-requirements.txt dev/dev-requirements.txt
      #
      apache-beam==2.49.0
          # via -r dev/dev-requirements.txt
      avro-python3==1.10.2
          # via -r dev/dev-requirements.txt
      certifi==2023.7.22
          # via requests
      charset-normalizer==3.2.0
          # via requests
      cloudpickle==2.2.1
          # via
          #   -r dev/dev-requirements.txt
          #   apache-beam
      crcmod==1.7
          # via apache-beam
      cython==3.0.0
          # via -r dev/dev-requirements.txt
      dill==0.3.1.1
          # via apache-beam
      dnspython==2.4.1
          # via pymongo
      docopt==0.6.2
          # via hdfs
      exceptiongroup==1.1.2
          # via pytest
      fastavro==1.8.2
          # via
          #   -r dev/dev-requirements.txt
          #   apache-beam
      fasteners==0.18
          # via apache-beam
      find-libpython==0.3.0
          # via pemja
      grpcio==1.56.2
          # via
          #   -r dev/dev-requirements.txt
          #   apache-beam
          #   grpcio-tools
      grpcio-tools==1.56.2
          # via -r dev/dev-requirements.txt
      hdfs==2.7.0
          # via apache-beam
      httplib2==0.22.0
          # via
          #   -r dev/dev-requirements.txt
          #   apache-beam
      idna==3.4
          # via requests
      iniconfig==2.0.0
          # via pytest
      numpy==1.24.4
          # via
          #   -r dev/dev-requirements.txt
          #   apache-beam
          #   pandas
          #   pyarrow
      objsize==0.6.1
          # via apache-beam
      orjson==3.9.2
          # via apache-beam
      packaging==23.1
          # via pytest
      pandas==2.0.3
          # via -r dev/dev-requirements.txt
      pemja==0.3.0 ; platform_system != "Windows"
          # via -r dev/dev-requirements.txt
      pluggy==1.2.0
          # via pytest
      proto-plus==1.22.3
          # via apache-beam
      protobuf==4.23.4
          # via
          #   -r dev/dev-requirements.txt
          #   apache-beam
          #   grpcio-tools
          #   proto-plus
      py4j==0.10.9.7
          # via -r dev/dev-requirements.txt
      pyarrow==11.0.0
          # via
          #   -r dev/dev-requirements.txt
          #   apache-beam
      pydot==1.4.2
          # via apache-beam
      pymongo==4.4.1
          # via apache-beam
      pyparsing==3.1.1
          # via
          #   httplib2
          #   pydot
      pytest==7.4.0
          # via -r dev/dev-requirements.txt
      python-dateutil==2.8.2
          # via
          #   -r dev/dev-requirements.txt
          #   apache-beam
          #   pandas
      pytz==2023.3
          # via
          #   -r dev/dev-requirements.txt
          #   apache-beam
          #   pandas
      regex==2023.6.3
          # via apache-beam
      requests==2.31.0
          # via
          #   apache-beam
          #   hdfs
      six==1.16.0
          # via
          #   hdfs
          #   python-dateutil
      tomli==2.0.1
          # via pytest
      typing-extensions==4.7.1
          # via apache-beam
      tzdata==2023.3
          # via pandas
      urllib3==2.0.4
          # via requests
      wheel==0.41.0
          # via -r dev/dev-requirements.txt
      zstandard==0.21.0
          # via apache-beam
      # The following packages are considered to be unsafe in a requirements file:
      # pip
      # setuptools
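
The resolved pins above can be compared against the minimum versions requested earlier in this description. The following is a rough sketch, not part of the linked PR: `parse_pins` and the `minimums` table are hypothetical, the floors are one reading of the list above (PyArrow 10.0.0, pandas 1.4, numpy 1.22), and only a short excerpt of the compiled file is embedded.

```python
import re
from typing import Dict, Tuple

# Matches pip-compile pin lines such as 'pyarrow==11.0.0'; comment lines do not match.
PIN = re.compile(r"^\s*([A-Za-z0-9_.\-]+)==([0-9]+(?:\.[0-9]+)*)")


def parse_pins(text: str) -> Dict[str, Tuple[int, ...]]:
    """Collect 'name==version' pins into a dict of name -> version tuple."""
    pins = {}
    for line in text.splitlines():
        match = PIN.match(line)
        if match:
            name, version = match.groups()
            pins[name.lower()] = tuple(int(part) for part in version.split("."))
    return pins


# Excerpt of the compiled requirements shown above.
compiled = """\
numpy==1.24.4
    # via
    #   -r dev/dev-requirements.txt
pandas==2.0.3
    # via -r dev/dev-requirements.txt
pemja==0.3.0 ; platform_system != "Windows"
pyarrow==11.0.0
"""

# Assumed floors, read from the list of libraries in this description.
minimums = {"pyarrow": (10, 0, 0), "pandas": (1, 4), "numpy": (1, 22)}

pins = parse_pins(compiled)
for name, floor in minimums.items():
    # Each row: package, resolved version, whether it meets the floor.
    print(name, pins[name], pins[name] >= floor)
```

Comparing integer tuples is enough for the plain release numbers in this file; pre-release or local version suffixes would need a real version parser.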

          People

            deepyaman Deepyaman Datta