  1. Beam
  2. BEAM-14235

parquetio module does not parse PEP-440 compliant Pyarrow version

Details

    • Type: Bug
    • Status: Resolved
    • Priority: P3
    • Resolution: Fixed
    • Affects Version/s: 2.27.0
    • Fix Version/s: 2.39.0
    • Component/s: io-py-parquet
    • Labels: None

    Description

      Since version 2.27.0 (introduced by this PR: https://github.com/apache/beam/pull/13302/files#diff-33b0b6b112036df96f341aa83b88efba9215ec14dfabc9db9e9ffe66a23154a2R55), the parquetio module parses the pyarrow version like this:

      ARROW_MAJOR_VERSION, _, _ = map(int, pa.__version__.split('.')) 

      (see https://github.com/apache/beam/blob/v2.27.0/sdks/python/apache_beam/io/parquetio.py#L55)

       

      This does not support all PEP-440 compliant versions: https://peps.python.org/pep-0440/

      For example, if pyarrow had a version with a local version label, such as 1.0.0+abc.7, importing apache_beam would fail:

      Traceback (most recent call last):
        File "/usr/local/lib/python3.7/runpy.py", line 183, in _run_module_as_main
          mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
        File "/usr/local/lib/python3.7/runpy.py", line 109, in _get_module_details
          __import__(pkg_name)
        File "/usr/local/lib/python3.7/site-packages/apache_beam/__init__.py", line 93, in <module>
          from apache_beam import io
        File "/usr/local/lib/python3.7/site-packages/apache_beam/io/__init__.py", line 28, in <module>
          from apache_beam.io.parquetio import *
        File "/usr/local/lib/python3.7/site-packages/apache_beam/io/parquetio.py", line 53, in <module>
          ARROW_MAJOR_VERSION, _, _ = map(int, pa.__version__.split('.'))
      ValueError: invalid literal for int() with base 10: '0+abc.7'
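The failure can be reproduced without pyarrow installed. The following is a minimal sketch: `naive_major_version` is a hypothetical helper that applies the same split-and-convert logic to a plain string in place of `pa.__version__`.

```python
def naive_major_version(version_string):
    """Mimic the parquetio parsing: split on '.' and convert each part to int."""
    major, _, _ = map(int, version_string.split('.'))
    return major


# A plain three-part release parses fine.
print(naive_major_version("1.0.0"))

# A local version label (the part after '+') is not an integer,
# so int() raises ValueError while the tuple is being unpacked.
try:
    naive_major_version("1.0.0+abc.7")
except ValueError as exc:
    print("ValueError:", exc)
```

Because the parsing runs at module import time, the same error surfaces as soon as apache_beam is imported.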

       

      In practice, this fails when somebody forks pyarrow and builds it with a custom version string, like yours truly.

       

      We can fix this by using pkg_resources.parse_version, which is PEP-440 compliant starting with setuptools 6.0.
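A rough sketch of what that could look like, not the change that was actually merged. `major_version` is a hypothetical helper name. The regex fallback is an assumption added so the sketch still runs where setuptools (and therefore `pkg_resources`) is missing or too old to expose `release`. It only reads the leading release number and is not a full PEP-440 parser.

```python
import re


def major_version(version_string):
    """Return the major release number of a PEP-440 style version string."""
    try:
        # The call proposed in this issue; pkg_resources ships with setuptools.
        from pkg_resources import parse_version
        return parse_version(version_string).release[0]
    except (ImportError, AttributeError):
        # Assumption: fallback for environments without a usable pkg_resources.
        # Skips an optional 'v' prefix and epoch ('N!'), then reads the digits.
        match = re.match(r"v?(?:\d+!)?(\d+)", version_string.strip())
        if match is None:
            raise ValueError("cannot parse version: %r" % version_string)
        return int(match.group(1))


# In parquetio this would be called with pa.__version__ instead of a literal.
print(major_version("1.0.0+abc.7"))
print(major_version("0.17.1"))
```

Reading only the release segment means local labels, pre-releases, and dev suffixes no longer affect the major-version check.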

       

      If maintainers agree with this change, I would be willing to submit a PR.

       

People

    • Assignee: Unassigned
    • Reporter: Arwin S Tio (cozos)
Time Tracking

    • Original Estimate: Not Specified
    • Remaining Estimate: 0h
    • Time Spent: 1h 50m