Details

    • Type: Improvement
    • Status: Triage Needed
    • Priority: P2
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.11.0
    • Component/s: sdk-py-core
    • Labels: None

    Description

      FAQ

      Does Apache Beam support Python 3?

      • Yes!

      Is there any remaining work?

      Which SDK version should I use?

      • For the best experience, use the latest released SDK. For a summary of Py3-related changes, read this thread.

      Help! I am getting a pickling error in StockUnpickler.find_class() on Python 3.

      • Does the error happen in a load_session call? See BEAM-6158.
      • Are you using a Beam SDK older than 2.17.0? See BEAM-8651.
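      Whether an installation falls below 2.17.0 can be checked programmatically. The sketch below is a hypothetical helper (parse_version and needs_upgrade are illustrative names, not part of Beam) that assumes apache_beam.__version__ reports a string such as "2.16.0". It compares version components as integers, because a plain string comparison would wrongly rank "2.9.0" above "2.17.0".

```python
import re
from typing import Tuple

# SDK versions older than this are the ones the FAQ points at BEAM-8651.
FIXED_IN = (2, 17, 0)


def parse_version(version: str) -> Tuple[int, ...]:
    """Return (major, minor, patch) as ints, ignoring suffixes like '.dev0' or 'rc1'."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def needs_upgrade(version: str, fixed_in: Tuple[int, ...] = FIXED_IN) -> bool:
    """True if `version` is older than `fixed_in`, compared numerically, not as text."""
    return parse_version(version) < fixed_in


if __name__ == "__main__":
    try:
        import apache_beam
    except ImportError:
        print("apache_beam is not installed")
    else:
        installed = apache_beam.__version__
        verdict = "older than 2.17.0, upgrade" if needs_upgrade(installed) else "2.17.0 or newer"
        print(f"Beam SDK {installed}: {verdict}")
```

      For example, needs_upgrade("2.16.0") and needs_upgrade("2.9.0") are both True, while needs_upgrade("2.17.0") is False.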

      My Avro Sink no longer works.

      • Beam switched to fastavro as the default Avro library on Python 3. The fastavro-based Avro sink expects the schema as a dictionary, while the avro-python3-based Avro sink expects a schema that was previously parsed with avro.schema.Parse(). fastavro will not accept a schema parsed by avro-python3, so make sure you pass the schema in the form your sink expects. See BEAM-10769.
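      A minimal sketch of preparing a schema for the fastavro-based sink, assuming the schema is available as JSON text (for example the contents of a .avsc file): load it with json.loads into a plain dict instead of calling avro.schema.Parse() on it. The helper name schema_as_dict and the User record are hypothetical, used only for illustration.

```python
import json
from typing import Any, Dict, Union


def schema_as_dict(schema: Union[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Return an Avro schema as a plain dict, parsing JSON text if needed."""
    if isinstance(schema, dict):
        return schema
    if isinstance(schema, str):
        parsed = json.loads(schema)
        if not isinstance(parsed, dict):
            raise TypeError("this sketch only handles object schemas such as records")
        return parsed
    # Anything else (for example an object returned by avro.schema.Parse())
    # is rejected, because the fastavro-based sink will not accept it.
    raise TypeError(
        f"expected a JSON string or dict, got {type(schema).__name__}; "
        "pass the raw schema, not an avro-python3 parsed schema"
    )


USER_SCHEMA_JSON = """
{
  "type": "record",
  "name": "User",
  "fields": [
    {"name": "name", "type": "string"},
    {"name": "age", "type": "int"}
  ]
}
"""

user_schema = schema_as_dict(USER_SCHEMA_JSON)
print(user_schema["name"], [field["name"] for field in user_schema["fields"]])
```

      With Beam installed, the resulting dict would then be passed as the schema argument of the Avro sink, e.g. beam.io.WriteToAvro(file_path_prefix, schema=user_schema).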

      My streaming pipelines are stuck on Python 3.

      • Are you using a Beam SDK older than 2.17.0? If so, please upgrade to 2.17.0. See BEAM-8651.

        Sub-Tasks

          1. Support Python native types in Beam typehints | Sub-task | Resolved | Udi Meiri | Time spent: 0.5h
          2. Make the coders package compatible with Python 3 | Sub-task | Resolved | Luke Zhu
          3. Enable tests to run in Python 3 | Sub-task | Resolved | Luke Zhu | Time spent: 4.5h
          4. Finish io futurize stage 2: fix the missing pylint3 check in tox.ini | Sub-task | Resolved | Matthias Feys | Time spent: 2.5h
          5. Create a tox environment that uses Py3 interpreter for pre/post commit test suites, once codebase supports Py3. | Sub-task | Resolved | Matthias Feys | Time spent: 2h
          6. Add an SDK harness container with Python 3 interpreter for portable pipelines. | Sub-task | Resolved | Valentyn Tymofieiev | Time spent: 40m
          7. Exercise Python 3 SDK harness container in ValidatesContainer Jenkins test suite. | Sub-task | Resolved | Mark Liu | Time spent: 4h 10m
          8. Finish Python 3 porting for coders module | Sub-task | Resolved | Robbe | Time spent: 4h
          9. Finish Python 3 porting for examples module | Sub-task | Resolved | Robbe | Time spent: 2.5h
          10. Finish Python 3 porting for internal module | Sub-task | Resolved | Robbe | Time spent: 0.5h
          11. Finish Python 3 porting for io module | Sub-task | Resolved | Juta Staes | Time spent: 22h 10m
          12. Finish Python 3 porting for metrics module | Sub-task | Resolved | Robbe
          13. Finish Python 3 porting for options module | Sub-task | Resolved | Manu Zhang | Time spent: 3h 40m
          14. Finish Python 3 porting for portability module | Sub-task | Resolved | Robbe
          15. Finish Python 3 porting for runners module | Sub-task | Resolved | Robbe | Time spent: 6h 20m
          16. Finish Python 3 porting for testing module | Sub-task | Resolved | Robbe | Time spent: 5h 50m
          17. Finish Python 3 porting for transforms module | Sub-task | Resolved | Robbe | Time spent: 3h
          18. Finish Python 3 porting for typehints module | Sub-task | Resolved | Robbe | Time spent: 1.5h
          19. Finish Python 3 porting for utils module | Sub-task | Resolved | Robbe | Time spent: 1h
          20. Finish Python 3 porting for unpackaged files | Sub-task | Resolved | Robbe | Time spent: 4h 10m
          21. Add tox suites to exercise unit tests using Python3 interpreter with cython, and with gcp dependencies. | Sub-task | Resolved | Robbe | Time spent: 2h 50m
          22. Several tests fail on Python 3 with TypeError: 'cmp' is an invalid keyword argument for this function | Sub-task | Resolved | Unassigned | Time spent: 3.5h
          23. Several tests fail on Python 3 with Failed assert: [<some number>] == [nan] | Sub-task | Resolved | Robbe
          24. Side inputs don't work on Python 3 | Sub-task | Resolved | Robert Bradshaw | Time spent: 1.5h
          25. Several tests fail on Python 3 with: unsupported operand type(s) for +: 'int' and 'EmptySideInput' | Sub-task | Resolved | Unassigned
          26. Some tests use assertItemsEqual method, not available in Python 3 | Sub-task | Resolved | Matthias Feys | Time spent: 0.5h
          27. Several tests fail on Python 3 with TypeError: unorderable types: str() < int() | Sub-task | Resolved | Robbe | Time spent: 6h 20m
          28. Several tests fail on Python 3 with: Runtime type violation detected | Sub-task | Resolved | Unassigned
          29. Several IO tests hang indefinitely during execution on Python 3. | Sub-task | Resolved | Robbe | Time spent: 50m
          30. Avro IO does not work with avro-python3 package out-of-the-box on Python 3, several tests fail with AttributeError (module 'avro.schema' has no attribute 'parse') | Sub-task | Resolved | Simon | Time spent: 1h
          31. Several IO tests fail in Python 3 with RuntimeError('dictionary changed size during iteration',)} | Sub-task | Resolved | Ruoyun Huang | Time spent: 6h
          32. Redesign test_split_at_fraction_exhaustive tests for Python 3 | Sub-task | Open | Unassigned | Time spent: 4.5h
          33. VcfIO is not Python3-compatible and there are no plans to make it compatible. | Sub-task | Resolved | Unassigned | Time spent: 3h 10m
          34. Several typehints tests fail on Python 3 with ValueError: no signature found for builtin <method 'upper' of 'str' objects> | Sub-task | Resolved | Robbe
          35. Add tox suites for various Python 3 versions (3.5, 3.6, 3.7) | Sub-task | Resolved | Robbe | Time spent: 5h 50m
          36. Default coder breaks with large ints on Python 3 | Sub-task | Resolved | Robert Bradshaw | Time spent: 1h 40m
          37. Disable compare parameter in Top.Of() combiner when executing in Python 3. | Sub-task | Resolved | Robert Bradshaw | Time spent: 1h
          38. Util test on annotations fails | Sub-task | Resolved | Ruoyun Huang | Time spent: 5h 10m
          39. Using methods in map is broken on Python 3 | Sub-task | Resolved | Unassigned
          40. Validates runner tests fail with: Cannot convert bytes value to JSON value | Sub-task | Resolved | Mark Liu
          41. wordcount_fnapi_it failed on TestDataflowRunner because of JSON string decoding error | Sub-task | Resolved | Mark Liu | Time spent: 5h 40m
          42. Support DoFns with Keyword-only arguments in Python 3. | Sub-task | Triage Needed | Unassigned | Time spent: 17h 10m
          43. TFRecordio not Py3 compatible | Sub-task | Resolved | Robbe | Time spent: 1h 50m
          44. Enable WordCount example on DataflowRunner on Python 3 | Sub-task | Resolved | Mark Liu | Time spent: 4h 40m
          45. Gradle setupVirtualenv supports Python 3 | Sub-task | Resolved | Mark Liu | Time spent: 3h 50m
          46. Revert dill pip install from github commit | Sub-task | Resolved | Valentyn Tymofieiev
          47. Gcsio batch delete broken in Python 3 | Sub-task | Resolved | Mark Liu | Time spent: 2h
          48. Using --save_main_session fails on Python 3 when main module has invocations of superclass method using 'super'. | Sub-task | Triage Needed | Unassigned | Time spent: 4h 10m
          49. Opcounters sampling test fails for some random seeds on Python3 | Sub-task | Resolved | Robbe
          50. TypeError in DataflowRunner: dict_values does not support indexing | Sub-task | Resolved | Mark Liu | Time spent: 40m
          51. Dill fails to pickle avro.RecordSchema classes on Python 3. | Sub-task | Resolved | Valentyn Tymofieiev | Time spent: 11.5h
          52. Parallel tox (unit) tests run on Jenkins | Sub-task | Resolved | Mark Liu | Time spent: 12h 10m
          53. BigQuery IO does not work in Python 3 | Sub-task | Resolved | Valentyn Tymofieiev | Time spent: 1h
          54. TypeHints Py3 Error: TrivialInferenceTest.testTupleListComprehension fails on Python 3 | Sub-task | Resolved | Udi Meiri
          55. GCS IO tests are very flaky under Python 3.5 | Sub-task | Resolved | Juta Staes | Time spent: 3h 50m
          56. Dataflow Python runner should use a Python-3 compatible container when starting a Python 3 pipeline. | Sub-task | Resolved | Valentyn Tymofieiev | Time spent: 0.5h
          57. Add integration test on DirectRunner in Python 3 | Sub-task | Resolved | Mark Liu | Time spent: 1h
          58. Beam Python SDK release qualification should verify supported Python 3 versions. | Sub-task | Resolved | Valentyn Tymofieiev | Time spent: 4h 20m
          59. Stager should stage Python 3 wheels for Beam SDK once they are released. | Sub-task | Resolved | Valentyn Tymofieiev | Time spent: 20m
          60. Release Python 3 wheels with first Beam SDK release that supports Python 3. | Sub-task | Resolved | Robert Bradshaw
          61. Add PostCommit suite for integration tests on DataflowRunner | Sub-task | Resolved | Mark Liu | Time spent: 26h 20m
          62. Exercise Dataflow runner integration tests in a postcommit suite for Python 3.5 and 3.6 | Sub-task | Resolved | Juta Staes | Time spent: 3h 10m
          63. Dataflow ValidatesRunner test suite should also exercise ValidatesRunner tests under Python 3. | Sub-task | Resolved | Frederik Bode | Time spent: 14h 40m
          64. Exercise direct runner integration tests in a postcommit suite for Python 3.5 and 3.6. | Sub-task | Resolved | Juta Staes
          65. SDK source tarball is different when created on Python 2 and Python 3 | Sub-task | Resolved | Valentyn Tymofieiev
          66. Typehinting depends on typing changes in Python 3.5.3 | Sub-task | Resolved | Robbe | Time spent: 1.5h
          67. Bigquery Tornadoes IT is broken in Python3 PostCommit test suite. | Sub-task | Resolved | Pablo Estrada | Time spent: 6h 50m
          68. Block size difference in avro library on Python3 causes some AvroIO tests to fail. | Sub-task | Resolved | Valentyn Tymofieiev | Time spent: 1h 20m
          69. BigQuery IO does not support bytes in Python 3 | Sub-task | Triage Needed | Juta Staes | Time spent: 20h 50m
          70. Add Streaming wordcount test to Dataflow ValidatesContainer test suite | Sub-task | Resolved | Unassigned
          71. python 3 test_hourly_team_score_it fails with bigquery job id already exists | Sub-task | Resolved | Unassigned
          72. test_multimap_side_input in fn_api.runner_test fails on Python 3.6 | Sub-task | Resolved | Robbe
          73. Add Python3 performance benchmarks | Sub-task | Resolved | Mark Liu | Time spent: 16h 10m
          74. Configurable Python interpreter version in Gradle | Sub-task | Resolved | Mark Liu | Time spent: 2h
          75. Design Py3-compatible typehints annotation support in Beam 3. | Sub-task | Triage Needed | Mark Liu | Time spent: 18.5h
          76. Enable use_fastavro experiment on Dataflow Runner for all Py3 jobs. | Sub-task | Resolved | Frederik Bode
          77. Add DirectRunnerIT test suite to Python3 Postcommit suite. | Sub-task | Resolved | Juta Staes
          78. TypeError caused by using str variable as header argument in apache_beam.io.textio.WriteToText | Sub-task | Resolved | yoshiki obata | Time spent: 2h
          79. Rename ToStringCoder into ToBytesCoder | Sub-task | Resolved | yoshiki obata | Time spent: 3h 40m
          80. Dataflow runner should set use_fastavro experiment on Python 3. | Sub-task | Resolved | Valentyn Tymofieiev | Time spent: 2h 40m
          81. Add ValidatesRunner test suite for Flink on Python 3. | Sub-task | Resolved | Valentyn Tymofieiev | Time spent: 5h 40m
          82. Enable Python3 tests for Spark | Sub-task | Resolved | Kyle Weaver
          83. Support Py3 Dataclasses | Sub-task | Resolved | yoshiki obata | Time spent: 1h 40m
          84. Revise BQ integration tests to clearly communicate that BQ IO expects base64-encoded bytes. | Sub-task | Resolved | Juta Staes
          85. apache_beam.io.avroio_test.TestAvro.test_dynamic_work_rebalancing_exhaustive is very slow | Sub-task | Resolved | Valentyn Tymofieiev | Time spent: 1h 40m
          86. Clean up Python 2 codepaths once Beam no longer supports Python 2. | Sub-task | Triage Needed | yoshiki obata | Time spent: 79h 10m
          87. FastAvroTest has slow test_dynamic_exhaustive on Python 2 and 3. | Sub-task | Resolved | Unassigned
          88. Create a Wordcount-on-Flink Python 3 test suite. | Sub-task | Resolved | Valentyn Tymofieiev | Time spent: 5h 10m
          89. Document Python 3 support in Beam starting from 2.14.0 | Sub-task | Resolved | Rose Nguyen | Time spent: 2h 10m
          90. Add Python 3.6, 3.7 as supported qualifiers to setup.py. | Sub-task | Resolved | Valentyn Tymofieiev | Time spent: 1.5h
          91. Improve Avro IO integration test coverage on Python 3. | Sub-task | Resolved | Valentyn Tymofieiev | Time spent: 6h 10m
          92. Add smoke integration tests to Precommit test suites on Python 3 | Sub-task | Resolved | Valentyn Tymofieiev | Time spent: 1h
          93. Add SDK harness containers for Py 3.6, Py 3.7 | Sub-task | Resolved | Hannah Jiang
          94. deadlock using save_main_session and logging caused by threading.RLock pickling | Sub-task | Open | Unassigned
          95. Add integration tests for HDFS | Sub-task | Resolved | Frederik Bode | Time spent: 5h 10m
          96. Add ITs to check IO behavior with bytes and unicode strings | Sub-task | Resolved | Juta Staes | Time spent: 2h 50m
          97. Accept Py3 wheels in SDK harness container. | Sub-task | Resolved | yoshiki obata | Time spent: 6h
          98. Unify test suite configuration structure across Py2 and Py 3 suites | Sub-task | Resolved | Mark Liu | Time spent: 3h
          99. Python 3 test parallelization causes test flakiness due to ModuleNotFoundError. | Sub-task | Resolved | Mark Liu
          100. Implement support of PEP 484 annotations for user functions in transforms such as ParDo, Combine in Py3. | Sub-task | Resolved | Udi Meiri
          101. Migrate to "typing" module typing types in Beam typehints (on Py2 and Py3). | Sub-task | Resolved | Udi Meiri | Time spent: 0.5h
          102. Allow retries of PostCommit test suites per Python version | Sub-task | Resolved | Valentyn Tymofieiev | Time spent: 2h 40m
          103. Use a Python3-compatible profiler in apache_beam.utils.profiler | Sub-task | Resolved | yoshiki obata | Time spent: 1h 50m
          104. Add key type conversion in from and to client entity in Datastore v1new IO. | Sub-task | Resolved | Udi Meiri | Time spent: 1h 40m
          105. Add Python 2 deprecation warnings starting from 2.17.0 release. | Sub-task | Resolved | Valentyn Tymofieiev
          106. Generate Python SDK docs using Python 3 | Sub-task | Resolved | yoshiki obata | Time spent: 9h
          107. UserScore example fails on Python 3.5 as of 2.13.0 and 2.14.0 with Dataflow runner | Sub-task | Resolved | Unassigned
          108. Run pylint in Python 3 | Sub-task | Resolved | Chad Dombrova
          109. Add a Python 3 test scenario for MongoDB IO | Sub-task | Resolved | Yichi Zhang | Time spent: 2h 40m
          110. --profile_memory flag is py2 only | Sub-task | Resolved | Valentyn Tymofieiev
          111. Provide a way to better control minor+patch versions of Python 3.x interpreters used to run Beam tests locally and on Jenkins. | Sub-task | Open | Unassigned | Time spent: 16h 50m
          112. Update documentation for Python 3 support after Beam 2.16.0. | Sub-task | Resolved | Cyrus Maden | Time spent: 5h 10m
          113. Establish consensus around how many concurrent minor versions of Python Beam should support, and deprecation policy for older versions. | Sub-task | Resolved | Unassigned
          114. Python 3 portable pipelines sometimes fail with errors in StockUnpickler.find_class() | Sub-task | Triage Needed | Valentyn Tymofieiev | Time spent: 6h 10m
          115. Drop support for Python 3.5 | Sub-task | Triage Needed | Valentyn Tymofieiev | Time spent: 8h
          116. Create a page that describes what it takes to add and sunset support of a Python minor version in Beam. | Sub-task | Open | Unassigned
          117. Mark 2.24.0 as the last release supporting Python 2 in release notes and warnings. | Sub-task | Triage Needed | Valentyn Tymofieiev | Time spent: 40m
          118. python3.9.1 support | Sub-task | Resolved | Unassigned

            People

              Assignee: Valentyn Tymofieiev (tvalentyn)
              Reporter: Eyad Sibai (eyad.alsibai@gmail.com)
              Votes: 39
              Watchers: 46

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 516h 40m