Details
Type: Bug
Status: Open
Priority: P3
Resolution: Unresolved
Affects versions: 2.29.0, 2.30.0, 2.31.0, 2.32.0, 2.33.0
Fix version: None
We're using Python Linux Docker images, such as `python:bullseye`, and building an image that installs packages from a `requirements.txt` file with a Beam requirement such as `apache-beam ~= 2.28.0`.
Description
The code below throws the following `TypeError` on the affected versions, but works as expected on 2.28.0:
`TypeError: Unable to deterministically encode '2021-11-02' of type '<class 'datetime.date'>', please provide a type hint for the input of 'GroupByKey' [while running 'Create/Map(decode)']`
```python
import typing
from datetime import date

import apache_beam as beam
from apache_beam.testing.test_pipeline import TestPipeline

with TestPipeline() as pipeline:
    today = date.today()

    results = (
        pipeline
        | beam.Create([(1, {'d': today}), (1, {'d': today})])
        # This step only requires output type hints on versions after 2.28.0,
        # and only if the date is being "projected" from some other data structure.
        | beam.MapTuple(lambda i, d: (d['d'], i))
        # If this aggregation is removed, the pipeline also works without error.
        | beam.CombinePerKey(sum)
    )

    results | beam.Map(print)
```
This Stack Overflow question describes the same problem:
https://stackoverflow.com/questions/69409693/how-do-i-use-a-datetime-date-value-in-apache-beam-groupby
It's possible to work around the error by registering a custom `DateCoder` and adding output type hints to the projecting `MapTuple` step. Since neither is necessary in other situations or on 2.28.0, this looks like a bug. Our production pipelines would need many of these tedious type hints in order to work properly, so we're effectively blocked from upgrading to the newest version.