BEAM-13217

TypeCheckError due to CoGroupByKey output mis-deduction

Details

    • Type: Bug
    • Status: Open
    • Priority: P3
    • Resolution: Unresolved
    • Affects Version/s: 2.32.0, 2.33.0, 2.34.0, 2.35.0
    • Fix Version/s: None
    • Component/s: sdk-py-core
    • Labels: None

    Description

      After upgrading our Python project from Beam 2.31.0 to 2.33.0, we started getting TypeCheckErrors such as

      apache_beam.typehints.decorators.TypeCheckError: Type hint violation for 'all_data/combine_new_and_all': requires Tuple[Tuple[Any, Any], Dict[str, Iterable[_CombinedEntry]]] but got Tuple[Tuple[int, int], Dict[str, List[Union[]]]] for element

      where the output value of a CoGroupByKey() is apparently incorrectly deduced to be a Dict[str, List[Union[]]].

      I managed to build a small repro case:

      import apache_beam as beam
      from typing import Dict, Iterable, Tuple
      
      {
          "foo": [(42, "foo")],
          "bar": [(42, "bar")],
      } | beam.CoGroupByKey().with_output_types(Tuple[int, Dict[str, Iterable[str]]])
      

      which raises

      apache_beam.typehints.decorators.TypeCheckError: Output type hint violation at CoGroupByKey: expected Tuple[int, Dict[str, Iterable[str]]], got Tuple[int, Dict[str, List[Union[]]]]

      or alternatively, using a TestPipeline:

      import apache_beam as beam
      from apache_beam.testing.test_pipeline import TestPipeline
      from apache_beam.testing.util import assert_that, equal_to
      from typing import Dict, Iterable, Tuple
      
      with TestPipeline() as p:
          actual = {
              "foo": p | "create_foo" >> beam.Create([(42, "foo")]),
              "bar": p | "create_bar" >> beam.Create([(42, "bar")]),
          } | beam.CoGroupByKey().with_output_types(Tuple[int, Dict[str, Iterable[str]]])
          assert_that(actual, equal_to([(42, {"foo": ["foo"], "bar": ["bar"]})]))
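
For reference, the element shape that the assertion above expects can be sketched in plain Python. This is only an illustrative emulation of the grouping, not Beam's implementation; `co_group_by_key` is a hypothetical helper name, and the assumption is that every key gets one list per input tag, which is the shape the `equal_to` check implies.

```python
from typing import Any, Dict, Hashable, Iterable, List, Tuple


def co_group_by_key(
    tagged_inputs: Dict[str, Iterable[Tuple[Hashable, Any]]]
) -> List[Tuple[Hashable, Dict[str, List[Any]]]]:
    """Plain-Python emulation of the grouping (hypothetical helper, not Beam code)."""
    grouped: Dict[Hashable, Dict[str, List[Any]]] = {}
    for tag, pairs in tagged_inputs.items():
        for key, value in pairs:
            # Each key gets an entry for every tag, even if that tag has no values.
            per_tag = grouped.setdefault(key, {t: [] for t in tagged_inputs})
            per_tag[tag].append(value)
    return list(grouped.items())


result = co_group_by_key({
    "foo": [(42, "foo")],
    "bar": [(42, "bar")],
})
print(result)
```

Each element is a `(key, {tag: list_of_values})` pair, so one would expect it to satisfy the declared hint `Tuple[int, Dict[str, Iterable[str]]]`, since a list of `str` is an iterable of `str`.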
      

      One more thing, about the Tuple[Any, Any] in the original error message I posted: we can reproduce a similar Any like this:

      import apache_beam as beam
      from typing import Dict, Iterable, NewType, Tuple
      
      key = NewType("key", int)
      {
          "foo": [(key(1337), "foo")],
          "bar": [(key(1337), "bar")],
      } | beam.CoGroupByKey().with_output_types(Tuple[key, Dict[str, Iterable[str]]])

      which raises
      

      apache_beam.typehints.decorators.TypeCheckError: Output type hint violation at CoGroupByKey: expected Tuple[Any, Dict[str, Iterable[str]]], got Tuple[int, Dict[str, List[Union[]]]]

      It looks like NewType is treated as Any? That surprised me.
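
A possible explanation for the `int` on the "got" side, sketched with the standard library only: at runtime, calling a `NewType` returns its argument unchanged, so the elements are plain `int` values. The sketch says nothing about why Beam renders the declared hint as `Any`; that part remains an open question here.

```python
from typing import NewType

key = NewType("key", int)

value = key(1337)

# At runtime a NewType call returns its argument unchanged,
# so the value is an ordinary int with no trace of "key".
runtime_type = type(value)

# The wrapped type is recorded on the NewType object itself.
underlying = key.__supertype__

print(runtime_type, underlying)
```

As an untested idea, declaring the output hint with `key.__supertype__` (that is, `int`) instead of `key` might sidestep the `Any` substitution, though it would not address the `List[Union[]]` part of the error.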

      I could also reproduce the issue in 2.32.0.

          People

            Assignee: Unassigned
            Reporter: Willi Schinmeyer (mrwonko)