Details
Type: Improvement
Status: Open
Priority: P3
Resolution: Unresolved
Description
The process method of a DoFn can either return values or yield them. When it returns, it is expected to return a List of elements. With only a single value to return, it is easy to forget this and return the bare value instead.
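Correct way:
    import apache_beam as beam

    class SomeDoFn(beam.DoFn):
        def process(self, elem):
            # Wrap the single element in a list.
            return ['a']
Incorrect way:
    class SomeDoFn(beam.DoFn):
        def process(self, elem):
            # Returns the bare element instead of a list.
            return 'a'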
A pipeline with the incorrect DoFn will fail with a cryptic error message that gives no direct indication that the actual cause is SomeDoFn returning an element instead of a List containing that element. This issue is very time-consuming to track down.
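To illustrate why the failure is hard to read, here is a minimal plain-Python sketch. It assumes the runner simply iterates over whatever process returns. The emit_all function is a hypothetical stand-in for that step, not Beam code.

```python
def emit_all(result):
    """Hypothetical stand-in for a runner iterating over a process() result."""
    return [item for item in result]


# A list containing one element is emitted as that one element.
print(emit_all(['ab']))

# A bare string is itself iterable, so it is split into characters
# without raising any error at this point.
print(emit_all('ab'))

# A bare non-iterable value fails, but the error does not name the DoFn.
try:
    emit_all(5)
except TypeError as exc:
    print(type(exc).__name__, exc)
```

In both bad cases, nothing in the outcome points back to the DoFn that returned the wrong shape.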
It would be good if the pipeline could raise an exception or otherwise indicate that the DoFn is incorrectly returning an element instead of a List to make it easier to identify the error.
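One possible shape for such a check is sketched below. The helper name check_process_output and its dofn_name argument are hypothetical and not part of Beam. The sketch also assumes that returning None means "emit nothing" and that a bare str, bytes, or dict is almost always a mistake.

```python
from collections.abc import Iterable


def check_process_output(result, dofn_name):
    """Return result unchanged if it looks like a valid process() return value.

    Hypothetical helper for illustration; not an Apache Beam API.
    """
    if result is None:
        # Assumption: returning None means no elements are emitted.
        return result
    if isinstance(result, (str, bytes, dict)):
        # These are iterable, but iterating them is rarely what was intended.
        raise TypeError(
            f"{dofn_name}.process returned a bare {type(result).__name__}; "
            "wrap a single element in a list (return [value]) or use yield."
        )
    if not isinstance(result, Iterable):
        raise TypeError(
            f"{dofn_name}.process returned a non-iterable "
            f"{type(result).__name__}; wrap a single element in a list "
            "(return [value]) or use yield."
        )
    return result


# Usage: the error message now names the offending DoFn.
try:
    check_process_output('a', 'SomeDoFn')
except TypeError as exc:
    print(exc)
```

Lists, tuples, and generators pass through unchanged, so a correctly written process method is unaffected.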