Details
-
Bug
-
Status: Resolved
-
P3
-
Resolution: Fixed
-
2.14.0
-
None
-
Python
Description
Since version 2.14.0, the Python SDK has provided setup and teardown methods for DoFn. According to the documentation, setup is "Called to prepare an instance for processing bundles of elements. This is a good place to initialize transient in-memory resources, such as network connections."
However, when I try to use it in an unbounded job (Pub/Sub source), it seems that DoFn.setup() is never called and the resources are never initialized. [UPDATE] It works with the Dataflow runner but not with the DirectRunner. With the Dataflow runner, DoFn.setup() seems to be called multiple times at startup, but never again once the pipeline is processing elements. [UPDATE] With the DirectRunner I get:
"""
AttributeError: 'NoneType' object has no attribute 'predict' [while running 'transform the data']
"""
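For reference, the pattern described above can be sketched roughly as follows. This is a minimal illustration, not the code from the linked repository: `PredictFn`, `DummyModel`, and `_load_model` are hypothetical names, and `DummyModel` stands in for a real scikit-learn model. The lazy check in `process` is one possible workaround for runners that do not call `setup()`; the `ImportError` fallback is only there so the sketch runs without Beam installed.

```python
try:
    import apache_beam as beam
    _DoFnBase = beam.DoFn
except ImportError:  # allows the sketch to run without Beam installed
    _DoFnBase = object


class DummyModel:
    """Hypothetical stand-in for a loaded scikit-learn model."""

    def predict(self, rows):
        # Returns one value per input row (here simply the row sum).
        return [sum(row) for row in rows]


class PredictFn(_DoFnBase):
    def __init__(self):
        super().__init__()
        # Keep only cheap state here; the model is loaded later on the worker.
        self._model = None

    def _load_model(self):
        # Assumption: a real pipeline would load the model from storage here.
        return DummyModel()

    def setup(self):
        # Intended to run once per DoFn instance on runners that call setup().
        self._model = self._load_model()

    def process(self, element):
        # Fallback: load lazily if the runner never called setup(),
        # which avoids the 'NoneType' has no attribute 'predict' error.
        if self._model is None:
            self._model = self._load_model()
        yield self._model.predict([element])[0]
```

With this guard the DoFn behaves the same whether or not the runner invokes `setup()`, at the cost of one extra `None` check per element.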
My source code: https://github.com/NikeNano/DataflowSklearnStreaming
I am happy to contribute example code showing how to use setup as soon as I get it running.