  1. Beam
  2. BEAM-7885

DoFn.setup() doesn't run for streaming jobs on DirectRunner.

Details

    • Bug
    • Status: Resolved
    • P3
    • Resolution: Fixed
    • 2.14.0
    • 2.22.0
    • sdk-py-core
    • None
    • Python

    Description

      Since version 2.14.0, the Python SDK has provided setup and teardown methods for DoFn. According to the documentation, setup is "Called to prepare an instance for processing bundles of elements. This is a good place to initialize transient in-memory resources, such as network connections."
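      For illustration, a minimal sketch of how the setup/teardown hooks are meant to be used. The names DummyModel and PredictDoFn are hypothetical and not taken from the linked repository; the lifecycle is driven by hand here instead of by a runner, and the import fallback only exists so the sketch runs without Beam installed.

```python
try:
    import apache_beam as beam
    _DoFnBase = beam.DoFn
except ImportError:
    # Assumption: fall back to a plain class so the sketch runs without Beam.
    _DoFnBase = object


class DummyModel:
    """Hypothetical stand-in for a real model (e.g. a scikit-learn estimator)."""

    def predict(self, rows):
        # Return one value per row; here simply the sum of the row.
        return [sum(row) for row in rows]


class PredictDoFn(_DoFnBase):
    def __init__(self):
        super().__init__()
        # Keep the constructor cheap; the heavy resource is created in setup().
        self.model = None

    def setup(self):
        # Expected to run once per DoFn instance, before any bundle is processed.
        self.model = DummyModel()

    def process(self, element):
        # Relies on setup() having been called by the runner.
        yield self.model.predict([element])[0]

    def teardown(self):
        # Release the transient resource.
        self.model = None


# Drive the lifecycle by hand, in the order a runner is expected to follow.
fn = PredictDoFn()
fn.setup()
results = list(fn.process([1, 2, 3]))
fn.teardown()
print(results)
```

      In a real pipeline the DoFn would be applied with beam.ParDo(PredictDoFn()) and the runner, not user code, would call setup and teardown.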

      However, when I try to use it in an unbounded job (Pub/Sub source), it seems that DoFn.setup() is never called and the resources are never initialized. [UPDATE] It works on the Dataflow runner but not on the DirectRunner. On the Dataflow runner, DoFn.setup() seems to be called multiple times, but then never again once the pipeline is processing elements. [UPDATE] On the DirectRunner I get:

      """" 

      AttributeError: 'NoneType' object has no attribute 'predict' [while running 'transform the data']

      """
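      The error above is what Python raises when process() touches an attribute that setup() was supposed to populate. A hedged sketch of the failure mode and of one possible defensive workaround (lazy initialization inside process) follows. EagerPredictDoFn, LazyPredictDoFn, and DummyModel are hypothetical names, the skipped setup() call is simulated by hand, and this is not the fix that was applied to the DirectRunner.

```python
try:
    import apache_beam as beam
    _DoFnBase = beam.DoFn
except ImportError:
    # Assumption: fall back to a plain class so the sketch runs without Beam.
    _DoFnBase = object


class DummyModel:
    """Hypothetical stand-in for the real model."""

    def predict(self, rows):
        return [sum(row) for row in rows]


class EagerPredictDoFn(_DoFnBase):
    """Depends entirely on the runner calling setup()."""

    def __init__(self):
        super().__init__()
        self.model = None

    def setup(self):
        self.model = DummyModel()

    def process(self, element):
        # Fails if setup() was never called: self.model is still None.
        yield self.model.predict([element])[0]


class LazyPredictDoFn(EagerPredictDoFn):
    """Workaround sketch: initialize on first use if setup() was skipped."""

    def process(self, element):
        if self.model is None:
            self.model = DummyModel()
        yield self.model.predict([element])[0]


# Simulate a runner that skips setup() and goes straight to process().
try:
    list(EagerPredictDoFn().process([1, 2, 3]))
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

lazy_results = list(LazyPredictDoFn().process([1, 2, 3]))
print(error_message)
print(lazy_results)
```

      The lazy variant keeps working whether or not the runner honors setup(), at the cost of a None check per element.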

      My source code: https://github.com/NikeNano/DataflowSklearnStreaming

       

      I am happy to contribute example code showing how to use setup as soon as I get it running.

       

          People

            pabloem Pablo Estrada
            nikenano niklas Hansson