Beam / BEAM-7871

Streaming from PubSub to Firestore doesn't work on Dataflow

Details

    • Type: Bug
    • Status: Open
    • Priority: P3
    • Resolution: Unresolved
    • Affects Version: 2.13.0
    • Fix Version: None
    • Components: io-py-gcp, runner-dataflow
    • Labels: None

    Description

      I ran into the same error as described here: https://stackoverflow.com/questions/57059944/python-package-errors-while-running-gcp-dataflow. I don't see it reported anywhere, so I am creating an issue just in case.

      The pipeline is quite simple, reading from PubSub and writing to Firestore.

      Beam version used is 2.13.0, with Python 2.7.

      With DirectRunner the pipeline works fine, but on Dataflow it throws the following error:

      java.util.concurrent.ExecutionException: java.lang.RuntimeException: Error received from SDK harness for instruction -81: Traceback (most recent call last):
       File "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/worker/sdk_worker.py", line 157, in _execute
       response = task()
       File "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/worker/sdk_worker.py", line 190, in <lambda>
       self._execute(lambda: worker.do_instruction(work), work)
       File "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/worker/sdk_worker.py", line 312, in do_instruction
       request.instruction_id)
       File "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/worker/sdk_worker.py", line 331, in process_bundle
       bundle_processor.process_bundle(instruction_id))
       File "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/worker/bundle_processor.py", line 538, in process_bundle
       op.start()
       File "apache_beam/runners/worker/operations.py", line 554, in apache_beam.runners.worker.operations.DoOperation.start
       def start(self):
       File "apache_beam/runners/worker/operations.py", line 555, in apache_beam.runners.worker.operations.DoOperation.start
       with self.scoped_start_state:
       File "apache_beam/runners/worker/operations.py", line 557, in apache_beam.runners.worker.operations.DoOperation.start
       self.dofn_runner.start()
       File "apache_beam/runners/common.py", line 778, in apache_beam.runners.common.DoFnRunner.start
       self._invoke_bundle_method(self.do_fn_invoker.invoke_start_bundle)
       File "apache_beam/runners/common.py", line 775, in apache_beam.runners.common.DoFnRunner._invoke_bundle_method
       self._reraise_augmented(exn)
       File "apache_beam/runners/common.py", line 800, in apache_beam.runners.common.DoFnRunner._reraise_augmented
       raise_with_traceback(new_exn)
       File "apache_beam/runners/common.py", line 773, in apache_beam.runners.common.DoFnRunner._invoke_bundle_method
       bundle_method()
       File "apache_beam/runners/common.py", line 359, in apache_beam.runners.common.DoFnInvoker.invoke_start_bundle
       def invoke_start_bundle(self):
       File "apache_beam/runners/common.py", line 363, in apache_beam.runners.common.DoFnInvoker.invoke_start_bundle
       self.signature.start_bundle_method.method_value())
       File "/home/zdenulo/dev/gcp_stuff/df_firestore_stream/df_firestore_stream.py", line 39, in start_bundle
      NameError: global name 'firestore' is not defined [while running 'generatedPtransform-64']

      It's interesting that using Beam version 2.12.0 solves the problem on Dataflow; it works as expected. I'm not sure what the cause could be.

      Here is a repository with the code that was used: https://github.com/zdenulo/dataflow_firestore_stream
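      For context, a common workaround for this kind of NameError on Dataflow is to defer the import into the DoFn itself, so the import executes on the worker rather than only in the launch script. The sketch below is illustrative and untested against this exact pipeline; the class name, collection name, and element shape are assumptions, not taken from the reported code:

      ```python
      class WriteToFirestoreDoFn:
          """Sketch of the deferred-import workaround. In the real pipeline
          this class would subclass apache_beam.DoFn; it is left as a plain
          class here to keep the sketch self-contained."""

          def start_bundle(self):
              # The import runs on the worker, so the `firestore` name is
              # defined there even when the main session is not pickled.
              from google.cloud import firestore
              self._db = firestore.Client()

          def process(self, element):
              # `element` is assumed to be a dict decoded from the
              # PubSub message (hypothetical collection name).
              self._db.collection('messages').add(element)
              yield element
      ```

      Alternatively, setting save_main_session=True in the pipeline options (the --save_main_session flag) pickles the launch script's module-level state, including top-level imports, and restores it on the workers.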

       


          People

            Assignee: Unassigned
            Reporter: Zdenko Hrcek (zdenulo)
