Details

    • Type: Bug
    • Status: Open
    • Priority: P3
    • Resolution: Unresolved
    • Affects Version/s: 2.19.0
    • Fix Version/s: None
    • Component/s: runner-dataflow
    • Labels: None

    Description

      Hi,

      I'm trying to use the Python portable DataflowRunner with a [BagStateSpec|https://beam.apache.org/releases/pydoc/2.6.0/apache_beam.transforms.userstate.html], but I encounter the following issue:

      Traceback (most recent call last):
        File "/Users/leopold/.pyenv/versions/3.6.0/lib/python3.6/runpy.py", line 193, in _run_module_as_main
          "__main__", mod_spec)
        File "/Users/leopold/.pyenv/versions/3.6.0/lib/python3.6/runpy.py", line 85, in _run_code
          exec(code, run_globals)
        File "/Users/leopold/workspace/BenchmarkListingStreaming/listing_beam_pipeline/test_runner.py", line 49, in <module>
          run()
        File "/Users/leopold/workspace/BenchmarkListingStreaming/listing_beam_pipeline/test_runner.py", line 44, in run
          | 'write to file' >> WriteToText(known_args.output)
        File "/Users/leopold/.pyenv/versions/BenchmarkListingStreaming/lib/python3.6/site-packages/apache_beam/pipeline.py", line 481, in __exit__
          self.run().wait_until_finish()
        File "/Users/leopold/.pyenv/versions/BenchmarkListingStreaming/lib/python3.6/site-packages/apache_beam/runners/dataflow/dataflow_runner.py", line 1449, in wait_until_finish
          (self.state, getattr(self._runner, 'last_error_msg', None)), self)
      apache_beam.runners.dataflow.dataflow_runner.DataflowRuntimeException: Dataflow pipeline failed. State: FAILED, Error:
      Traceback (most recent call last):
        File "/usr/local/lib/python3.6/site-packages/dataflow_worker/batchworker.py", line 648, in do_work
          work_executor.execute()
        File "/usr/local/lib/python3.6/site-packages/dataflow_worker/executor.py", line 176, in execute
          op.start()
        File "apache_beam/runners/worker/operations.py", line 649, in apache_beam.runners.worker.operations.DoOperation.start
        File "apache_beam/runners/worker/operations.py", line 651, in apache_beam.runners.worker.operations.DoOperation.start
        File "apache_beam/runners/worker/operations.py", line 652, in apache_beam.runners.worker.operations.DoOperation.start
        File "apache_beam/runners/worker/operations.py", line 261, in apache_beam.runners.worker.operations.Operation.start
        File "apache_beam/runners/worker/operations.py", line 266, in apache_beam.runners.worker.operations.Operation.start
        File "apache_beam/runners/worker/operations.py", line 597, in apache_beam.runners.worker.operations.DoOperation.setup
        File "apache_beam/runners/worker/operations.py", line 636, in apache_beam.runners.worker.operations.DoOperation.setup
        File "apache_beam/runners/common.py", line 866, in apache_beam.runners.common.DoFnRunner.__init__
      Exception: Requested execution of a stateful DoFn, but no user state context is available. This likely means that the current runner does not support the execution of stateful DoFns.
      

      I've also seen this issue reported on Stack Overflow:

      https://stackoverflow.com/questions/55413690/does-google-dataflow-support-stateful-pipelines-developed-with-python-sdk

       

      Do you have any idea or ETA for when this feature will be available in Beam?

       

      Thanks!

       


          People

            Assignee: Unassigned
            Reporter: Léopold Boudard (leopold.boudard)
            Votes: 1
            Watchers: 8
