Beam / BEAM-11790

Error when trying to read from S3 with Python SDK and external runners


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: P2
    • Resolution: Fixed
    • Affects Version/s: 2.27.0
    • Fix Version/s: 2.28.0
    • Component/s: io-py-aws
    • Labels:
      None

      Description

      Running a Python SDK pipeline on Flink with an external environment, with file IO that reads from S3, throws the following error:

      INFO:apache_beam.utils.subprocess_server:b'  File "/usr/local/lib/python3.8/site-packages/apache_beam/runners/worker/sdk_worker.py", line 289, in _execute'
      INFO:apache_beam.utils.subprocess_server:b'    response = task()'
      INFO:apache_beam.utils.subprocess_server:b'  File "/usr/local/lib/python3.8/site-packages/apache_beam/runners/worker/sdk_worker.py", line 362, in <lambda>'
      INFO:apache_beam.utils.subprocess_server:b'    lambda: self.create_worker().do_instruction(request), request)'
      INFO:apache_beam.utils.subprocess_server:b'  File "/usr/local/lib/python3.8/site-packages/apache_beam/runners/worker/sdk_worker.py", line 606, in do_instruction'
      INFO:apache_beam.utils.subprocess_server:b'    return getattr(self, request_type)('
      INFO:apache_beam.utils.subprocess_server:b'  File "/usr/local/lib/python3.8/site-packages/apache_beam/runners/worker/sdk_worker.py", line 644, in process_bundle'
      INFO:apache_beam.utils.subprocess_server:b'    bundle_processor.process_bundle(instruction_id))'
      INFO:apache_beam.utils.subprocess_server:b'  File "/usr/local/lib/python3.8/site-packages/apache_beam/runners/worker/bundle_processor.py", line 999, in process_bundle'
      INFO:apache_beam.utils.subprocess_server:b'    input_op_by_transform_id[element.transform_id].process_encoded('
      INFO:apache_beam.utils.subprocess_server:b'  File "/usr/local/lib/python3.8/site-packages/apache_beam/runners/worker/bundle_processor.py", line 228, in process_encoded'
      INFO:apache_beam.utils.subprocess_server:b'    self.output(decoded_value)'
      INFO:apache_beam.utils.subprocess_server:b'  File "apache_beam/runners/worker/operations.py", line 357, in apache_beam.runners.worker.operations.Operation.output'
      INFO:apache_beam.utils.subprocess_server:b'  File "apache_beam/runners/worker/operations.py", line 359, in apache_beam.runners.worker.operations.Operation.output'
      INFO:apache_beam.utils.subprocess_server:b'  File "apache_beam/runners/worker/operations.py", line 221, in apache_beam.runners.worker.operations.SingletonConsumerSet.receive'
      INFO:apache_beam.utils.subprocess_server:b'  File "apache_beam/runners/worker/operations.py", line 718, in apache_beam.runners.worker.operations.DoOperation.process'
      INFO:apache_beam.utils.subprocess_server:b'  File "apache_beam/runners/worker/operations.py", line 719, in apache_beam.runners.worker.operations.DoOperation.process'
      INFO:apache_beam.utils.subprocess_server:b'  File "apache_beam/runners/common.py", line 1241, in apache_beam.runners.common.DoFnRunner.process'
      INFO:apache_beam.utils.subprocess_server:b'  File "apache_beam/runners/common.py", line 1321, in apache_beam.runners.common.DoFnRunner._reraise_augmented'
      INFO:apache_beam.utils.subprocess_server:b'  File "/usr/local/lib/python3.8/site-packages/future/utils/__init__.py", line 446, in raise_with_traceback'
      INFO:apache_beam.utils.subprocess_server:b'    raise exc.with_traceback(traceback)'
      INFO:apache_beam.utils.subprocess_server:b'  File "apache_beam/runners/common.py", line 1239, in apache_beam.runners.common.DoFnRunner.process'
      INFO:apache_beam.utils.subprocess_server:b'  File "apache_beam/runners/common.py", line 768, in apache_beam.runners.common.PerWindowInvoker.invoke_process'
      INFO:apache_beam.utils.subprocess_server:b'  File "apache_beam/runners/common.py", line 893, in apache_beam.runners.common.PerWindowInvoker._invoke_process_per_window'
      INFO:apache_beam.utils.subprocess_server:b'  File "/usr/local/lib/python3.8/site-packages/apache_beam/io/iobase.py", line 1131, in process'
      INFO:apache_beam.utils.subprocess_server:b'    self.writer = self.sink.open_writer(init_result, str(uuid.uuid4()))'
      INFO:apache_beam.utils.subprocess_server:b'  File "/usr/local/lib/python3.8/site-packages/apache_beam/options/value_provider.py", line 200, in _f'
      INFO:apache_beam.utils.subprocess_server:b'    return fnc(self, *args, **kwargs)'
      INFO:apache_beam.utils.subprocess_server:b'  File "/usr/local/lib/python3.8/site-packages/apache_beam/io/filebasedsink.py", line 196, in open_writer'
      INFO:apache_beam.utils.subprocess_server:b'    return FileBasedSinkWriter(self, writer_path)'
      INFO:apache_beam.utils.subprocess_server:b'  File "/usr/local/lib/python3.8/site-packages/apache_beam/io/filebasedsink.py", line 417, in __init__'
      INFO:apache_beam.utils.subprocess_server:b'    self.temp_handle = self.sink.open(temp_shard_path)'
      INFO:apache_beam.utils.subprocess_server:b'  File "/usr/local/lib/python3.8/site-packages/apache_beam/io/textio.py", line 405, in open'
      INFO:apache_beam.utils.subprocess_server:b'    file_handle = super(_TextSink, self).open(temp_path)'
      INFO:apache_beam.utils.subprocess_server:b'  File "/usr/local/lib/python3.8/site-packages/apache_beam/options/value_provider.py", line 200, in _f'
      INFO:apache_beam.utils.subprocess_server:b'    return fnc(self, *args, **kwargs)'
      INFO:apache_beam.utils.subprocess_server:b'  File "/usr/local/lib/python3.8/site-packages/apache_beam/io/filebasedsink.py", line 138, in open'
      INFO:apache_beam.utils.subprocess_server:b'    return FileSystems.create(temp_path, self.mime_type, self.compression_type)'
      INFO:apache_beam.utils.subprocess_server:b'  File "/usr/local/lib/python3.8/site-packages/apache_beam/io/filesystems.py", line 229, in create'
      INFO:apache_beam.utils.subprocess_server:b'    return filesystem.create(path, mime_type, compression_type)'
      INFO:apache_beam.utils.subprocess_server:b'  File "/usr/local/lib/python3.8/site-packages/apache_beam/io/aws/s3filesystem.py", line 171, in create'
      INFO:apache_beam.utils.subprocess_server:b"    return self._path_open(path, 'wb', mime_type, compression_type)"
      INFO:apache_beam.utils.subprocess_server:b'  File "/usr/local/lib/python3.8/site-packages/apache_beam/io/aws/s3filesystem.py", line 151, in _path_open'
      INFO:apache_beam.utils.subprocess_server:b'    raw_file = s3io.S3IO(options=self._options).open('
      INFO:apache_beam.utils.subprocess_server:b'  File "/usr/local/lib/python3.8/site-packages/apache_beam/io/aws/s3io.py", line 63, in __init__'
      INFO:apache_beam.utils.subprocess_server:b"    raise ValueError('Must provide one of client or options')"
      INFO:apache_beam.utils.subprocess_server:b"ValueError: Must provide one of client or options [while running 'Write/Write/WriteImpl/WriteBundles']"
      

      I assume this is related to a recent PR that added the check in s3io.py ("Must provide one of client or options"), which is the assertion now failing.
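
      To make the suspected failure mode concrete, here is a minimal sketch of the pattern visible in the last frames of the traceback: a filesystem object passes its stored options into an S3 IO wrapper, and the wrapper rejects the call when it receives neither a client nor options. FakeS3IO, FakeS3FileSystem and try_create are hypothetical stand-ins, not Beam's classes, and the exact guard condition is an assumption; only the error message is taken from the log above.

      ```python
      class FakeS3IO:
          """Hypothetical stand-in for the S3 IO wrapper (not Beam's class)."""

          def __init__(self, client=None, options=None):
              # Assumed condition; the message matches the traceback above.
              if client is None and options is None:
                  raise ValueError('Must provide one of client or options')
              self.client = client
              self.options = options


      class FakeS3FileSystem:
          """Hypothetical stand-in for the filesystem that owns the options."""

          def __init__(self, pipeline_options=None):
              # If the worker never receives pipeline options, this stays None.
              self._options = pipeline_options

          def create(self, path):
              # Mirrors the call shape in the traceback: S3IO(options=self._options)
              return FakeS3IO(options=self._options)


      def try_create(fs, path):
              """Return 'ok' on success, or the error message on failure."""
              try:
                  fs.create(path)
                  return 'ok'
              except ValueError as exc:
                  return str(exc)


      if __name__ == '__main__':
          # No options reached the filesystem: the guard fires.
          print(try_create(FakeS3FileSystem(None), 's3://bucket/key'))
          # Options present (hypothetical dict): the guard passes.
          print(try_create(FakeS3FileSystem({'region': 'us-east-1'}), 's3://bucket/key'))
      ```

      If this reading is right, the thing to check is whether the pipeline options actually reach the filesystem inside the external worker process, rather than the guard itself.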


            People

            • Assignee: Unassigned
            • Reporter: Nir Gazit (nirga)


                Time Tracking

                • Original Estimate: Not Specified
                • Remaining Estimate: 0h
                • Time Spent: 3h 10m
