Uploaded image for project: 'Beam'
  1. Beam
  2. BEAM-10705

Passing whl files in --sdk_location does not work for https locations.

    XMLWordPrintableJSON

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: P3
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: sdk-py-core
    • Labels:

      Description

      Sample repro:

      python -m apache_beam.examples.wordcount --input=gs://dataflow-samples/shakespeare/kinglear.txt --output /tmp/wordcount --runner=DataflowRunner --project=google.com:clo
      uddfe --temp_location gs://clouddfe-valentyn/tmp/ --region=us-central1 --sdk_location=https://
      storage.googleapis.com/beam-wheels-staging/master/94f9e7fd4cae0f8aa6587d2cf14887f1c4827485-198
      203585/apache_beam-2.24.0.dev0-cp27-cp27m-macosx_10_9_x86_64.whl

        File "/home/valentyn/.pyenv/versions/3.7.3/lib/python3.7/runpy.py", line 193, in _run_module_as_main
          "__main__", mod_spec)
        File "/home/valentyn/.pyenv/versions/3.7.3/lib/python3.7/runpy.py", line 85, in _run_code
          exec(code, run_globals)
        File "/home/valentyn/projects/beam/beam2/beam/sdks/python/apache_beam/examples/wordcount.py", line 99, in <module>
          run()
        File "/home/valentyn/projects/beam/beam2/beam/sdks/python/apache_beam/examples/wordcount.py", line 94, in run
          output | 'Write' >> WriteToText(known_args.output)
        File "/home/valentyn/projects/beam/beam2/beam/sdks/python/apache_beam/pipeline.py", line 555, in __exit__
          self.result = self.run()
        File "/home/valentyn/projects/beam/beam2/beam/sdks/python/apache_beam/pipeline.py", line 521, in run
          allow_proto_holders=True).run(False)
        File "/home/valentyn/projects/beam/beam2/beam/sdks/python/apache_beam/pipeline.py", line 534, in run
          return self.runner.run_pipeline(self, self._options)
        File "/home/valentyn/projects/beam/beam2/beam/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py", line 479, in run_pipeline
          artifacts=environments.python_sdk_dependencies(options)))
        File "/home/valentyn/projects/beam/beam2/beam/sdks/python/apache_beam/transforms/environments.py", line 613, in python_sdk_dependencies
          staged_name in stager.Stager.create_job_resources(options, tmp_dir))
        File "/home/valentyn/projects/beam/beam2/beam/sdks/python/apache_beam/runners/portability/stager.py", line 235, in create_job_resources
          resources.extend(Stager._create_beam_sdk(sdk_remote_location, temp_dir))
        File "/home/valentyn/projects/beam/beam2/beam/sdks/python/apache_beam/runners/portability/stager.py", line 659, in _create_beam_sdk
          sdk_remote_location)
        File "/home/valentyn/projects/beam/beam2/beam/sdks/python/apache_beam/runners/portability/stager.py", line 596, in _desired_sdk_filename_in_staging_location
          _, wheel_filename = FileSystems.split(sdk_location)
        File "/home/valentyn/projects/beam/beam2/beam/sdks/python/apache_beam/io/filesystems.py", line 151, in split
          filesystem = FileSystems.get_filesystem(path)
        File "/home/valentyn/projects/beam/beam2/beam/sdks/python/apache_beam/io/filesystems.py", line 106, in get_filesystem
          'e.g., pip install apache-beam[gcp]. Path specified: %s' % path)
      ValueError: Unable to get filesystem from specified path, please use the correct path or ensure the required dependency is installed, e.g., pip install apache-beam[gcp]. Path specified: https://storage.googleapis.com/beam-wheels-staging/master/94f9e7fd4cae0f8aa6587d2cf14887f1c4827485-198203585/apache_beam-2.24.0.dev0-cp27-cp27m-macosx_10_9_x86_64.whl
      
      

        Attachments

          Issue Links

            Activity

              People

              • Assignee:
                aennassiri Ayoub Ennassiri
                Reporter:
                tvalentyn Valentyn Tymofieiev
              • Votes:
                0 Vote for this issue
                Watchers:
                2 Start watching this issue

                Dates

                • Created:
                  Updated:

                  Time Tracking

                  Estimated:
                  Original Estimate - Not Specified
                  Not Specified
                  Remaining:
                  Remaining Estimate - 0h
                  0h
                  Logged:
                  Time Spent - 4.5h
                  4.5h