Beam / BEAM-3965

HDFS read broken in python

Details

    • Type: Bug
    • Status: Resolved
    • Priority: P2
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.5.0
    • Component/s: sdk-py-core
    • Labels: None

    Description

      When running a command like:

      python setup.py sdist > /dev/null && python -m apache_beam.examples.wordcount --output gs://.../py-wordcount-output \
        --hdfs_host ... --hdfs_port 50070 --hdfs_user ehudm --runner DataflowRunner --project ... \
        --temp_location gs://.../temp-hdfs-int --staging_location gs://.../staging-hdfs-int \
        --sdk_location dist/apache-beam-2.5.0.dev0.tar.gz --input hdfs://kinglear.txt
      

      I get:

      Traceback (most recent call last):
        File "/usr/lib/python2.7/runpy.py", line 174, in _run_module_as_main
          "__main__", fname, loader, pkg_name)
        File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
          exec code in run_globals
        File "/usr/local/google/home/ehudm/src/beam/sdks/python/apache_beam/examples/wordcount.py", line 136, in <module>
          run()
        File "/usr/local/google/home/ehudm/src/beam/sdks/python/apache_beam/examples/wordcount.py", line 90, in run
          lines = p | 'read' >> ReadFromText(known_args.input)
        File "apache_beam/io/textio.py", line 522, in __init__
          skip_header_lines=skip_header_lines)
        File "apache_beam/io/textio.py", line 117, in __init__
          validate=validate)
        File "apache_beam/io/filebasedsource.py", line 119, in __init__
          self._validate()
        File "apache_beam/options/value_provider.py", line 124, in _f
          return fnc(self, *args, **kwargs)
        File "apache_beam/io/filebasedsource.py", line 176, in _validate
          match_result = FileSystems.match([pattern], limits=[1])[0]
        File "apache_beam/io/filesystems.py", line 159, in match
          return filesystem.match(patterns, limits)
        File "apache_beam/io/hadoopfilesystem.py", line 221, in match
          raise BeamIOError('Match operation failed', exceptions)
      apache_beam.io.filesystem.BeamIOError: Match operation failed with exceptions {'hdfs://kinglear.txt': KeyError('name',)}
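The last frames of the traceback suggest that `hadoopfilesystem.py` collects a failure per pattern and wraps them all in a single `BeamIOError`, and that the underlying failure for `hdfs://kinglear.txt` is a `KeyError` on `'name'`. The following is a hypothetical, standard-library-only sketch of one way such an error can arise; it is not Beam's code and not a confirmed root cause. It assumes the HDFS client returns WebHDFS `FileStatus` objects, which (per the WebHDFS REST API) carry the file name in `pathSuffix` and have no `name` field. `MatchError`, `fake_list_status`, and `match` are made-up names for illustration.

```python
# Hypothetical sketch, not Beam's code: shows how reading a 'name' key from
# WebHDFS-style FileStatus dicts would fail with KeyError('name').


class MatchError(Exception):
    """Stand-in for BeamIOError; same message format as in the traceback."""

    def __init__(self, msg, exception_details):
        super().__init__('%s with exceptions %s' % (msg, exception_details))
        self.exception_details = exception_details


def fake_list_status(path):
    """Hypothetical client call returning one WebHDFS-style FileStatus entry.

    WebHDFS FileStatus objects carry the file name in 'pathSuffix';
    they have no 'name' field.
    """
    return [{'pathSuffix': 'kinglear.txt', 'type': 'FILE'}]


def match(patterns, name_key):
    """Match each pattern, collect per-pattern errors, raise them together."""
    results, exceptions = [], {}
    for pattern in patterns:
        try:
            results.append([st[name_key] for st in fake_list_status(pattern)])
        except Exception as e:  # record the failure and keep going
            exceptions[pattern] = e
    if exceptions:
        raise MatchError('Match operation failed', exceptions)
    return results


# Reading 'name' reproduces the shape of the reported error.
failure = None
try:
    match(['hdfs://kinglear.txt'], name_key='name')
except MatchError as e:
    failure = e  # keep a reference; 'e' is unbound after the except block
print(failure)

# Reading 'pathSuffix' succeeds.
print(match(['hdfs://kinglear.txt'], name_key='pathSuffix'))  # [['kinglear.txt']]
```

Collecting errors per pattern before raising means a single bad pattern does not hide failures in the others, which is why the reported message is a dict keyed by the input URL.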
      


            People

              Assignee: Udi Meiri (udim)
              Reporter: Udi Meiri (udim)
              Votes: 0
              Watchers: 1


                Time Tracking

                  Original Estimate: Not Specified
                  Remaining Estimate: 0h
                  Time Spent: 3h 40m