BEAM-2429

Conflicting filesystems when using HadoopFileSystem

Details

    • Type: Bug
    • Status: Resolved
    • Priority: P2
    • Resolution: Not A Bug
    • Affects Version/s: 2.0.0
    • Fix Version/s: 2.0.0
    • None

    Description

      I'm facing an issue when trying to use HadoopFileSystem in my pipeline. It looks like HadoopFileSystem is registering itself under the `file` scheme (https://github.com/apache/beam/pull/2777/files#diff-330bd0854dcab6037ef0e52c05d68eb2L79), so the following exception is thrown when HadoopFileSystem is registered.

      java.lang.IllegalStateException: Scheme: [file] has conflicting filesystems: [org.apache.beam.sdk.io.LocalFileSystem, org.apache.beam.sdk.io.hdfs.HadoopFileSystem]
      at org.apache.beam.sdk.io.FileSystems.verifySchemesAreUnique(FileSystems.java:498)
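
      The message above reads like the result of a uniqueness check over the schemes reported by the registered filesystems. A minimal, self-contained model of such a check is sketched below. It is not Beam's actual implementation: the class `SchemeCheckSketch` and the pair-of-strings representation are hypothetical, chosen only to show why two filesystems that both report `file` cannot coexist.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Simplified model of a scheme-uniqueness check; NOT Beam's actual code.
class SchemeCheckSketch {

    // Each entry is {filesystemClassName, schemeItReports}.
    static void verifySchemesAreUnique(List<String[]> registered) {
        // Group filesystem names by the scheme they claim.
        Map<String, List<String>> byScheme = new TreeMap<>();
        for (String[] entry : registered) {
            byScheme.computeIfAbsent(entry[1], k -> new ArrayList<>()).add(entry[0]);
        }
        // Any scheme claimed by more than one filesystem is a conflict.
        for (Map.Entry<String, List<String>> e : byScheme.entrySet()) {
            if (e.getValue().size() > 1) {
                throw new IllegalStateException(String.format(
                    "Scheme: [%s] has conflicting filesystems: %s",
                    e.getKey(), e.getValue()));
            }
        }
    }

    public static void main(String[] args) {
        List<String[]> registered = new ArrayList<>();
        registered.add(new String[] {"org.apache.beam.sdk.io.LocalFileSystem", "file"});
        // Assumption for illustration: the Hadoop filesystem also reports "file".
        registered.add(new String[] {"org.apache.beam.sdk.io.hdfs.HadoopFileSystem", "file"});
        try {
            verifySchemesAreUnique(registered);
        } catch (IllegalStateException ex) {
            System.out.println(ex.getMessage());
        }
    }
}
```

      In this model the conflict disappears as soon as the second entry reports a different scheme, such as `hdfs`.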

      What is the correct way to handle `hdfs` URLs out of the box with TextIO and AvroIO?

          String[] args = new String[]{
              "--hdfsConfiguration=[{\"dfs.client.use.datanode.hostname\": \"true\"}]"};
          HadoopFileSystemOptions options = PipelineOptionsFactory
              .fromArgs(args)
              .withValidation()
              .as(HadoopFileSystemOptions.class);
          Pipeline pipeline = Pipeline.create(options); 
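
      The configuration above sets only `dfs.client.use.datanode.hostname`. Hadoop's `fs.defaultFS` defaults to `file:///`, so one thing worth trying, on the unconfirmed assumption that HadoopFileSystem derives its scheme from that default, is to add an explicit `fs.defaultFS` entry. The sketch below only builds the argument string and does not touch Beam. The helper `hdfsConfigurationArg` and the address `hdfs://namenode:8020` are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper that formats a --hdfsConfiguration flag as a JSON list
// holding one configuration object. It does not escape quotes or backslashes,
// so it assumes plain keys and values.
class HdfsArgsSketch {

    static String hdfsConfigurationArg(Map<String, String> conf) {
        StringJoiner json = new StringJoiner(", ", "{", "}");
        for (Map.Entry<String, String> e : conf.entrySet()) {
            json.add("\"" + e.getKey() + "\": \"" + e.getValue() + "\"");
        }
        return "--hdfsConfiguration=[" + json + "]";
    }

    public static void main(String[] args) {
        Map<String, String> conf = new LinkedHashMap<>();
        // Assumption: a placeholder namenode address, replace with a real one.
        conf.put("fs.defaultFS", "hdfs://namenode:8020");
        conf.put("dfs.client.use.datanode.hostname", "true");

        // This array would take the place of `args` in the snippet above.
        String[] pipelineArgs = new String[] {hdfsConfigurationArg(conf)};
        System.out.println(pipelineArgs[0]);
    }
}
```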
      

            People

              Assignee: flaviocf Flavio Fiszman
              Reporter: Geronimo François Wagner