  1. Hadoop Common
  2. HADOOP-16829 Über-jira: S3A Hadoop 3.3.1 features
  3. HADOOP-17112

whitespace not allowed in paths when saving files to s3a via committer

Details

    Description

      When saving results from a Spark DataFrame on the latest 3.0.1-SNAPSHOT, compiled against hadoop-3.2, with the following settings:
      --conf spark.hadoop.mapreduce.outputcommitter.factory.scheme.s3a=org.apache.hadoop.fs.s3a.commit.S3ACommitterFactory
      --conf spark.sql.parquet.output.committer.class=org.apache.spark.internal.io.cloud.BindingParquetOutputCommitter
      --conf spark.sql.sources.commitProtocolClass=org.apache.spark.internal.io.cloud.PathOutputCommitProtocol
      --conf spark.hadoop.fs.s3a.committer.name=partitioned
      --conf spark.hadoop.fs.s3a.committer.staging.conflict-mode=replace
      we are unable to save a file when the path contains a whitespace character. Saving works fine when the path has no whitespace.

      I looked through the recent commits related to path qualification, but couldn't find anything obvious. Is this a known bug?
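A minimal reproduction sketch may help narrow this down. It collects the settings listed above in one place and writes a small partitioned dataset to a destination whose path contains a space. This is an assumption-laden sketch, not the reporter's actual job: the bucket name `example-bucket`, the helper names `spark_submit_args` and `reproduce`, the `append` save mode, and the `part` column are all hypothetical, and running `reproduce` requires PySpark with the spark-hadoop-cloud module and S3 credentials.

```python
# Committer settings copied from the report above.
COMMITTER_CONF = {
    "spark.hadoop.mapreduce.outputcommitter.factory.scheme.s3a":
        "org.apache.hadoop.fs.s3a.commit.S3ACommitterFactory",
    "spark.sql.parquet.output.committer.class":
        "org.apache.spark.internal.io.cloud.BindingParquetOutputCommitter",
    "spark.sql.sources.commitProtocolClass":
        "org.apache.spark.internal.io.cloud.PathOutputCommitProtocol",
    "spark.hadoop.fs.s3a.committer.name": "partitioned",
    "spark.hadoop.fs.s3a.committer.staging.conflict-mode": "replace",
}


def spark_submit_args(conf):
    """Flatten a settings dict into ['--conf', 'key=value', ...] for spark-submit."""
    args = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


def reproduce(dest):
    """Write a tiny partitioned Parquet dataset to `dest` using the committer settings.

    Hypothetical helper: pyspark is imported here so the rest of the module
    can be loaded without it.
    """
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    builder = SparkSession.builder.appName("s3a-whitespace-repro")
    for key, value in COMMITTER_CONF.items():
        builder = builder.config(key, value)
    spark = builder.getOrCreate()

    # Ten rows split across two partition directories (part=0, part=1).
    df = spark.range(10).withColumn("part", F.col("id") % 2)
    # "append" is an assumption; the report does not state the save mode.
    df.write.partitionBy("part").mode("append").parquet(dest)
```

Calling `reproduce("s3a://example-bucket/output with space/")` and then `reproduce("s3a://example-bucket/output_without_space/")` should show whether the failure depends only on the whitespace in the destination path, as described above.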

      Attachments

        1. image-2020-07-03-16-08-52-340.png
          59 kB
          Krzysztof Adamski

            People

              krisss Krzysztof Adamski
              krisss Krzysztof Adamski
              Votes: 0
              Watchers: 6

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 1h 40m