  1. Hadoop Common
  2. HADOOP-16829 Über-jira: S3A Hadoop 3.4 features
  3. HADOOP-17112

whitespace not allowed in paths when saving files to s3a via committer

    Details

    • Type: Sub-task
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.2.0
    • Fix Version/s: None
    • Component/s: fs/s3
    • Labels:
      None

      Description

      When saving a Spark DataFrame with the latest 3.0.1-SNAPSHOT build, compiled against hadoop-3.2, and the following settings:
      --conf spark.hadoop.mapreduce.outputcommitter.factory.scheme.s3a=org.apache.hadoop.fs.s3a.commit.S3ACommitterFactory
      --conf spark.sql.parquet.output.committer.class=org.apache.spark.internal.io.cloud.BindingParquetOutputCommitter
      --conf spark.sql.sources.commitProtocolClass=org.apache.spark.internal.io.cloud.PathOutputCommitProtocol
      --conf spark.hadoop.fs.s3a.committer.name=partitioned
      --conf spark.hadoop.fs.s3a.committer.staging.conflict-mode=replace
      we are unable to save a file when the path contains a whitespace character. The same job works when the path has no whitespace.
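
      For anyone trying to reproduce this, the settings above can be rendered into a spark-submit command line. The following is only a sketch: the script name repro_job.py, the bucket example-bucket and the output path are hypothetical and do not come from the report. It uses shlex.quote so that the space in the output path reaches Spark intact, which helps rule out shell quoting as the source of the failure.

```python
import shlex

# Settings copied from the report.
COMMITTER_CONF = {
    "spark.hadoop.mapreduce.outputcommitter.factory.scheme.s3a":
        "org.apache.hadoop.fs.s3a.commit.S3ACommitterFactory",
    "spark.sql.parquet.output.committer.class":
        "org.apache.spark.internal.io.cloud.BindingParquetOutputCommitter",
    "spark.sql.sources.commitProtocolClass":
        "org.apache.spark.internal.io.cloud.PathOutputCommitProtocol",
    "spark.hadoop.fs.s3a.committer.name": "partitioned",
    "spark.hadoop.fs.s3a.committer.staging.conflict-mode": "replace",
}


def spark_submit_args(conf, app, output_path):
    """Build a spark-submit argument list.

    `app` and `output_path` are hypothetical placeholders, not values
    taken from the report.
    """
    args = ["spark-submit"]
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    args += [app, output_path]
    return args


def as_shell_command(args):
    """Join arguments into one shell-safe string, quoting where needed."""
    return " ".join(shlex.quote(arg) for arg in args)


# Hypothetical job script and an output path that contains a space.
args = spark_submit_args(
    COMMITTER_CONF, "repro_job.py", "s3a://example-bucket/out dir/"
)
print(as_shell_command(args))
```

      Passing the list form directly to a process launcher avoids the quoting question altogether; the string form is only for pasting into a terminal.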

      I looked through the recent commits related to path qualification but couldn't find anything obvious. Is this a known bug?
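
      The report does not identify a cause. Since path qualification is mentioned, one thing worth checking is whether the space survives conversion between a plain path string and its URI form. The sketch below is plain Python, does not call any Hadoop or Spark code, and uses a hypothetical object key. It only illustrates the general hazard that encoding a path twice yields a different path; it is not a diagnosis of this issue.

```python
from urllib.parse import quote, unquote

# Hypothetical object key containing a space (not taken from the report).
raw_key = "results/out dir/part-00000.parquet"

# Encoding once turns the space into %20.
encoded_once = quote(raw_key)

# Encoding the already-encoded form also escapes the '%' sign itself.
encoded_twice = quote(encoded_once)

# A single decode of the double-encoded form does not restore the key.
decoded_once = unquote(encoded_twice)

print(encoded_once)             # results/out%20dir/part-00000.parquet
print(encoded_twice)            # results/out%2520dir/part-00000.parquet
print(decoded_once == raw_key)  # False
```

      If a committer or filesystem layer mixed the encoded and raw forms in a similar way, a path with a space would fail while one without would pass, which matches the symptom described. Whether that happens here would need to be confirmed against the stack trace.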

        Attachments

        1. image-2020-07-03-16-08-52-340.png
          59 kB
          Krzysztof Adamski

            People

            • Assignee: Unassigned
            • Reporter: Krzysztof Adamski (krisss)
            • Votes: 0
            • Watchers: 2