Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Duplicate
- Affects Version/s: 3.2.0
- Fix Version/s: None
- Component/s: None
Description
When using DistCp over HttpFS with data that contains Spark partitions, DistCp fails to access the partitioned Parquet files because the "=" characters in the file path get double-encoded:
"/test/spark/partition/year=2019/month=1/day=1"
to
"/test/spark/partition/year%253D2019/month%253D1/day%253D1"
This happens because a fsPathItem containing the character '=' is first encoded by URLEncoder.encode(fsPathItem, "UTF-8") to '%3D', and then encoded again by new Path(....) to '%253D'.
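The double encoding described above can be reproduced in isolation. The sketch below is a minimal illustration, not Hadoop's actual code path: it uses URLEncoder for both passes to stand in for the second encoding done by new Path(....), since the Hadoop Path class is not available in a self-contained snippet.

```java
import java.net.URLEncoder;

public class DoubleEncodeDemo {
    public static void main(String[] args) throws Exception {
        // A single path segment from a Spark-partitioned layout.
        String fsPathItem = "year=2019";

        // First pass: '=' is percent-encoded to %3D.
        String once = URLEncoder.encode(fsPathItem, "UTF-8");

        // Second pass (simulating the re-encoding): the '%' in %3D
        // is itself encoded to %25, producing the broken %253D form.
        String twice = URLEncoder.encode(once, "UTF-8");

        System.out.println(once);   // year%3D2019
        System.out.println(twice);  // year%253D2019
    }
}
```

A server receiving the doubly-encoded form decodes it only once, so it looks for a literal directory named "year%3D2019" rather than "year=2019", and the lookup fails.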
Attachments
Issue Links
- contains
  - HDFS-14408 HttpFS handles paths with special characters different than WebHdfs (Resolved)
- duplicates
  - HDFS-14323 Distcp fails in Hadoop 3.x when 2.x source webhdfs url has special characters in hdfs file path (Resolved)
- relates to
  - HDFS-14323 Distcp fails in Hadoop 3.x when 2.x source webhdfs url has special characters in hdfs file path (Resolved)