Type: Bug
Status: Resolved
Priority: Major
Resolution: Duplicate
Affects Version/s: 3.2.0
Fix Version/s: None
Component/s: hdfs, hdfs-client
Labels: None
When using DistCp over HttpFS with data that contains Spark partitions, DistCp fails to access the partitioned Parquet files because the "=" characters in the file path get double-encoded:
"/test/spark/partition/year=2019/month=1/day=1"
to
"/test/spark/partition/year%253D2019/month%253D1/day%253D1"
This happens because a fsPathItem containing the character '=' is first encoded by URLEncoder.encode(fsPathItem, "UTF-8") to '%3D', and then encoded again by new Path(...) to '%253D'.
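The double encoding described above can be reproduced with plain JDK calls. This is an illustrative sketch, not the actual DistCp/HttpFS code path: the second encoding pass in the bug comes from new Path(...), which is simulated here with a second URLEncoder.encode call to show the same '%3D' → '%253D' effect.

```java
import java.net.URLEncoder;

public class DoubleEncodeDemo {
    public static void main(String[] args) throws Exception {
        // A Spark-style partition path segment containing '='
        String pathItem = "year=2019";

        // First pass: '=' is percent-encoded to "%3D"
        String once = URLEncoder.encode(pathItem, "UTF-8");

        // Second pass (stand-in for the re-encoding done by new Path(...)):
        // the '%' of "%3D" is itself encoded to "%25", yielding "%253D"
        String twice = URLEncoder.encode(once, "UTF-8");

        System.out.println(once);   // year%3D2019
        System.out.println(twice);  // year%253D2019
    }
}
```

The server then sees "year%253D2019", which decodes only once back to "year%3D2019" rather than "year=2019", so the partition directory is not found.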
Issue Links:
- contains HDFS-14408 HttpFS handles paths with special characters different than WebHdfs (Resolved)
- duplicates HDFS-14323 Distcp fails in Hadoop 3.x when 2.x source webhdfs url has special characters in hdfs file path (Resolved)
- relates to HDFS-14323 Distcp fails in Hadoop 3.x when 2.x source webhdfs url has special characters in hdfs file path (Resolved)