Details
Type: Improvement
Status: In Progress
Priority: Major
Resolution: Unresolved
2.4.3, 3.2.0
None
None
Description
In HDFS, spaces and other special characters are allowed in a path, for example:
hdfs://ns1/tmp2/hive-staging/hadoop_hive_2020-07-06_17-31-29_139_7042265710400397740-1/-ext-10000/test_table=2020-06-17 18%3A00%3A00/part-00000-84396c4e-ba05-4936-afc7-db46c4251bfa.c000
When we load data using org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat, org.apache.spark.sql.execution.datasources.orc.OrcFileFormat, or org.apache.spark.sql.hive.orc.OrcFileFormat, an exception like the following may be thrown:
Caused by: java.net.URISyntaxException: Illegal character in path at index 136: hdfs://ns1/tmp2/hive-staging/hadoop_hive_2020-07-06_17-31-29_139_7042265710400397740-1/-ext-10000/test_table=2020-06-17 18%3A00%3A00/part-00000-84396c4e-ba05-4936-afc7-db46c4251bfa.c000
at java.net.URI$Parser.fail(URI.java:2848)
at java.net.URI$Parser.checkChars(URI.java:3021)
at java.net.URI$Parser.parseHierarchical(URI.java:3105)
at java.net.URI$Parser.parse(URI.java:3053)
at java.net.URI.<init>(URI.java:588)
at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anonfun$buildReaderWithPartitionValues$1.apply(ParquetFileFormat.scala:356)
at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anonfun$buildReaderWithPartitionValues$1.apply(ParquetFileFormat.scala:352)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:124)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:177)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:101)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:252)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:250)
at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1394)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:256)
... 10 more
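The failure at the top of the trace can be reproduced without Spark or HDFS. The sketch below is only an illustration using the JDK: the class name `UriSpaceDemo`, the helper `failsToParse`, and the shortened path are hypothetical and not taken from the Spark code. It shows that the single-argument `java.net.URI` constructor rejects a string containing a raw space.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class UriSpaceDemo {
    // Returns true if the single-argument URI constructor rejects the string.
    static boolean failsToParse(String raw) {
        try {
            new URI(raw);
            return false;
        } catch (URISyntaxException e) {
            // A raw space is not a legal URI character, so parsing fails here.
            return true;
        }
    }

    public static void main(String[] args) {
        // Hypothetical, shortened version of the partition path from the report.
        String withSpace = "hdfs://ns1/tmp2/test_table=2020-06-17 18%3A00%3A00/part-00000";
        System.out.println("rejected: " + failsToParse(withSpace));
    }
}
```

The `%3A` sequences are valid percent-escapes and are accepted; it is the unescaped space in the partition value that the parser refuses.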
HDFS provides several constructors for building a path (org.apache.hadoop.fs.Path), including ones that take a String and one that takes a URI.
We could fall back to constructing the path from a String rather than from a URI.
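Why a String-based construction helps can be sketched with the JDK alone. This is an analogy, not the proposed Spark patch: it does not use Hadoop's `Path`, and the class `QuotedUriDemo` and helper `fromComponents` are hypothetical names. The assumption here is that building the URI from separate components, as the multi-argument `java.net.URI` constructors do, quotes illegal characters instead of rejecting them, which is similar in spirit to constructing a path from a plain String.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class QuotedUriDemo {
    // Builds a URI from components. Unlike new URI(String), the multi-argument
    // constructor quotes illegal characters (such as spaces) in the path.
    static URI fromComponents(String scheme, String authority, String path) {
        try {
            return new URI(scheme, authority, path, null, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical, shortened version of the partition path from the report.
        String path = "/tmp2/test_table=2020-06-17 18%3A00%3A00/part-00000";
        URI uri = fromComponents("hdfs", "ns1", path);
        // getPath() returns the decoded path, so the original string round-trips.
        System.out.println(uri.getPath().equals(path));
    }
}
```

Because the encoded form is produced by the constructor rather than supplied by the caller, the raw directory name on disk (including its space and literal `%3A`) is preserved when the path is read back.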