Resolution: Won't Fix
Hadoop 2.8.0 binaries
We are getting the following exception:
The exception occurs when all of the following factors are combined:
- Use S3
- Use the ORC format
- Don't apply any partitioning to the data
- Embed the AWS credentials in the path
The problem is in the allFiles() method of PartitioningAwareFileIndex.
leafDirToChildrenFiles uses the path WITHOUT credentials as its key, while qualifiedPath contains the path WITH credentials.
As a result, leafDirToChildrenFiles.get(qualifiedPath) finds no files, so no data is read and the schema cannot be inferred.
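The key mismatch can be reproduced outside Spark with a simplified Python model. This is not Spark's Scala code: the functions list_leaf_files, all_files and all_files_normalized are hypothetical stand-ins for the listing step (which stores keys without credentials, like leafDirToChildrenFiles) and the lookup in allFiles() (which uses a key with credentials, like qualifiedPath). The bucket, keys and file name are made up.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_credentials(uri: str) -> str:
    """Return the URI with any user-info (user:secret@) removed from the authority."""
    parts = urlsplit(uri)
    host = parts.netloc.rpartition("@")[2]  # whole netloc if there is no "@"
    return urlunsplit((parts.scheme, host, parts.path, parts.query, parts.fragment))


def list_leaf_files(root_with_creds: str) -> dict:
    # Hypothetical listing step: the map is keyed by the path WITHOUT credentials,
    # mirroring what the report describes for leafDirToChildrenFiles.
    leaf_dir = strip_credentials(root_with_creds)
    return {leaf_dir: [leaf_dir + "/part-00000.orc"]}


def all_files(root_with_creds: str, leaf_dir_to_children: dict) -> list:
    # Hypothetical lookup: the key still contains credentials (like qualifiedPath),
    # so it never matches a stored key and an empty list comes back.
    return leaf_dir_to_children.get(root_with_creds, [])


def all_files_normalized(root_with_creds: str, leaf_dir_to_children: dict) -> list:
    # Normalizing the lookup key the same way as the stored keys makes them match.
    return leaf_dir_to_children.get(strip_credentials(root_with_creds), [])
```

With root = "s3a://AKIAEXAMPLE:secretEXAMPLE@my-bucket/data", all_files(root, list_leaf_files(root)) returns an empty list, which is the "no data is read" situation, while all_files_normalized finds the single ORC file.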
Spark does log the warning "S3xLoginHelper:90 - The Filesystem URI contains login details. This is insecure and may be unsupported in future.", but a warning alone should not mean that the read stops working.
Workaround: move the AWS credentials from the path into the SparkSession configuration.
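A minimal sketch of that workaround, assuming the s3a connector: the helper split_s3a_credentials is hypothetical and turns a credentialed URI into a clean path plus Spark config entries. It relies on Spark copying spark.hadoop.* properties into the Hadoop configuration and on the S3A property names fs.s3a.access.key and fs.s3a.secret.key; s3n uses different property names.

```python
from urllib.parse import unquote, urlsplit, urlunsplit


def split_s3a_credentials(uri: str):
    """Split an s3a:// URI with embedded credentials into (clean_path, spark_conf).

    Hypothetical helper; assumes the S3A connector property names.
    """
    parts = urlsplit(uri)
    userinfo, sep, host = parts.netloc.rpartition("@")
    clean = urlunsplit((parts.scheme, host, parts.path, parts.query, parts.fragment))
    if not sep:
        # No credentials in the URI: nothing to move into the session config.
        return clean, {}
    access_key, _, secret_key = userinfo.partition(":")
    conf = {
        # URL-decode so that a secret written as %2F in the URI becomes "/".
        "spark.hadoop.fs.s3a.access.key": unquote(access_key),
        "spark.hadoop.fs.s3a.secret.key": unquote(secret_key),
    }
    return clean, conf
```

In PySpark, each entry can then be passed to SparkSession.builder.config(key, value) before getOrCreate(), and the data read with spark.read.orc(clean_path), so the path that reaches PartitioningAwareFileIndex no longer carries credentials.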