I was running Hive over an AWS S3 Inventory report, which uses SymlinkTextInputFormat. Each line of the symlink file is the fully qualified S3 URL of a data file, like:
When I have the following setting:
The job fails with a NullPointerException and no stack trace.
The content of a symlink file may be any fully qualified FS path, but SymbolicInputFormat uses the default FS instance to get the status of the data files. That lookup fails (and returns null) when the scheme of a data file differs from the scheme of Hive's default FS.
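To illustrate the mismatch, here is a minimal sketch using only java.net.URI. It is not the code from the patch: the class name SchemeCheck, the method servedByDefaultFs, the default FS URI hdfs://namenode:8020 and the bucket name example-bucket are all hypothetical. It only shows why a symlink target on S3 cannot be resolved through a default FS on another scheme. In Hadoop terms, the usual way to avoid this kind of problem is to obtain the filesystem from the path itself (Path.getFileSystem(conf)) rather than from FileSystem.get(conf); whether the attached patch does exactly that is an assumption here.

```java
import java.net.URI;
import java.util.Objects;

// Hypothetical helper, for illustration only.
class SchemeCheck {

    // Returns true when the target path would be served by the default FS,
    // i.e. it is unqualified, or its scheme (and authority, if given) match.
    static boolean servedByDefaultFs(URI defaultFs, URI target) {
        if (target.getScheme() == null) {
            // An unqualified path is resolved against the default FS.
            return true;
        }
        boolean sameScheme = target.getScheme().equalsIgnoreCase(defaultFs.getScheme());
        boolean sameAuthority = target.getAuthority() == null
                || Objects.equals(target.getAuthority(), defaultFs.getAuthority());
        return sameScheme && sameAuthority;
    }

    public static void main(String[] args) {
        // Assumed values: a default FS on HDFS and a symlink target on S3.
        URI defaultFs = URI.create("hdfs://namenode:8020");
        URI symlinkTarget = URI.create("s3://example-bucket/data/part-0000.csv.gz");

        // The schemes differ, so the default FS instance is the wrong one
        // to ask for this file's status.
        System.out.println(servedByDefaultFs(defaultFs, symlinkTarget));
    }
}
```

A check like this is only diagnostic. Resolving the filesystem per path removes the need for it, because each symlink target is then handled by the filesystem that matches its own scheme.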
Please check the attached npe-symbolic-inputformat.patch.