Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 3.1.2
Fix Version/s: None
Flags: Patch
Description
Symptom
I was running Hive over an AWS S3 Inventory Report, which uses SymlinkTextInputFormat. Each symlink file contains the fully qualified S3 URLs of the data files, one per line, for example:
s3://inventory-bucket/bucket1/2020-02-04-11-23-00-AEFEDDCE
s3://inventory-bucket/bucket1/2020-02-05-11-23-00-BCEDCCDD
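To make the symlink format concrete, here is a minimal sketch that splits symlink file content into one URI per line and shows that each entry carries its own scheme (`s3`) and authority (`inventory-bucket`), independent of Hive's default FS. The class and method names are hypothetical, and trimming and skipping blank lines is an assumption of this sketch, not necessarily what SymlinkTextInputFormat does.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SymlinkContentSketch {
    // Hypothetical helper: one fully qualified path per line.
    // Assumption: surrounding whitespace is trimmed and blank lines are skipped.
    static List<URI> parseSymlinkContent(String content) {
        List<URI> targets = new ArrayList<>();
        for (String line : content.split("\\R")) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                targets.add(URI.create(trimmed));
            }
        }
        return Collections.unmodifiableList(targets);
    }

    public static void main(String[] args) {
        String content =
            "s3://inventory-bucket/bucket1/2020-02-04-11-23-00-AEFEDDCE\n"
          + "s3://inventory-bucket/bucket1/2020-02-05-11-23-00-BCEDCCDD\n";
        for (URI target : parseSymlinkContent(content)) {
            // Each entry names its own filesystem via scheme + authority.
            System.out.println(target.getScheme() + " | " + target.getAuthority() + " | " + target.getPath());
        }
    }
}
```

The point of the sketch is that nothing in the symlink content ties the data files to the filesystem that holds the symlink file itself.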
When I have the following setting:
set hive.rework.mapredwork=true;
the job fails with a NullPointerException and no stack trace.
Cause
A symlink file may contain arbitrary fully qualified filesystem paths, but SymlinkTextInputFormat uses the default FileSystem instance to get the status of the data files. That lookup fails (and returns null) when the scheme of a data file's path differs from Hive's default FS:
// "fileSystem" may not be able to list status for given file uri.
FileStatus[] matches = fileSystem.globStatus(new Path(line));
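One plausible direction for a fix, not necessarily what the attached patch does, is to resolve the filesystem from each path's own scheme instead of always using the default FS, and to check for a null result so the error names the offending path. In Hadoop the usual way to do the first part is `new Path(line).getFileSystem(conf)`, which picks the FileSystem from the path's scheme and authority. The stdlib-only sketch below models that idea; `Lister`, `PerPathFsSketch`, and the `hdfs://namenode:8020` default are hypothetical stand-ins, not Hive or Hadoop APIs.

```java
import java.net.URI;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PerPathFsSketch {
    // Hypothetical stand-in for a Hadoop FileSystem: returns matching files,
    // or null when it cannot list the path (like the null seen from globStatus).
    interface Lister {
        List<String> glob(URI path);
    }

    private final URI defaultFs;
    private final Map<String, Lister> byScheme = new HashMap<>();

    PerPathFsSketch(URI defaultFs) {
        this.defaultFs = defaultFs;
    }

    void register(String scheme, Lister lister) {
        byScheme.put(scheme, lister);
    }

    // Use the path's own scheme; fall back to the default FS only for scheme-less paths.
    Lister resolve(URI path) {
        String scheme = path.getScheme() != null ? path.getScheme() : defaultFs.getScheme();
        Lister lister = byScheme.get(scheme);
        if (lister == null) {
            throw new IllegalArgumentException("No filesystem for scheme '" + scheme + "' in " + path);
        }
        return lister;
    }

    // Fail with a descriptive message instead of a later, bare NullPointerException.
    List<String> matches(String line) {
        URI path = URI.create(line.trim());
        List<String> found = resolve(path).glob(path);
        if (found == null) {
            throw new IllegalStateException("Cannot list status for " + path);
        }
        return found;
    }

    public static void main(String[] args) {
        PerPathFsSketch fs = new PerPathFsSketch(URI.create("hdfs://namenode:8020"));
        fs.register("hdfs", p -> Collections.emptyList());
        fs.register("s3", p -> Collections.singletonList(p.toString()));
        System.out.println(fs.matches("s3://inventory-bucket/bucket1/2020-02-04-11-23-00-AEFEDDCE"));
    }
}
```

Resolving per path keeps the default FS for relative entries while letting fully qualified entries such as the `s3://` inventory URLs reach the filesystem that can actually list them.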
Fix
See the attached npe-symbolic-inputformat.patch.
Attachments
Issue Links
- links to